1
Mugurungi O, Mbunge E, Birri-Makota R, Chingombe I, Mapingure M, Moyo B, Mpofu A, Batani J, Muchemwa B, Samba C, Murigo D, Sibindi M, Moyo E, Dzinamarira T, Musuka G. Predicting sexually transmitted infections among men who have sex with men in Zimbabwe using deep learning and ensemble machine learning models. PLOS Digit Health 2024; 3:e0000541. [PMID: 38959248 PMCID: PMC11221700 DOI: 10.1371/journal.pdig.0000541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Accepted: 05/28/2024] [Indexed: 07/05/2024]
Abstract
Sexually transmitted infections (STIs) among men who have sex with men (MSM) are increasing substantially worldwide. Contributing factors include unprotected sexual practices, multiple sex partners, criminalization, stigmatisation, fear of discrimination, substance use, poor access to care, and a lack of early STI screening tools. This study therefore applied multilayer perceptron (MLP), extremely randomized trees (ExtraTrees) and XGBoost machine learning models to predict STIs among MSM using bio-behavioural survey (BBS) data from Zimbabwe. Data were collected from 1538 MSM in Zimbabwe. The dataset was split into training and testing sets in an 80:20 ratio, and the synthetic minority oversampling technique (SMOTE) was applied to address class imbalance. A stepwise logistic regression model identified several predictors of STIs among MSM, including age, cohabitation with sex partners, education status and employment status. MLP outperformed the other STI prediction models (XGBoost and ExtraTrees), achieving an accuracy of 87.54%, recall of 97.29%, precision of 89.64%, F1-score of 93.31% and AUC of 66.78%. XGBoost achieved an accuracy of 86.51%, recall of 96.51%, precision of 89.25%, F1-score of 92.74% and AUC of 54.83%. ExtraTrees recorded an accuracy of 85.47%, recall of 95.35%, precision of 89.13%, F1-score of 92.13% and AUC of 60.21%. These models can be used effectively to identify MSM at high risk of STIs, to support STI surveillance, and to inform the development of STI screening tools that improve health outcomes for MSM.
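Because the F1-score is the harmonic mean of precision and recall, F1 = 2PR/(P + R), the reported F1 values can be cross-checked against the reported precision and recall. The Python sketch below recomputes F1 for each model from the figures in the abstract; the 0.05-point tolerance is an assumption chosen to absorb rounding in the published numbers.

```python
# Precision, recall and F1 (in %) as reported in the abstract above.
REPORTED = {
    "MLP": {"precision": 89.64, "recall": 97.29, "f1": 93.31},
    "XGBoost": {"precision": 89.25, "recall": 96.51, "f1": 92.74},
    "ExtraTrees": {"precision": 89.13, "recall": 95.35, "f1": 92.13},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, in the same units as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_reported_f1(tolerance: float = 0.05) -> dict:
    """Recompute F1 per model and flag whether it matches the reported value
    within `tolerance` percentage points (the inputs are already rounded)."""
    results = {}
    for model, m in REPORTED.items():
        recomputed = f1_score(m["precision"], m["recall"])
        results[model] = (round(recomputed, 2), abs(recomputed - m["f1"]) <= tolerance)
    return results


if __name__ == "__main__":
    for model, (f1, consistent) in check_reported_f1().items():
        print(f"{model}: recomputed F1 = {f1:.2f}% (consistent with reported: {consistent})")
```

The recomputed values come out at about 93.31%, 92.74% and 92.14%, agreeing with the reported 93.31%, 92.74% and 92.13% to within rounding. Note that F1 depends only on precision and recall at a single decision threshold, so it can diverge sharply from the threshold-independent AUC values reported alongside it.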
Affiliation(s)
- Owen Mugurungi
  - AIDS and TB Programme, Ministry of Health and Child Care, Harare, Zimbabwe
- Elliot Mbunge
  - Department of Computer Science, University of Eswatini, P Bag 4 Kwaluseni Campus, Swaziland
- Rutendo Birri-Makota
  - Department of Medicine, University of Zimbabwe College of Health Sciences, Harare, Zimbabwe
- Brian Moyo
  - AIDS and TB Programme, Ministry of Health and Child Care, Harare, Zimbabwe
- Amon Mpofu
  - National AIDS Commission, Harare, Zimbabwe
- John Batani
  - Faculty of Engineering and Technology, Botho University, Maseru, Lesotho
- Benhildah Muchemwa
  - Department of Computer Science, University of Eswatini, P Bag 4 Kwaluseni Campus, Swaziland
- Enos Moyo
  - Department of Medicine, University of Zimbabwe College of Health Sciences, Harare, Zimbabwe
- Godfrey Musuka
  - Innovative Public Health and Development, Harare, Zimbabwe
  - International Initiative for Impact Evaluation (3ie), Harare, Zimbabwe
2
Ge Q, Lu X, Jiang R, Zhang Y, Zhuang X. Data mining and machine learning in HIV infection risk research: An overview and recommendations. Artif Intell Med 2024; 153:102887. [PMID: 38735156 DOI: 10.1016/j.artmed.2024.102887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 03/07/2024] [Accepted: 04/27/2024] [Indexed: 05/14/2024]
Abstract
Data mining and machine learning are now widely applied in medical research and have contributed significantly to areas such as HIV studies. Based on a review of 38 articles published over the past 15 years, the study presents a roadmap, organized around seven aspects and covering various machine learning techniques, for both novice and experienced researchers seeking to understand the current state of the art in this area. While traditional regression modeling techniques have been commonly used, researchers are increasingly adopting more advanced fully supervised machine learning and deep learning techniques, which often outperform traditional methods in predictive performance. The study also identifies nine new open research issues and outlines possible directions for future research to improve the outcomes of HIV infection risk research. The review is intended as a guide for researchers, summarizing current practices and suggesting advances in the field.
Affiliation(s)
- Qiwei Ge
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, China
- Xinyu Lu
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, China
- Run Jiang
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, China
- Yuyu Zhang
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, China
- Xun Zhuang
  - Department of Epidemiology and Medical Statistics, School of Public Health, Nantong University, China
3
Alie MS, Negesse Y. Machine learning prediction of adolescent HIV testing services in Ethiopia. Front Public Health 2024; 12:1341279. [PMID: 38560439 PMCID: PMC10981275 DOI: 10.3389/fpubh.2024.1341279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 03/04/2024] [Indexed: 04/04/2024]
Abstract
Background: Despite efforts to achieve the Joint United Nations Programme on HIV/AIDS (UNAIDS) 95-95-95 fast-track targets for HIV prevention established in 2014, progress has fallen short. It is therefore imperative to identify factors that can serve as predictors of an adolescent's HIV status, so that targeted screening interventions can be implemented and healthcare services improved. Our primary objective was to identify these predictors in order to improve HIV testing services for adolescents in Ethiopia.

Methods: Eight machine learning techniques were used to develop models from the demographic and health data of 4,502 adolescent respondents. The dataset contained 31 variables, and variables were selected using several selection methods. To train and validate the models, the data were randomly split into 80% for training and validation and 20% for testing. The algorithms were evaluated, and the one with the highest accuracy and mean F1 score was selected for further training using the most predictive variables.

Results: The J48 decision tree algorithm detected HIV positivity most accurately, outperforming the seven other algorithms with an accuracy of 81.29% and an area under the receiver operating characteristic (ROC) curve of 86.3%. Its performance is attributed to its ability to identify key predictor features, the top five being age, knowledge of HIV testing locations, age at first sexual encounter, recent sexual activity, and exposure to family planning. The model performed significantly better when using only twenty variables than when using all variables.

Conclusion: Our findings indicate that the J48 decision tree algorithm, combined with demographic and health-related data, is a highly effective tool for identifying potential predictors of HIV testing. This approach can accurately predict which adolescents are at high risk of infection, enabling targeted screening strategies for early detection and intervention. To improve HIV testing among adolescents in the country, we recommend considering demographic factors such as age, age at first sexual encounter, exposure to family planning, recent sexual activity, and other identified predictors.
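J48 is Weka's Java implementation of the C4.5 decision tree algorithm, which chooses each split by gain ratio: the information gain of a split divided by its split information. The Python sketch below illustrates only that split criterion for categorical features; it omits C4.5's handling of continuous attributes, missing values and pruning, and the feature names in the demo are hypothetical, not variables from the study.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5-style gain ratio of splitting `labels` on a categorical `feature`."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: parent entropy minus the weighted entropy of the child nodes.
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - children
    # Split information penalises features that fragment the data into many values.
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_feature(features: Dict[str, Sequence[Hashable]], labels: Sequence[Hashable]) -> str:
    """Name of the feature with the highest gain ratio (the candidate root split)."""
    return max(features, key=lambda name: gain_ratio(features[name], labels))


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    labels = [1, 1, 1, 0, 0, 0]
    features = {
        "knows_testing_site": ["yes", "yes", "yes", "no", "no", "no"],
        "recent_sex": ["yes", "no", "yes", "no", "yes", "no"],
    }
    for name, values in features.items():
        print(name, round(gain_ratio(values, labels), 3))
    print("root split:", best_feature(features, labels))
```

Dividing by split information keeps the tree from favoring features with many distinct values, which plain information gain tends to do.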
Affiliation(s)
- Melsew Setegn Alie
  - Department of Public Health, School of Public Health, College of Medicine and Health Science, Mizan-Tepi University, Mizan-Aman, Ethiopia
- Yilkal Negesse
  - Department of Public Health, College of Medicine and Health Science, Debre-Markos University, Gojjam, Ethiopia
4
Hu M, Peng H, Zhang X, Wang L, Ren J. Building gender-specific sexually transmitted infection risk prediction models using CatBoost algorithm and NHANES data. BMC Med Inform Decis Mak 2024; 24:24. [PMID: 38267946 PMCID: PMC10809625 DOI: 10.1186/s12911-024-02426-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Accepted: 01/15/2024] [Indexed: 01/26/2024]
Abstract
BACKGROUND AND AIMS: Sexually transmitted infections (STIs) are a significant global public health challenge because of their high incidence and the potential for severe consequences when early intervention is neglected. Research shows upward trends in the absolute number of STI cases and in disability-adjusted life years (DALYs), with syphilis, chlamydia, trichomoniasis, and genital herpes showing increasing age-standardized rates (ASRs) from 2010 to 2019. Machine learning (ML) offers significant advantages in disease prediction, and several studies have explored its potential for STI prediction. The objective of this study was to build separate STI risk prediction models for males and females based on the CatBoost algorithm, using data from the National Health and Nutrition Examination Survey (NHANES) for training and validation, with sub-group analysis performed for each STI. The female sub-group also includes human papillomavirus (HPV) infection.

METHODS: The study used NHANES data to build the male and female STI risk prediction models with the CatBoost algorithm. Data were collected from 12,053 participants aged 18 to 59 years, and general demographic characteristics and sexual behavior questionnaire responses were included as features. The Adaptive Synthetic Sampling Approach (ADASYN) algorithm was used to address data imbalance, and 15 machine learning algorithms were evaluated before CatBoost was selected. The SHAP (SHapley Additive exPlanations) method was used to improve interpretability by identifying the importance of each feature in the model's STI risk predictions.

RESULTS: The CatBoost classifier achieved AUC values of 0.9995, 0.9948, 0.9923, 0.9996, and 0.9769 for predicting chlamydia, genital herpes, genital warts, gonorrhea, and overall STI infection, respectively, among males. It achieved AUC values of 0.9971, 0.972, 0.9765, 1, 0.9485, and 0.8819 for predicting chlamydia, genital herpes, genital warts, gonorrhea, HPV, and overall STI infection, respectively, among females. Having sex with a new partner (per year), the number of times having sex without a condom (per year), and the lifetime number of female vaginal sex partners were identified as the top three predictors of overall STI risk in males. Similarly, ever having anal sex with a man, age, and the lifetime number of male vaginal sex partners were identified as the top three predictors of overall STI risk in females.

CONCLUSIONS: This study demonstrated the effectiveness of the CatBoost classifier in predicting STI risk in both males and females. SHAP analysis revealed key predictors for each infection, highlighting demographic characteristics and sexual behaviors that were consistent across different STIs. These insights can guide targeted prevention strategies and interventions to reduce the impact of STIs on public health.
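Both SMOTE (used in entry 1) and ADASYN create synthetic minority samples by interpolating between a minority point and one of its minority-class neighbors; ADASYN additionally allocates more synthetic samples to minority points whose neighborhoods are dominated by the majority class. The pure-Python sketch below is a simplified illustration under stated assumptions: binary labels, Euclidean distance on numeric features, defaults of k = 5 and beta = 1.0, per-point counts rounded to integers (so the total can differ slightly from the target), and a uniform fallback when no minority point has majority neighbors. In practice a tested implementation such as imbalanced-learn's `ADASYN` class would normally be preferred.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def adasyn(
    X: Sequence[Point],
    y: Sequence[int],
    minority_label: int,
    k: int = 5,
    beta: float = 1.0,
    seed: int = 0,
) -> List[Point]:
    """Return synthetic minority samples generated ADASYN-style (binary labels assumed)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority)
    # Total number of synthetic samples; beta = 1.0 targets a fully balanced set.
    total = int(round((n_majority - len(minority)) * beta))
    if total <= 0 or len(minority) < 2:
        return []

    def nearest(i: int, candidates: Sequence[int]) -> List[int]:
        """Indices of the k candidates closest to X[i] (Euclidean), excluding i."""
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Difficulty ratio r_i: share of majority points among each minority point's k neighbors.
    ratios = []
    for i in minority:
        nn = nearest(i, range(len(X)))
        ratios.append(sum(1 for j in nn if y[j] != minority_label) / len(nn))
    s = sum(ratios)
    # Normalise to weights; fall back to uniform weights if no point has majority neighbors.
    weights = [r / s for r in ratios] if s > 0 else [1 / len(ratios)] * len(ratios)

    synthetic: List[Point] = []
    for i, w in zip(minority, weights):
        g_i = int(round(w * total))  # harder-to-learn points receive more samples
        minority_nn = nearest(i, minority)
        for _ in range(g_i):
            j = rng.choice(minority_nn)
            lam = rng.random()
            # SMOTE-style interpolation between x_i and a random minority neighbor.
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: six majority points (label 0) and two minority points (label 1).
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (5, 5), (5.5, 5)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    for point in adasyn(X, y, minority_label=1, k=3):
        print(point)
```

Setting every weight equal instead of using the difficulty ratios would reduce this to plain SMOTE-style oversampling.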
Affiliation(s)
- Mengjie Hu
  - Department of General Practice, First Affiliated Hospital, Zhejiang University School of Medicine, 310003, Hangzhou, China
- Han Peng
  - Clinical Research Institute, Zhejiang Provincial People's Hospital (Affiliated People's Hospital of Hangzhou Medical College), Hangzhou, China
- Xuan Zhang
  - Department of Cardiology, The First Affiliated Hospital, Zhejiang University School of Medicine, 310003, Hangzhou, China
- Lefeng Wang
  - Kidney Disease Center, the First Affiliated Hospital, College of Medicine, Zhejiang University, 310003, Hangzhou, China
- Jingjing Ren
  - Department of General Practice, First Affiliated Hospital, Zhejiang University School of Medicine, 310003, Hangzhou, China
5
Li J, Hao Y, Liu Y, Wu L, Liang H, Ni L, Wang F, Wang S, Duan Y, Xu Q, Xiao J, Yang D, Gao G, Ding Y, Gao C, Xiao J, Zhao H. Supervised machine learning algorithms to predict the duration and risk of long-term hospitalization in HIV-infected individuals: a retrospective study. Front Public Health 2024; 11:1282324. [PMID: 38249414 PMCID: PMC10796994 DOI: 10.3389/fpubh.2023.1282324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 12/13/2023] [Indexed: 01/23/2024]
Abstract
Objective: The study aimed to use supervised machine learning models to predict the length of hospital stay and the risk of prolonged hospitalization in people living with HIV (PLWH), to help physicians intervene clinically in a timely manner and avoid wasting health resources.

Methods: Regression models based on random forest (RF), k-nearest neighbors (KNN), support vector machine (SVM), and XGBoost (XGB) were built to predict the length of hospital stay and were evaluated with RMSE, MAE, MAPE, and R². Classification models based on RF, KNN, SVM, neural network (NN), and XGB were built to predict the risk of prolonged hospital stay and were evaluated with accuracy, positive predictive value (PPV), negative predictive value (NPV), specificity, sensitivity, and kappa. All models were internally validated with visual evaluation based on AUROC, AUPRC, calibration curves, and decision curves.

Results: Among the regression models, XGB performed best in internal validation for predicting the length of hospital stay (RMSE = 16.81, MAE = 10.39, MAPE = 0.98, R² = 0.47). Among the classification models, the NN model showed good fit and stable features and performed best on the testing sets, with an accuracy of 0.7623, PPV of 0.7853, NPV of 0.7092, sensitivity of 0.8754, specificity of 0.5882, and kappa of 0.4672. Further visual evaluation in internal validation showed that it had the largest AUROC (0.9779) and AUPRC (0.773), as well as well-performing calibration and decision curves.

Conclusion: This study showed that the XGB model was effective in predicting the length of hospital stay, while the NN model was effective in predicting the risk of prolonged hospitalization in PLWH. An intelligent medical prediction system could be developed from these predictive models to predict the length of stay and the risk of prolonged hospitalization of HIV patients from their medical records, which could help reduce the waste of healthcare resources.
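The classification metrics listed in the Methods all derive from a 2×2 confusion matrix. As a sketch, the Python below computes accuracy, PPV, NPV, sensitivity, specificity and Cohen's kappa, where kappa = (p_o - p_e)/(1 - p_e) compares the observed accuracy p_o with the agreement p_e expected by chance from the row and column totals. The counts in the demo are hypothetical, not from the study, and returning 0.0 for undefined ratios is an assumption of this sketch.

```python
from dataclasses import dataclass


def _ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted prolonged stay, observed prolonged stay
    fp: int  # predicted prolonged stay, not observed
    tn: int  # predicted normal stay, observed normal stay
    fn: int  # predicted normal stay, observed prolonged stay

    def metrics(self) -> dict:
        n = self.tp + self.fp + self.tn + self.fn
        accuracy = _ratio(self.tp + self.tn, n)  # observed agreement p_o
        # Chance agreement p_e from the products of predicted and observed marginals.
        p_e = _ratio(
            (self.tp + self.fp) * (self.tp + self.fn) + (self.tn + self.fn) * (self.tn + self.fp),
            n * n,
        )
        return {
            "accuracy": accuracy,
            "ppv": _ratio(self.tp, self.tp + self.fp),
            "npv": _ratio(self.tn, self.tn + self.fn),
            "sensitivity": _ratio(self.tp, self.tp + self.fn),
            "specificity": _ratio(self.tn, self.tn + self.fp),
            # Kappa is undefined when p_e == 1; _ratio then returns 0.0.
            "kappa": _ratio(accuracy - p_e, 1 - p_e),
        }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in ConfusionMatrix(tp=40, fp=10, tn=30, fn=20).metrics().items():
        print(f"{name}: {value:.4f}")
```

Kappa is lower than accuracy whenever the class totals make chance agreement substantial, which is one reason studies report it alongside accuracy.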
Affiliation(s)
- Jialu Li
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Yiwei Hao
  - Division of Medical Record and Statistics, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Ying Liu
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Liang Wu
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Hongyuan Liang
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Liang Ni
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Fang Wang
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Sa Wang
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Yujiao Duan
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Qiuhua Xu
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Jinjing Xiao
  - Department of Clinical Medicine, Zhengzhou University, Zhengzhou, China
- Di Yang
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Guiju Gao
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Yi Ding
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Chengyu Gao
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Jiang Xiao
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Hongxin Zhao
  - Clinical and Research Center of AIDS, Beijing Ditan Hospital, Capital Medical University, Beijing, China
6
Wong NS, Tang W, Miller WC, Ong JJ, Lee SS. Expanded HIV testing in non-key populations - the neglected strategy for minimising late diagnosis. Int J Infect Dis 2024; 138:38-40. [PMID: 38036260 DOI: 10.1016/j.ijid.2023.11.034] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/02/2023]
Affiliation(s)
- Ngai Sze Wong
  - S.H. Ho Research Centre for Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China
  - Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China
  - JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Hong Kong, China
- Weiming Tang
  - University of North Carolina Chapel Hill Project-China, Guangzhou, China
- William C Miller
  - Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Jason J Ong
  - Melbourne Sexual Health Centre, Alfred Health, Melbourne, Australia
  - Central Clinical School, Monash University, Melbourne, Australia
  - Clinical Research Department, London School of Hygiene and Tropical Medicine, London, UK
- Shui Shan Lee
  - S.H. Ho Research Centre for Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China
  - Stanley Ho Centre for Emerging Infectious Diseases, The Chinese University of Hong Kong, Hong Kong, China