1
Khalifa A, Ssekubugu R, Lessler J, Wawer M, Santelli JS, Hoffman S, Nalugoda F, Lutalo T, Ndyanabo A, Ssekasanvu J, Kigozi G, Kagaayi J, Chang LW, Grabowski MK. Implications of rapid population growth on survey design and HIV estimates in the Rakai Community Cohort Study (RCCS), Uganda. BMJ Open 2023; 13:e071108. [PMID: 37495389 PMCID: PMC10373715 DOI: 10.1136/bmjopen-2022-071108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023] Open
Abstract
OBJECTIVE Since rapid population growth challenges longitudinal population-based HIV cohorts in Africa to maintain coverage of their target populations, this study evaluated whether the exclusion of some residents due to growing population size biases key HIV metrics like prevalence and population-level viremia.
DESIGN, SETTING AND PARTICIPANTS Data were obtained from the Rakai Community Cohort Study (RCCS) in south central Uganda, an open population-based cohort which began excluding some residents of newly constructed household structures within its surveillance boundaries in 2008. The study includes adults aged 15-49 years who were censused from 2019 to 2020.
MEASURES We fit ensemble machine learning models to RCCS census and survey data to predict HIV seroprevalence and viremia (prevalence of those with viral load >1000 copies/mL) in the excluded population and evaluated whether their inclusion would change overall estimates.
RESULTS Of the 24 729 census-eligible residents, 2920 (12%) residents were excluded from the RCCS because they were living in new households. The predicted seroprevalence for these excluded residents was 10.8% (95% CI: 9.6% to 11.8%), somewhat lower than 11.7% (95% CI: 11.2% to 12.3%) in the observed sample. Predicted seroprevalence for younger excluded residents aged 15-24 years was 4.9% (95% CI: 3.6% to 6.1%), significantly higher than that in the observed sample for the same age group (2.6% (95% CI: 2.2% to 3.1%)), while predicted seroprevalence for older excluded residents aged 25-49 years was 15.0% (95% CI: 13.3% to 16.4%), significantly lower than their counterparts in the observed sample (17.2% (95% CI: 16.4% to 18.1%)). Over all ages, the predicted prevalence of viremia in excluded residents (3.7% (95% CI: 3.0% to 4.5%)) was significantly higher than that in the observed sample (1.7% (95% CI: 1.5% to 1.9%)), resulting in a higher overall population-level viremia estimate of 2.1% (95% CI: 1.8% to 2.4%).
CONCLUSIONS Exclusion of residents in new households may modestly bias HIV viremia estimates and some age-specific seroprevalence estimates in the RCCS. Overall, HIV seroprevalence estimates were not significantly affected.
Affiliation(s)
- Aleya Khalifa
  - Department of Epidemiology, Columbia University Mailman School of Public Health, New York, New York, USA
  - ICAP, Columbia University, New York, New York, USA
- Robert Ssekubugu
  - Rakai Health Sciences Program, Kalisizo, Uganda
  - Department of Global and Sexual Health, Karolinska Institutet, Stockholm, Sweden
- Justin Lessler
  - Department of Epidemiology, University of North Carolina School of Public Health, Chapel Hill, North Carolina, USA
  - Carolina Population Center, University of North Carolina, Chapel Hill, North Carolina, USA
- Maria Wawer
  - Rakai Health Sciences Program, Kalisizo, Uganda
  - Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- John S Santelli
  - Population and Family Health, Columbia University Mailman School of Public Health, New York, New York, USA
- Susie Hoffman
  - Department of Epidemiology, Columbia University, New York, New York, USA
  - HIV Centre for Clinical and Behavioural Studies, Columbia University Irving Medical Centre, New York, New York, USA
- Tom Lutalo
  - Rakai Health Sciences Program, Kalisizo, Uganda
- Joseph Ssekasanvu
  - Rakai Health Sciences Program, Kalisizo, Uganda
  - Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
- Larry W Chang
  - Rakai Health Sciences Program, Kalisizo, Uganda
  - Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  - Division of Infectious Diseases, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Mary Kathryn Grabowski
  - Rakai Health Sciences Program, Kalisizo, Uganda
  - Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, USA
  - Department of Pathology, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
2
A hybrid super ensemble learning model for the early-stage prediction of diabetes risk. Med Biol Eng Comput 2023; 61:785-797. [PMID: 36602674 DOI: 10.1007/s11517-022-02749-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/31/2022] [Accepted: 12/22/2022] [Indexed: 01/06/2023]
Abstract
Diabetes mellitus has become a rapidly growing chronic health problem worldwide, with a noticeable increase in cases over the last two decades. Recent advances in ensemble machine learning methods play an important role in the early detection of diabetes mellitus. These methods are both faster and less costly than traditional methods. This study proposes a new super ensemble learning model to enable early diagnosis of diabetes mellitus. Super learner is a cross-validation-based approach that makes better predictions by combining the prediction results of more than one machine learning algorithm. Based on a case study, the proposed super learner model was built with four base learners (logistic regression, decision tree, random forest, gradient boosting) and a meta learner (support vector machines). Three different datasets were used to measure the robustness of the proposed model. Chi-square was selected as the best of five feature selection techniques, and hyper-parameters were tuned with GridSearch. The proposed super learner model achieved the best accuracy in the detection of diabetes mellitus compared with the base learners on the early-stage diabetes risk prediction (99.6%), PIMA (92%), and diabetes 130-US hospitals (98%) datasets, respectively. This study showed that super learner algorithms can be used effectively in the detection of diabetes mellitus. The high statistical scores obtained also indicate the robustness of the proposed super learner model.
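The cross-validation idea behind a super learner can be illustrated with a minimal, dependency-free sketch. It is not the model from the abstract: the four base learners and the support vector machine meta learner are swapped for two one-feature threshold rules and a lookup-table meta learner, and every function name and data value below is hypothetical, invented for illustration only.

```python
def fit_stump(X, y, feature):
    """Base learner (assumed stand-in): threshold one feature midway between class means."""
    pos = [row[feature] for row, label in zip(X, y) if label == 1]
    neg = [row[feature] for row, label in zip(X, y) if label == 0]
    pos_mean, neg_mean = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    sign = 1 if pos_mean >= neg_mean else -1
    return lambda row: 1 if sign * (row[feature] - threshold) > 0 else 0


def out_of_fold(X, y, learners, k=3):
    """Score every row with base models trained on the other folds only."""
    Z = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        held_out = [i for i in range(len(X)) if i % k == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in learners]
        for i in held_out:
            Z[i] = tuple(model(X[i]) for model in models)
    return Z


def fit_meta(Z, y):
    """Meta learner (assumed stand-in): majority label per pattern of base outputs."""
    votes = {}
    for pattern, label in zip(Z, y):
        votes.setdefault(pattern, []).append(label)
    table = {p: 1 if 2 * sum(v) > len(v) else 0 for p, v in votes.items()}
    # Patterns never seen during training fall back to a majority vote of the base outputs.
    return lambda p: table.get(p, 1 if 2 * sum(p) >= len(p) else 0)


def fit_super_learner(X, y, learners, k=3):
    """Train the meta learner on out-of-fold predictions, then refit base learners on all data."""
    meta = fit_meta(out_of_fold(X, y, learners, k), y)
    models = [fit(X, y) for fit in learners]
    return lambda row: meta(tuple(model(row) for model in models))


# Hypothetical two-feature toy data; labels alternate so every fold holds both classes.
X = [(1.0, 2.0), (5.0, 6.0), (2.0, 1.0), (6.0, 5.0), (1.5, 2.5), (5.5, 7.0),
     (2.5, 1.5), (6.5, 6.5), (3.0, 2.0), (7.0, 5.5), (2.0, 3.0), (5.0, 7.5)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

learners = [lambda X, y: fit_stump(X, y, 0), lambda X, y: fit_stump(X, y, 1)]
model = fit_super_learner(X, y, learners)
print(model((1.0, 1.0)), model((7.0, 7.0)))
```

The point of the out-of-fold step is that the meta learner only ever sees base predictions made on rows the base models were not trained on, which keeps it from simply rewarding the base learner that overfits most.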
3
Lee J, Hong H, Song JM, Yeom E. Neural network ensemble model for prediction of erythrocyte sedimentation rate (ESR) using partial least squares regression. Sci Rep 2022; 12:19618. [PMID: 36379969 PMCID: PMC9666533 DOI: 10.1038/s41598-022-23174-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 10/26/2022] [Indexed: 11/16/2022] Open
Abstract
The erythrocyte sedimentation rate (ESR) is a non-specific blood test for determining inflammatory conditions. However, the long measurement time (60 min) needed to obtain ESR is an obstacle to prompt evaluation. In this study, to reduce the measurement time of ESR, deep neural networks (DNNs) were applied to the sedimentation tendency of blood samples. DNNs using multilayer perceptron (MLP), long short-term memory (LSTM), and gated recurrent unit (GRU) were assessed and compared to determine a suitable length of time for the input sequence. To avoid overfitting, stacking ensemble learning was adopted, which combines multiple models by using a meta model. Four meta models were compared: mean, median, least absolute shrinkage and selection operator, and partial least squares regression (PLSR) schemes. In the empirical results, the LSTM and GRU models predicted better than MLP over sequence lengths of 5 to 20 min. The decrease in [Formula: see text] and [Formula: see text] of GRU and LSTM was attenuated after a sequence length of 15 min, so the input sequence length was set to 15 min. In terms of the meta model, the statistical comparison suggests that GRU combined with PLSR (GRU-PLSR) is the best case. The GRU-PLSR was then tested on ESR data obtained from periodontitis patients to check its applicability to a specific disease. The Bland-Altman plot shows acceptable agreement between measured and predicted ESR values. Based on the results, the GRU-PLSR can predict ESR with improved performance within 15 min and has potential applicability to ESR data with inflammatory and non-inflammatory conditions.
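As a rough illustration of two ideas named in the abstract, the sketch below combines several base-model predictions with the fixed mean and median rules and then computes the quantities behind a Bland-Altman comparison (bias and 95% limits of agreement). It covers only the two non-learned meta models, not the LASSO or PLSR schemes, and all function names and ESR values are hypothetical, invented for illustration.

```python
from statistics import mean, median, stdev


def combine(base_predictions, rule="mean"):
    """Combine per-model predictions sample by sample with a fixed rule."""
    reducer = {"mean": mean, "median": median}[rule]
    # zip(*...) regroups the values so each tuple holds one sample's predictions.
    return [reducer(sample) for sample in zip(*base_predictions)]


def bland_altman(measured, predicted):
    """Return the bias and the 95% limits of agreement (bias +/- 1.96 SD of differences)."""
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical ESR values in mm/h; not taken from the study.
measured = [10.0, 20.0, 30.0, 40.0]
base_predictions = [
    [11.0, 19.0, 33.0, 38.0],  # assumed output of one base network
    [9.0, 22.0, 30.0, 41.0],   # assumed output of a second base network
    [13.0, 21.0, 27.0, 44.0],  # assumed output of a third base network
]

stacked_mean = combine(base_predictions, "mean")
stacked_median = combine(base_predictions, "median")
bias, lower, upper = bland_altman(measured, stacked_median)
print(stacked_median, bias, lower, upper)
```

A learned meta model such as PLSR would replace `combine` with a regression fitted on the base predictions; the Bland-Altman step would stay the same.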
Affiliation(s)
- Jaejin Lee
  - School of Mechanical Engineering, Pusan National University, Busan, South Korea
- Hyeonji Hong
  - School of Mechanical Engineering, Pusan National University, Busan, South Korea
- Jae Min Song
  - Department of Oral and Maxillofacial Surgery, School of Dentistry, Pusan National University, Yangsan, South Korea
  - Dental and Life Science Institute, School of Dentistry, Pusan National University, Yangsan, South Korea
- Eunseop Yeom
  - School of Mechanical Engineering, Pusan National University, Busan, South Korea