1
Zhang S, Kang C, Cui J, Xue H, Zhao S, Chen Y, Lu H, Ye L, Wang D, Chen F, Zhao Y, Pei L, Qu P. Development of machine learning-based models to predict congenital heart disease: A matched case-control study. Int J Med Inform 2025; 195:105741. [PMID: 39647289 DOI: 10.1016/j.ijmedinf.2024.105741] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 10/08/2024] [Accepted: 11/30/2024] [Indexed: 12/10/2024]
Abstract
BACKGROUND The current congenital heart disease (CHD) prediction tools lack adequate interpretability and convenience, hindering the development of personalized CHD management strategies. We developed a machine learning-based risk stratification model for CHD prediction. METHODS This study utilized data from 1,759 participants in a case-control study of CHD conducted across six birth defects surveillance hospitals located in Xi'an, Shaanxi Province, Northwest China, spanning from January 2014 to December 2016. The data was partitioned into training and testing datasets with a ratio of 7:3. Predictors were selected from a total of 47 input variables through the Least Absolute Shrinkage and Selection Operator (LASSO). Five machine learning algorithms were used to build the CHD risk prediction models. Model performance was assessed based on a range of learning metrics, including the area under the receiver operating characteristic curve (AUROC), F1 score, and Brier score. Permutation feature importance was employed to elucidate the prediction model. The best-performing model was used to conduct the risk scores. RESULTS The eXtreme Gradient Boosting (XGB) model demonstrated superior performance among CHD prediction models, achieving an AUROC of 0.772 (95 % CI 0.728, 0.817) in the testing dataset and 0.738 (0.699, 0.775) in the external validation dataset. The pivotal predictors (top 3) identified by the model included living in rural areas, the low wealth index, and folic acid supplements (<90 days). The resultant risk score exhibited robust calibration capabilities. Utilizing the risk scores, participants were stratified into low, moderate, and high-risk categories, signifying substantial variations in CHD risk. CONCLUSION This study underscores the feasibility and efficacy of employing a machine learning-based approach for CHD prediction. 
The risk scores exhibited potential in identifying pregnant women at high risk for fetal CHD, offering valuable insights for guiding primary prevention and CHD management.
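Two of the metrics named in the abstract above, the Brier score and the F1 score, are simple to compute by hand. The sketch below is illustrative only: the labels and predicted probabilities are hypothetical toy values, not data from the study, and the 0.5 decision threshold is an assumption.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = CHD case, 0 = control.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.6]

# Assumed decision threshold of 0.5 to turn probabilities into class labels.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

brier = brier_score(y_true, y_prob)
f1 = f1_score(y_true, y_pred)
print(f"Brier score: {brier:.4f}")
print(f"F1 score:    {f1:.4f}")
```

The Brier score is computed on the probabilities themselves, so it reflects calibration, whereas F1 depends on the chosen threshold.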
Affiliation(s)
- Shutong Zhang
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Chenxi Kang
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Jing Cui
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Haodan Xue
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Shanshan Zhao
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Yukui Chen
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Haixia Lu
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Lu Ye
- Shaanxi Eye Hospital, Xi'an People's Hospital (Xi'an Fourth Hospital), Xi'an, China
| | - Duolao Wang
- Biostatistics Unit, Department of Clinical Sciences, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool L3 5QA, UK; Department of Neurology, Guangdong Key Laboratory of Age-Related Cardiac and Cerebral Diseases, Affiliated Hospital of Guangdong Medical University, Zhanjiang, China
| | - Fangyao Chen
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Yaling Zhao
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Leilei Pei
- Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China.
| | - Pengfei Qu
- Translational Medicine Center, Northwest Women's and Children's Hospital, Xi'an 710061, China; Central Laboratory, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Chaoyang, Beijing 100026, China.
2
Shi H, Book WM, Ivey LC, Rodriguez FH, Raskind-Hood C, Downing KF, Farr SL, McCracken CE, Leedom VO, Haynes SE, Amouzou S, Sameni R, Kamaleswaran R. A Generalized Machine Learning Model for Identifying Congenital Heart Defects (CHDs) Using ICD Codes. Birth Defects Res 2025; 117:e2440. [PMID: 39890469 PMCID: PMC12027675 DOI: 10.1002/bdr2.2440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2024] [Revised: 11/25/2024] [Accepted: 01/14/2025] [Indexed: 02/03/2025]
Abstract
BACKGROUND International Classification of Diseases (ICD) codes utilized for congenital heart defect (CHD) case identification in datasets have substantial false-positive (FP) rates. Incorporating machine learning (ML) algorithms following case selection by ICD codes may improve the accuracy of CHD identification, enhancing surveillance efforts. METHODS Traditional ML methods were applied to four encounter-level datasets, 2010-2019, for 3334 patients with validated diagnoses and with at least one CHD ICD code identified. A 5-fold cross-validation approach was applied to the dataset to determine the set of overlapping important features best classifying CHD cases. Training and testing combinations were explored to determine the approach yielding the most accurate CHD classification. RESULTS CHD ICD positive predictive values (PPVs) by site ranged from 53.2% to 84.0%. The ML algorithm achieved a PPV of 95% (1273/1340) for the four-site dataset with a false-negative (FN) rate of 33% (639/1912) by choosing an operating point prioritizing PPV from the PPV-FN rate curve. XGBoost reduced 2105 Clinical Classification Software (CCS) features to 137 that identified those with true-positive (TP) CHD and false-positive (FP) classification. CONCLUSION Applying ML algorithms following case selection by CHD-related ICD codes improved the accuracy of identifying TP CHD cases.
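The operating point reported above can be checked directly from the quoted counts. The sketch below assumes that 1340 is the number of records the model classified as CHD and that 1912 is the number of validated CHD cases (1273 + 639); the variable names are illustrative and do not come from the paper.

```python
# Counts quoted in the abstract above.
true_positives = 1273    # classified as CHD by the model and validated as CHD
flagged_total = 1340     # assumed: all records the model classified as CHD
false_negatives = 639    # validated CHD cases the model did not classify as CHD

# Derived counts.
false_positives = flagged_total - true_positives
true_cases = true_positives + false_negatives

# Positive predictive value: share of model-positive records that are true CHD.
ppv = true_positives / flagged_total

# False-negative rate: share of true CHD cases that the model missed.
fn_rate = false_negatives / true_cases

# Sensitivity is the complement of the false-negative rate.
sensitivity = 1 - fn_rate

print(f"PPV         = {ppv:.3f}")
print(f"FN rate     = {fn_rate:.3f}")
print(f"Sensitivity = {sensitivity:.3f}")
```

Under these assumptions, prioritizing PPV means accepting that roughly one in three validated CHD cases falls below the chosen threshold.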
Affiliation(s)
- Haoming Shi
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA
| | - Wendy M. Book
- Department of Medicine, Division of Cardiology, Emory University School of Medicine, Atlanta, GA
- Department of Epidemiology, Emory University, Rollins School of Public Health, Atlanta, GA
| | - Lindsey C. Ivey
- Department of Epidemiology, Emory University, Rollins School of Public Health, Atlanta, GA
| | - Fred H. Rodriguez
- Department of Medicine, Division of Cardiology, Emory University School of Medicine, Atlanta, GA
- Children’s Healthcare of Atlanta, Atlanta, GA
| | - Cheryl Raskind-Hood
- Department of Epidemiology, Emory University, Rollins School of Public Health, Atlanta, GA
| | - Karrie F. Downing
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
| | - Sherry L. Farr
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, GA
| | | | | | | | - Sandra Amouzou
- Center for Research and Evaluation, Kaiser Permanente Georgia, Atlanta, GA
| | - Reza Sameni
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA
| | - Rishikesan Kamaleswaran
- Department of Surgery, Duke University School of Medicine, Durham, NC
- Department of Anesthesiology, Duke University School of Medicine, Durham, NC
- Department of Biomedical Engineering, Duke University, Durham, NC
- Department of Electrical and Computer Engineering, Duke University, Durham, NC
3
Khan K, Ullah F, Syed I, Ali H. Accurately assessing congenital heart disease using artificial intelligence. PeerJ Comput Sci 2024; 10:e2535. [PMID: 39650370 PMCID: PMC11623015 DOI: 10.7717/peerj-cs.2535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 10/29/2024] [Indexed: 12/11/2024]
Abstract
Congenital heart disease (CHD) remains a significant global health challenge, particularly contributing to newborn mortality, with the highest rates observed in middle- and low-income countries due to limited healthcare resources. Machine learning (ML) presents a promising solution by developing predictive models that more accurately assess the risk of mortality associated with CHD. These ML-based models can help healthcare professionals identify high-risk infants and ensure timely and appropriate care. In addition, ML algorithms excel at detecting and analyzing complex patterns that can be overlooked by human clinicians, thereby enhancing diagnostic accuracy. Despite notable advancements, ongoing research continues to explore the full potential of ML in the identification of CHD. The proposed article provides a comprehensive analysis of the ML methods for the diagnosis of CHD in the last eight years. The study also describes different data sets available for CHD research, discussing their characteristics, collection methods, and relevance to ML applications. In addition, the article also evaluates the strengths and weaknesses of existing algorithms, offering a critical review of their performance and limitations. Finally, the article proposes several promising directions for future research, with the aim of further improving the efficacy of ML in the diagnosis and treatment of CHD.
Affiliation(s)
- Khalil Khan
- Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
| | - Farhan Ullah
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
| | - Ikram Syed
- Dept of Information and Communication Engineering, Hankuk University of Foreign Studies, Yongin, Gyeonggy-do, Republic of South Korea
| | - Hashim Ali
- Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
4
Salehi A, Khedmati M. Identifying at-risk patients for congenital heart disease using integrated predictive models and fuzzy clustering analysis: A cross-sectional study. Heliyon 2024; 10:e39609. [PMID: 39498045 PMCID: PMC11532873 DOI: 10.1016/j.heliyon.2024.e39609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2024] [Revised: 10/17/2024] [Accepted: 10/18/2024] [Indexed: 11/07/2024]
Abstract
Congenital heart disease (CHD) remains a significant global health concern, affecting approximately 1 % of newborns worldwide. While its accurate causes often remain elusive, a combination of genetic and environmental factors is implicated. In this cross-sectional study, we propose a comprehensive prediction framework leveraging Machine Learning (ML) and Multi-Attribute Decision Making (MADM) techniques to enhance CHD diagnostics and forecasting. Our framework integrates supervised and unsupervised learning methodologies to remove data noise and address imbalanced datasets effectively. Through the utilization of imbalance ensemble methods and clustering algorithms such as K-means, we enhance predictive accuracy, particularly in non-clinical datasets where imbalances are prevalent. Our results demonstrate an improvement of 8 % in recall compared to existing literature, showcasing the efficacy of our approach. Moreover, our framework identifies clusters of patients at the highest risk using MADM techniques, providing insights into susceptibility to CHD. Fuzzy clustering techniques further assess the degree of risk for individuals within each cluster, enabling personalized risk evaluation. Importantly, our analysis reveals that unhealthy lifestyle factors, annual per capita income, nutrition, and folic acid supplementation emerge as crucial predictors of CHD occurrences. Additionally, environmental risk factors and maternal illnesses significantly contribute to the predictive model. These findings underscore the multifactorial nature of CHD development, emphasizing the importance of considering socioeconomic and lifestyle factors alongside medical variables in CHD risk assessment and prevention strategies. Our proposed framework offers a promising avenue for early identification and intervention, potentially mitigating the burden of CHD on affected individuals and healthcare systems globally.
Affiliation(s)
- Amirreza Salehi
- Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran
| | - Majid Khedmati
- Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran
5
Pan PJ, Lee CH, Hsu NW, Sun TL. Combining principal component analysis and logistic regression for multifactorial fall risk prediction among community-dwelling older adults. Geriatr Nurs 2024; 57:208-216. [PMID: 38696878 DOI: 10.1016/j.gerinurse.2024.04.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/03/2024] [Accepted: 04/18/2024] [Indexed: 05/04/2024]
Abstract
Falls require comprehensive assessment in older adults due to their diverse risk factors. This study aimed to develop an effective fall risk prediction model for community-dwelling older adults by integrating principal component analysis (PCA) with machine learning. Data were collected for 45 fall-related variables from 1630 older adults in Taiwan, and models were developed using PCA and logistic regression. The optimal model, PCA with stepwise logistic regression, had an area under the receiver operating characteristic curve of 0.78, sensitivity of 74 %, specificity of 70 %, and accuracy of 71 %. While dimensionality reduction via PCA is not essential, it aids practicality. Our framework combines PCA and logistic regression, providing a reliable method for fall risk prediction to support consistent screening and targeted health promotion. The key innovation is using PCA prior to logistic regression, overcoming conventional limitations. This offers an effective community-based fall screening tool for older adults.
Affiliation(s)
- Po-Jung Pan
- Department of Physical Medicine & Rehabilitation, National Yang Ming Chiao Tung University Hospital, Yilan, Taiwan; Center of Community Medicine, National Yang Ming Chiao Tung University Hospital, Yilan, Taiwan; School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Chia-Hsuan Lee
- Department of Data Science, Soochow University, Taipei, Taiwan.
| | - Nai-Wei Hsu
- School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan; Public Health Bureau, Yilan County, Taiwan; Community Medicine Research Center & Institute of Public Health, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Tien-Lung Sun
- Department of Industrial Engineering and Management, Yuan-Ze University, Taoyuan, Taiwan
6
Robinson J, Sahai S, Pennacchio C, Sharew B, Chen L, Karamlou T. Effects of Sociodemographic Factors on Access to and Outcomes in Congenital Heart Disease in the United States. J Cardiovasc Dev Dis 2024; 11:67. [PMID: 38392282 PMCID: PMC10889660 DOI: 10.3390/jcdd11020067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 02/14/2024] [Accepted: 02/15/2024] [Indexed: 02/24/2024]
Abstract
Congenital heart defects (CHDs) are complex conditions affecting the heart and/or great vessels that are present at birth. These defects occur in approximately 9 in every 1000 live births. From diagnosis to intervention, care has dramatically improved over the last several decades. Patients with CHDs are now living well into adulthood. However, there are factors that have been associated with poor outcomes across the lifespan of these patients. These factors include sociodemographic and socioeconomic positions. This commentary examined the disparities and solutions within the evolution of CHD care in the United States.
Affiliation(s)
- Justin Robinson
- Department of Thoracic and Cardiovascular Surgery, Heart, Vascular and Thoracic Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Siddhartha Sahai
- Department of Thoracic and Cardiovascular Surgery, Heart, Vascular and Thoracic Institute, Cleveland Clinic, Cleveland, OH 44195, USA
| | - Caroline Pennacchio
- Case Western Reserve University School of Medicine, Cleveland, OH 44195, USA
| | - Betemariam Sharew
- Cleveland Clinic Lerner College of Medicine, Cleveland, OH 44195, USA
| | - Lin Chen
- Case Western Reserve University School of Medicine, Cleveland, OH 44195, USA
| | - Tara Karamlou
- Department of Thoracic and Cardiovascular Surgery, Heart, Vascular and Thoracic Institute, Cleveland Clinic, Cleveland, OH 44195, USA
- Division of Pediatric Cardiac Surgery, Heart, Vascular and Thoracic Institute, Cleveland Clinic Children’s Hospital, 9500 Euclid Avenue, Desk M41, Cleveland, OH 44195, USA
7
Kaur I, Ahmad T. A cluster-based ensemble approach for congenital heart disease prediction. Comput Methods Programs Biomed 2024; 243:107922. [PMID: 37984098 DOI: 10.1016/j.cmpb.2023.107922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 10/24/2023] [Accepted: 11/06/2023] [Indexed: 11/22/2023]
Abstract
BACKGROUND Congenital heart disease (CHD) is one of the most prevalent birth disorders. Although CHD risk factors have been the subject of numerous studies, their propensity to cause CHD has not been tested. In particular, few studies have attempted to forecast CHD risk using population-based cross-sectional data, which is inherently imbalanced. OBJECTIVE The main goals of this study are to create a reliable data analysis model that can help with (i) a better understanding of congenital heart disease prediction in the presence of missing and unbalanced data and (ii) creating cohorts of expectant mothers with similar lifestyle characteristics. METHODS Clusters of patient cohorts are produced using the unsupervised data mining technique density-based spatial clustering of applications with noise (DBSCAN). For more accurate CHD prediction, a random forest model was trained using these clusters and their corresponding patterns. This study uses a dataset of 33,831 expectant mothers to make its prediction. Missing data were handled using the k-NN imputation approach, while extremely unbalanced data were balanced using SMOTE. These techniques are all data-driven and need little to no user or expert involvement. RESULTS AND CONCLUSION Using DBSCAN, three cohorts were found. The cluster information enhanced the random forest-based CHD prediction and revealed intricate factors that influence prediction accuracy. The proposed approach gave the highest results with 99 % accuracy and 0.91 AUC and performed better than the state-of-the-art methodologies. Hence, the suggested method using unsupervised learning can provide intricate information to the classifier and further enhance the performance of the classification.
Affiliation(s)
- Ishleen Kaur
- Sri Guru Tegh Bahadur Khalsa College, University of Delhi, Delhi, India.
| | - Tanvir Ahmad
- Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
8
Butt Z, Tinning H, O'Connell MJ, Fenn J, Alberio R, Forde N. Understanding conceptus-maternal interactions: what tools do we need to develop? Reprod Fertil Dev 2023; 36:81-92. [PMID: 38064186 DOI: 10.1071/rd23181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/18/2023]
Abstract
Communication between the maternal endometrium and developing embryo/conceptus is critical to support successful pregnancy to term. Studying the peri-implantation period of pregnancy is critical as this is when most pregnancy loss occurs in cattle. Our current understanding of these interactions is limited, due to the lack of appropriate in vitro models to assess these interactions. The endometrium is a complex and heterogeneous tissue that is regulated in a transcriptional and translational manner throughout the oestrous cycle. While there are in vitro models to study endometrial function, they are static and 2D in nature or explant models and are limited in how well they recapitulate the in vivo endometrium. Recent developments in organoid systems, microfluidic approaches, extracellular matrix biology, and in silico approaches provide a new opportunity to develop in vitro systems that better model the in vivo scenario. This will allow us to investigate in a more high-throughput manner the fundamental molecular interactions that are required for successful pregnancy in cattle.
Affiliation(s)
- Zenab Butt
- Discovery and Translational Sciences Department, Leeds Institute of Cardiovascular and Metabolic Medicine, School of Medicine, University of Leeds, Leeds LS2 9JT, UK
| | - Haidee Tinning
- Discovery and Translational Sciences Department, Leeds Institute of Cardiovascular and Metabolic Medicine, School of Medicine, University of Leeds, Leeds LS2 9JT, UK
| | - Mary J O'Connell
- Computational and Molecular Evolutionary Biology Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham, NG7 2RD, UK
| | - Jonathan Fenn
- Computational and Molecular Evolutionary Biology Group, School of Life Sciences, Faculty of Medicine and Health Sciences, University of Nottingham, Nottingham, NG7 2RD, UK
| | - Ramiro Alberio
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough LE12 5RD, UK
| | - Niamh Forde
- Discovery and Translational Sciences Department, Leeds Institute of Cardiovascular and Metabolic Medicine, School of Medicine, University of Leeds, Leeds LS2 9JT, UK
9
Shi H, Book W, Raskind-Hood C, Downing KF, Farr SL, Bell MN, Sameni R, Rodriguez FH, Kamaleswaran R. A machine learning model for predicting congenital heart defects from administrative data. Birth Defects Res 2023; 115:1693-1707. [PMID: 37681293 PMCID: PMC10841295 DOI: 10.1002/bdr2.2245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 08/21/2023] [Accepted: 08/25/2023] [Indexed: 09/09/2023]
Abstract
INTRODUCTION International Classification of Diseases (ICD) codes recorded in administrative data are often used to identify congenital heart defects (CHD). However, these codes may inaccurately identify true positive (TP) CHD individuals. CHD surveillance could be strengthened by accurate CHD identification in administrative records using machine learning (ML) algorithms. METHODS To identify features relevant to accurate CHD identification, traditional ML models were applied to a validated dataset of 779 patients; encounter level data, including ICD-9-CM and CPT codes, from 2011 to 2013 at four US sites were utilized. Five-fold cross-validation determined overlapping important features that best predicted TP CHD individuals. Median values and 95% confidence intervals (CIs) of area under the receiver operating curve, positive predictive value (PPV), negative predictive value, sensitivity, specificity, and F1-score were compared across four ML models: Logistic Regression, Gaussian Naive Bayes, Random Forest, and eXtreme Gradient Boosting (XGBoost). RESULTS Baseline PPV was 76.5% from expert clinician validation of ICD-9-CM CHD-related codes. Feature selection for ML decreased 7138 features to 10 that best predicted TP CHD cases. During training and testing, XGBoost performed the best in median accuracy (F1-score) and PPV, 0.84 (95% CI: 0.76, 0.91) and 0.94 (95% CI: 0.91, 0.96), respectively. When applied to the entire dataset, XGBoost revealed a median PPV of 0.94 (95% CI: 0.94, 0.95). CONCLUSIONS Applying ML algorithms improved the accuracy of identifying TP CHD cases in comparison to ICD codes alone. Use of this technique to identify CHD cases would improve generalizability of results obtained from large datasets to the CHD patient population, enhancing public health surveillance efforts.
Affiliation(s)
- Haoming Shi
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Wendy Book
- Division of Cardiology, Emory University School of Medicine, Atlanta, Georgia, USA
- Department of Epidemiology, Emory University, Rollins School of Public Health, Atlanta, Georgia, USA
| | - Cheryl Raskind-Hood
- Department of Epidemiology, Emory University, Rollins School of Public Health, Atlanta, Georgia, USA
| | - Karrie F. Downing
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Sherry L. Farr
- National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
| | - Mary N. Bell
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Reza Sameni
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
| | - Fred H. Rodriguez
- Division of Cardiology, Emory University School of Medicine, Atlanta, Georgia, USA
- Children's Healthcare of Atlanta, Atlanta, Georgia, USA
| | - Rishikesan Kamaleswaran
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, USA
10
Ferede AA, Kassie BA, Mosu KT, Getahun WT, Taye BT, Desta M, Fetene MG. Pregnant women's knowledge of birth defects and their associated factors among antenatal care attendees in referral hospitals of Amhara regional state, Ethiopia, in 2019. Front Glob Womens Health 2023; 4:1085645. [PMID: 37575960 PMCID: PMC10419168 DOI: 10.3389/fgwh.2023.1085645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 07/05/2023] [Indexed: 08/15/2023]
Abstract
Background Birth defects (BDs) are structural, behavioral, functional, and metabolic disorders present at birth. Due to lack of knowledge, families and communities stigmatized pregnant women following the birth of a child with birth defects. In Ethiopia, there was limited evidence to assess the level of knowledge among pregnant women despite increasing magnitude of birth defects. Objectives This study aims to assess pregnant women's knowledge of birth defects and its associated factors among antenatal care (ANC) attendees in referral hospitals of Amhara regional state in 2019. Materials and methods Between 1 June and 30 June 2019, 636 pregnant women receiving prenatal care participated in an institution-based cross-sectional study. The approach for sampling was multistage. A semi-structured pretested interviewer-administered questionnaire was used to collect data. Data were entered in EpiData version 4.6 and analyzed using SPSS version 25 software. A bivariable and multivariable logistic regression model was used. Odds ratio with 95% confidence interval and p-value of ≤0.05 declared statistical significance association. Results A total of 636 pregnant women were included in the analysis. Accordingly, pregnant women's knowledge of birth defects was found to be 49.2% (95% CI: 45.4-53.1). Age group of <25 years (AOR = 0.16, 95% CI: 0.04-0.61), urban residence (AOR = 6.06, 95% CI: 2.17-16.94), ANC booked before 20 weeks of gestational age (AOR = 3.42, 95% CI: 1.37-8.54), and ever heard on birth defects (AOR = 5.00, 95% CI: 1.87-13.43) were significantly associated factors with pregnant women's knowledge of birth defects. Conclusions Approximately half of the pregnant mothers were aware of birth defects. Addressing pre-pregnancy and pregnancy health information and education particularly on the prevention of birth defects is recommended.
Affiliation(s)
- Addisu Andualem Ferede
- Department of Midwifery, College of Health Sciences, Debre Markos University, Debre Markos, Ethiopia
| | - Belayneh Ayanaw Kassie
- Department of Clinical Midwifery, School of Midwifery, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
| | - Kiber Temesgen Mosu
- Department of Clinical Midwifery, School of Midwifery, College of Medicine and Health Sciences, University of Gondar, Gondar, Ethiopia
| | - Worku Taye Getahun
- Department of Obstetrics and Gynecology, Debre Markos Comprehensive Specialized Hospital, Debre Markos, Ethiopia
| | - Birhan Tsegaw Taye
- School of Nursing and Midwifery, Asrat Woldeyes Health Sciences Campus, Debre Berhan University, Debre Berhan, Ethiopia
| | - Melaku Desta
- Department of Midwifery, College of Health Sciences, Debre Markos University, Debre Markos, Ethiopia
| | - Mamaru Getie Fetene
- Department of Midwifery, College of Health Sciences, Debre Markos University, Debre Markos, Ethiopia
11
Dehghan B, Sabri MR, Ahmadi A, Ghaderian M, Mahdavi C, Ramezani Nejad D, Sattari M. Identifying the Factors Affecting the Incidence of Congenital Heart Disease Using Support Vector Machine and Particle Swarm Optimization. Adv Biomed Res 2023; 12:130. [PMID: 37434918 PMCID: PMC10331520 DOI: 10.4103/abr.abr_54_22] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2022] [Revised: 10/09/2022] [Accepted: 10/12/2022] [Indexed: 07/13/2023]
Abstract
Background Congenital malformations are defined as "any defect in the structure of a person that exists from birth". Among them, congenital heart malformations have the highest prevalence in the world. This study focuses on the development of a predictive model for congenital heart disease in Isfahan using support vector machine (SVM) and particle swarm intelligence. Materials and Methods The study consists of four parts: data collection, preprocessing, identification of target features, and technique. The proposed technique is a combination of the SVM method and particle swarm optimization (PSO). Results The data set includes 1389 patients and 399 features. The best performance in terms of accuracy, 81.57%, was achieved by the PSO-SVM technique, and the worst, 78.62%, by the random forest technique. Congenital extracardiac anomalies are considered the most important factor, with an average of 0.655. Conclusion Congenital extracardiac anomalies are considered the most important factor. Detecting the more important features affecting congenital heart disease allows physicians to treat the variable risk factors associated with congenital heart disease progression. The use of a machine learning approach provides the ability to predict the presence of congenital heart disease with high accuracy and sensitivity.
Affiliation(s)
- Bahar Dehghan
- Pediatric Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Reza Sabri
- Pediatric Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
- Alireza Ahmadi
- Pediatric Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
- Mehdi Ghaderian
- Pediatric Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
- Chehreh Mahdavi
- Pediatric Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
- Davood Ramezani Nejad
- Pediatric Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Sattari
- Health Information Technology Research Center, Isfahan University of Medical Sciences, Isfahan, Iran
12
Analytical Comparison of Risk Prediction Models for the Onset of Macrosomia Based on Three Statistical Methods. Dis Markers 2022; 2022:9073043. [PMID: 36124028 PMCID: PMC9482546 DOI: 10.1155/2022/9073043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 08/29/2022] [Accepted: 09/01/2022] [Indexed: 11/18/2022]
Abstract
Background and Purpose. Fetal overgrowth can pose a serious threat to the safety of a mother and child. Early identification of high-risk pregnant women and timely pregnancy intervention and guidance are of great value in preventing macrosomia and improving adverse maternal and infant outcomes. The current clinical methods for predicting macrosomia mainly rely on obstetric examination and imaging, but their accuracy is controversial, and there is no accepted method for accurately predicting macrosomia. We investigated the risk factors influencing the occurrence of macrosomia and established a prediction model for the occurrence of macrosomia to provide a reference basis for interventions to prevent macrosomia. Method. We retrospectively selected 93 women who were hospitalized in our hospital from March 2019 to May 2022 with a singleton pregnancy and delivered a macrosomic infant at term as the study group, and 356 women who delivered a normal-size baby during the same period as the control group. The variables associated with the onset of macrosomia were screened from maternal medical records. Logistic regression, random forest, and CART decision tree models were developed using the screened variables as input variables and the presence of macrosomia as the outcome variable. The performance of the three models was evaluated by accuracy, precision, recall, F1 score, and receiver operating characteristic (ROC) curve. Result. The three risk prediction models for the onset of macrosomia (logistic regression, random forest, and decision tree) were successfully developed, with accuracies of 0.904, 1.000, and 0.901 in the training set and 0.926, 0.582, and 0.852 in the validation set, respectively. The AUCs in the training set were 0.898, 1.000, and 0.789, and those in the validation set were 0.906, 0.913, and 0.731, respectively.
In general, the logistic regression model has the highest diagnostic efficiency, followed by the random forest model. Conclusion. Logistic regression models have high application value in predicting the risk of macrosomia, and it is suggested that the advantages of logistic regression models and random forest models should be combined in future studies and applications to make them work better in the prediction of the risk of macrosomia.
13
Yang J, Chang Q, Dang S, Liu X, Zeng L, Yan H. Dietary Quality during Pregnancy and Congenital Heart Defects. Nutrients 2022; 14:3654. [PMID: 36079912 PMCID: PMC9460731 DOI: 10.3390/nu14173654] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/22/2022] [Revised: 08/31/2022] [Accepted: 09/02/2022] [Indexed: 11/17/2022]
Abstract
Limited studies on maternal dietary quality indices and congenital heart defects (CHD) are available. This study aimed to explore the relationship between dietary quality in pregnancy and CHD among the Chinese population. A case-control study was performed in Northwest China, and 474 cases and 948 controls were included. Eligible women waiting for delivery were interviewed to recall diets and other information during pregnancy. Dietary quality was assessed by the Global Diet Quality Score (GDQS) and Mediterranean Diet Score (MDS). Logistic regression models were adopted to evaluate the associations of dietary quality scores with CHD. Pregnant women with higher scores of GDQS and MDS were at a lower risk of fetal CHD, and the adjusted ORs comparing the extreme quartiles were 0.26 (95%CI: 0.16−0.42; Ptrend < 0.001) and 0.53 (95%CI: 0.34−0.83; Ptrend = 0.007), respectively. The inverse associations of GDQS and MDS with CHD appeared to be stronger among women with lower education levels or in rural areas. Maternal GDQS and MDS had good predictive values for fetal CHD, with the areas under the receiver operating characteristic curves close to 0.8. Efforts to improve maternal dietary quality need to be strengthened to decrease the prevalence of CHD among the Chinese population.
Affiliation(s)
- Jiaomei Yang
- Department of Epidemiology and Health Statistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an 710061, China
- Correspondence: Tel.: +86-029-8265-5104
- Qianqian Chang
- Department of Epidemiology and Health Statistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an 710061, China
- Shaonong Dang
- Department of Epidemiology and Health Statistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an 710061, China
- Xin Liu
- Department of Epidemiology and Health Statistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an 710061, China
- Lingxia Zeng
- Department of Epidemiology and Health Statistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an 710061, China
- Hong Yan
- Department of Epidemiology and Health Statistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an 710061, China
- Nutrition and Food Safety Engineering Research Center of Shaanxi Province, Xi’an 710061, China
- Key Laboratory of Environment and Genes Related to Diseases, Xi’an Jiaotong University, Ministry of Education, Xi’an 710061, China
14
Huang X, Cao T, Chen L, Li J, Tan Z, Xu B, Xu R, Song Y, Zhou Z, Wang Z, Wei Y, Zhang Y, Li J, Huo Y, Qin X, Wu Y, Wang X, Wang H, Cheng X, Xu X, Liu L. Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults. Front Cardiovasc Med 2022; 9:901240. [PMID: 35600480 PMCID: PMC9120532 DOI: 10.3389/fcvm.2022.901240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2022] [Accepted: 04/05/2022] [Indexed: 11/13/2022]
Abstract
Background Stroke is a major global health burden, and risk prediction is essential for the primary prevention of stroke. However, uncertainty remains about the optimal prediction model for analyzing stroke risk. In this study, we aim to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and establish a general methodological pipeline for future analysis. Methods The training set included 70% of data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed with the remaining 30% of CSPPT data (n = 6,211), and external validation was conducted using a nested case–control (NCC) dataset (n = 2,568). The primary outcome was the first stroke. Four established analysis methods were applied and compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data with and without laboratory variables were analyzed separately. Accuracy, sensitivity, specificity, kappa, and area under receiver operating characteristic curves (AUCs) were used to assess the models, with AUCs the top concern. Data balancing techniques, including random under-sampling (RUS) and synthetic minority over-sampling technique (SMOTE), were applied to process this unbalanced training set. Results The best model performance was observed in the RUS-applied RF model with laboratory variables. Compared with null models (sensitivity = 0, specificity = 100, and mean AUCs = 0.643), data balancing techniques improved overall performance, with RUS demonstrating a more satisfactory effect in the current study (RUS: sensitivity = 63.9; specificity = 53.7; and mean AUCs = 0.624). Adding laboratory variables improved the performance of analysis methods. All results were reconfirmed in validation sets. The top 10 important variables were determined by the analysis method with the best performance.
Conclusion Among the tested methods, the most effective stroke prediction model in the targeted population is RUS-applied RF. From the insights the current study revealed, we provided general frameworks for building machine learning-based prediction models.
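The random under-sampling (RUS) step named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `random_under_sample`, the toy data, and the fixed seed are assumptions made for illustration. It shows only the core idea of discarding majority-class records at random until every class is as large as the rarest one.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so each class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not taken from the cited study.
    """
    rng = random.Random(seed)
    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Size of the rarest class sets the target size for every class.
    n_min = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy, made-up data: 5 events (label 1) among 100 records.
rows = list(range(100))
labels = [1 if i < 5 else 0 for i in rows]

balanced_rows, balanced_labels = random_under_sample(rows, labels)
counts = Counter(balanced_labels)
print(counts)
```

Under-sampling would normally be applied to the training split only, so that validation data keep the original class balance.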
Affiliation(s)
- Xiao Huang
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Correspondence: Xiao Huang
- Tianyu Cao
- Biological Anthropology, University of California, Santa Barbara, Santa Barbara, CA, United States
- Liangziqian Chen
- Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Junpei Li
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Ziheng Tan
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Benjamin Xu
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Richard Xu
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Yun Song
- Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Institute of Biomedicine, Anhui Medical University, Hefei, China
- Ziyi Zhou
- Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
- Zhuo Wang
- Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yaping Wei
- Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yan Zhang
- Department of Cardiology, Peking University First Hospital, Beijing, China
- Jianping Li
- Department of Cardiology, Peking University First Hospital, Beijing, China
- Yong Huo
- Department of Cardiology, Peking University First Hospital, Beijing, China
- Xianhui Qin
- National Clinical Research Study Center for Kidney Disease, The State Key Laboratory for Organ Failure Research, Renal Division, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yanqing Wu
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiaobin Wang
- Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, United States
- Hong Wang
- Department of Cardiovascular Science, Temple University Lewis Katz School of Medicine, Philadelphia, PA, United States
- Xiaoshu Cheng
- Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiping Xu
- Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Lishun Liu
- Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
15
Qu Y, Deng X, Lin S, Han F, Chang HH, Ou Y, Nie Z, Mai J, Wang X, Gao X, Wu Y, Chen J, Zhuang J, Ryan I, Liu X. Using Innovative Machine Learning Methods to Screen and Identify Predictors of Congenital Heart Diseases. Front Cardiovasc Med 2022; 8:797002. [PMID: 35071361 PMCID: PMC8777022 DOI: 10.3389/fcvm.2021.797002] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/18/2021] [Accepted: 12/14/2021] [Indexed: 11/16/2022]
Abstract
Objective: Congenital heart diseases (CHDs) are associated with an extremely heavy global disease burden as the most common category of birth defects. Genetic and environmental factors have been identified as risk factors of CHDs previously. However, high volume clinical indicators have never been considered when predicting CHDs. This study aimed to predict the occurrence of CHDs by considering thousands of variables from self-reported questionnaires and routinely collected clinical laboratory data using machine learning algorithms. Methods: We conducted a birth cohort study at one of the largest cardiac centers in China from 2011 to 2017. All fetuses were screened for CHDs using ultrasound and cases were confirmed by at least two pediatric cardiologists using echocardiogram. A total of 1,127 potential predictors were included to predict CHDs. We used the Explainable Boosting Machine (EBM) for prediction and evaluated the model performance using the area under the receiver operating characteristic (ROC) curve (AUC). The top predictors were selected according to their contributions and predictive values. Thresholds were calculated for the most significant predictors. Results: Overall, 5,390 mother-child pairs were recruited. Our prediction model achieved an AUC of 76% (69-83%) from out-of-sample predictions. Among the top 35 predictors of CHDs we identified, 34 were from clinical laboratory tests and only one was from the questionnaire (abortion history). Total accuracy, sensitivity, and specificity were 0.65, 0.74, and 0.65, respectively. Maternal serum uric acid (UA), glucose, and coagulation levels were the most consistent and significant predictors of CHDs.
According to the thresholds of the predictors identified in our study, which did not reach the current clinical diagnosis criteria, elevated UA (>4.38 mg/dl), shortened activated partial thromboplastin time (<33.33 s), and elevated glucose levels were the most important predictors and were associated with ranges of 1.17-1.54 relative risks of CHDs. We have developed an online predictive tool for CHDs based on our findings that may help screening and prevention of CHDs. Conclusions: Maternal UA, glucose, and coagulation levels were the most consistent and significant predictors of CHDs. Thresholds below the current clinical definition of “abnormal” for these predictors could be used to help develop CHD screening and prevention strategies.
Affiliation(s)
- Yanji Qu
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Xinlei Deng
- Department of Environmental Health Sciences, University at Albany, State University of New York, New York, NY, United States
- Shao Lin
- Department of Environmental Health Sciences, University at Albany, State University of New York, New York, NY, United States
- Fengzhen Han
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Howard H Chang
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Yanqiu Ou
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Zhiqiang Nie
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Jinzhuang Mai
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Ximeng Wang
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Xiangmin Gao
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yong Wu
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Jimei Chen
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Jian Zhuang
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Ian Ryan
- Department of Environmental Health Sciences, University at Albany, State University of New York, New York, NY, United States
- Xiaoqing Liu
- Guangdong Cardiovascular Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
16
Boyd R, McMullen H, Beqaj H, Kalfa D. Environmental Exposures and Congenital Heart Disease. Pediatrics 2022; 149:183839. [PMID: 34972224 DOI: 10.1542/peds.2021-052151] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Accepted: 08/19/2021] [Indexed: 12/16/2022]
Abstract
Congenital heart disease (CHD) is the most common congenital abnormality worldwide, affecting 8 to 12 infants per 1000 births globally and causing >40% of prenatal deaths. However, its causes remain mainly unknown, with only up to 15% of CHD cases having a determined genetic cause. Exploring the complex relationship between genetics and environmental exposures is key in understanding the multifactorial nature of the development of CHD. Multiple population-level association studies have been conducted on maternal environmental exposures and their association with CHD, including evaluating the effect of maternal disease, medication exposure, environmental pollution, and tobacco and alcohol use on the incidence of CHD. However, these studies have been done in a siloed manner, with few examining the interplay between multiple environmental exposures. Here, we broadly and qualitatively review the current literature on maternal and paternal prenatal exposures and their association with CHD. We propose using the framework of the emerging field of the exposome, the environmental complement to the genome, to review all internal and external prenatal environmental exposures and identify potentiating or alleviating synergy between exposures. Finally, we propose mechanistic pathways through which susceptibility to development of CHD may be induced via the totality of prenatal environmental exposures, including the interplay between placental and cardiac development and the internal vasculature and placental morphology in early stages of pregnancy.
17
Giang KW, Helgadottir S, Dellborg M, Volpe G, Mandalenakis Z. Enhanced prediction of atrial fibrillation and mortality among patients with congenital heart disease using nationwide register-based medical hospital data and neural networks. Eur Heart J Digit Health 2021; 2:568-575. [PMID: 36713111 PMCID: PMC9707883 DOI: 10.1093/ehjdh/ztab065] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/11/2021] [Revised: 06/28/2021] [Accepted: 07/14/2021] [Indexed: 02/01/2023]
Abstract
Aims To improve short- and long-term predictions of mortality and atrial fibrillation (AF) among patients with congenital heart disease (CHD) from a nationwide population using neural networks (NN). Methods and results The Swedish National Patient Register and the Cause of Death Register were used to identify all patients with CHD born from 1970 to 2017. A total of 71 941 CHD patients were identified and followed up from birth until the event or end of study in 2017. Based on data from a nationwide population, an NN model was obtained to predict mortality and AF. Logistic regression (LR) based on the same data was used as a baseline comparison. Of 71 941 CHD patients, a total of 5768 died (8.02%) and 995 (1.38%) developed AF over time with a mean follow-up time of 16.47 years (standard deviation 12.73 years). The performance of NN models in predicting mortality and AF was higher than the performance of LR regardless of the complexity of the disease, with an average area under the receiver operating characteristic curve of >0.80 and >0.70, respectively. The largest differences were observed in mortality and complexity of CHD over time. Conclusion We found that NN can be used to predict mortality and AF on a nationwide scale using data that are easily obtainable by clinicians. In addition, NN showed high performance overall and, in most cases, better predictive performance than more traditional regression methods.
Affiliation(s)
- Kok Wai Giang
- Institute of Medicine, Department of Molecular and Clinical Medicine, Sahlgrenska Academy, University of Gothenburg, Diagnosvägen 11, 416 50 Gothenburg, Sweden
- Saga Helgadottir
- Department of Physics, University of Gothenburg, Gothenburg, Sweden
- Mikael Dellborg
- Institute of Medicine, Department of Molecular and Clinical Medicine, Sahlgrenska Academy, University of Gothenburg, Diagnosvägen 11, 416 50 Gothenburg, Sweden
- Adult Congenital Heart Unit, Department of Medicine, Sahlgrenska University Hospital/Östra, Gothenburg, Sweden
- Giovanni Volpe
- Department of Physics, University of Gothenburg, Gothenburg, Sweden
- Zacharias Mandalenakis
- Institute of Medicine, Department of Molecular and Clinical Medicine, Sahlgrenska Academy, University of Gothenburg, Diagnosvägen 11, 416 50 Gothenburg, Sweden
- Adult Congenital Heart Unit, Department of Medicine, Sahlgrenska University Hospital/Östra, Gothenburg, Sweden
18
Helman SM, Herrup EA, Christopher AB, Al-Zaiti SS. The role of machine learning applications in diagnosing and assessing critical and non-critical CHD: a scoping review. Cardiol Young 2021; 31:1770-1780. [PMID: 34725005 PMCID: PMC8805679 DOI: 10.1017/s1047951121004212] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Machine learning uses historical data to make predictions about new data. It has been frequently applied in healthcare to optimise diagnostic classification through discovery of hidden patterns in data that may not be obvious to clinicians. Congenital Heart Defect (CHD) machine learning research entails one of the most promising clinical applications, in which timely and accurate diagnosis is essential. The objective of this scoping review is to summarise the application and clinical utility of machine learning techniques used in paediatric cardiology research, specifically focusing on approaches aiming to optimise diagnosis and assessment of underlying CHD. Out of 50 full-text articles identified between 2015 and 2021, 40% focused on optimising the diagnosis and assessment of CHD. Deep learning and support vector machine were the most commonly used algorithms, accounting for an overall diagnostic accuracy > 0.80. Clinical applications primarily focused on the classification of auscultatory heart sounds, transthoracic echocardiograms, and cardiac MRIs. The range of these applications and directions of future research are discussed in this scoping review.
Affiliation(s)
- Stephanie M Helman
- Department of Acute and Tertiary Care Nursing, University of Pittsburgh, Pittsburgh, PA, USA
- Elizabeth A Herrup
- Division of Pediatric Critical Care Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA, USA
- Adam B Christopher
- Division of Pediatric Cardiology, UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA, USA
- Salah S Al-Zaiti
- Department of Acute and Tertiary Care Nursing, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Emergency Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Division of Cardiology, University of Pittsburgh, Pittsburgh, PA, USA
19
Yang H, Li X, Cao H, Cui Y, Luo Y, Liu J, Zhang Y. Using machine learning methods to predict hepatic encephalopathy in cirrhotic patients with unbalanced data. Comput Methods Programs Biomed 2021; 211:106420. [PMID: 34555589 DOI: 10.1016/j.cmpb.2021.106420] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/21/2020] [Accepted: 09/11/2021] [Indexed: 06/13/2023]
Abstract
OBJECTIVE Hepatic encephalopathy (HE) is among the most common complications of cirrhosis. Data for cirrhosis with HE are typically unbalanced, so traditional statistical methods and machine learning algorithms cannot identify the minority class. In this paper, we use machine learning algorithms to construct a risk prediction model for liver cirrhosis complicated by HE to improve the efficiency of its prediction. METHOD We collected medical data from 1,256 patients with cirrhosis and performed preprocessing to extract 81 features from these irregular data. To predict HE in cirrhotic patients, we compared several classification methods: logistic regression, weighted random forest (WRF), SVM, and weighted SVM (WSVM). We also used an additional 722 patients with cirrhosis for external validation of the model. RESULTS The WRF, WSVM, and logistic regression models exhibited better recognition ability for patients with HE than traditional machine learning models (sensitivity > 0.70), but their ability to identify patients with uncomplicated HE was slightly lower (specificity approximately 85%). The comprehensive evaluation index of the traditional model was higher than those of other models (G-means > 0.80 and F-measure > 0.40). For the WRF, the G-means (0.82), F-measure (0.46), and AUC (0.82) were superior to those of the logistic regression and WSVM models, which means that it can better predict the incidence of HE in patients. CONCLUSION The WRF model is more suitable for the classification of unbalanced medical data and can be used to construct a risk prediction and evaluation system for liver cirrhosis complicated with HE. The probabilistic prediction models of WRF can help clinicians identify high-risk patients with HE.
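The G-means and F-measure quoted in this abstract are not defined there. The Python sketch below assumes the usual definitions: G-mean as the geometric mean of sensitivity and specificity, and F-measure as the harmonic mean of precision and sensitivity (F1). The confusion-matrix counts are invented for illustration and are not data from the study.

```python
import math


def g_mean_and_f_measure(tp, fn, fp, tn):
    """Return (G-mean, F-measure) from confusion-matrix counts.

    Assumed definitions, not confirmed by the abstract:
    G-mean = sqrt(sensitivity * specificity); F-measure = F1 score.
    """
    sensitivity = tp / (tp + fn)  # recall on the positive (event) class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return g_mean, f_measure


# Hypothetical counts: 100 events among 1,100 patients.
g, f = g_mean_and_f_measure(tp=80, fn=20, fp=150, tn=850)
print(round(g, 3), round(f, 3))
```

With counts like these, G-mean stays high while F-measure is pulled down by low precision, which is the pattern that makes reporting both metrics informative on unbalanced data.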
Affiliation(s)
- Hong Yang
- Department of Health Statistics, Shanxi Medical University, Taiyuan, China
- Xinxin Li
- Department of Health Statistics, Shanxi Medical University, Taiyuan, China
- Hongyan Cao
- Department of Health Statistics, Shanxi Medical University, Taiyuan, China
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Yanhong Luo
- Department of Health Statistics, Shanxi Medical University, Taiyuan, China
- Jinchun Liu
- Department of Gastroenterology, the First Hospital of Shanxi Medical University, Taiyuan, China
- Yanbo Zhang
- Department of Health Statistics, Shanxi Medical University, Taiyuan, China
20
Prediction of arrhythmia after intervention in children with atrial septal defect based on random forest. BMC Pediatr 2021; 21:280. [PMID: 34134641 PMCID: PMC8207618 DOI: 10.1186/s12887-021-02744-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/17/2021] [Accepted: 05/27/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND We used random forest to predict arrhythmia after intervention in children with atrial septal defect. METHODS We constructed a prediction model of complications after interventional closure for children with atrial septal defect. The model was based on random forest; it addressed the need for postoperative arrhythmia risk prediction and helped clinicians and patients' families make preoperative decisions. RESULTS Whereas available risk prediction models provide patients with assessments of specific risk factors, we combined the Synthetic Minority Oversampling Technique algorithm with random forest machine learning to propose a prediction model, which achieved a prediction accuracy of 94.65% and an area under the curve of 0.8956. CONCLUSIONS The model constructed with random forest can effectively predict arrhythmia complications after interventional closure in children with atrial septal defect.
21
Gomes JDA, Olstad EW, Kowalski TW, Gervin K, Vianna FSL, Schüler-Faccini L, Nordeng HME. Genetic Susceptibility to Drug Teratogenicity: A Systematic Literature Review. Front Genet 2021; 12:645555. [PMID: 33981330 PMCID: PMC8107476 DOI: 10.3389/fgene.2021.645555] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/23/2020] [Accepted: 03/19/2021] [Indexed: 12/19/2022]
Abstract
Since the 1960s, drugs have been known to cause teratogenic effects in humans. Such teratogenicity has been postulated to be influenced by genetics. The aim of this review was to provide an overview of the current knowledge on genetic susceptibility to drug teratogenicity in humans and reflect on future directions within the field of genetic teratology. We focused on 12 drugs and drug classes with evidence of teratogenic action, as well as 29 drugs and drug classes with conflicting evidence of fetal safety in humans. An extensive literature search was performed in the PubMed and EMBASE databases using terms related to the drugs of interest, congenital anomalies and fetal development abnormalities, and genetic variation and susceptibility. A total of 29 studies were included in the final data extraction. The eligible studies were published between 1999 and 2020 in 10 different countries, and comprised 28 candidate gene and 1 whole-exome sequencing studies. The sample sizes ranged from 20 to 9,774 individuals. Several drugs were investigated, including antidepressants (nine studies), thalidomide (seven studies), antiepileptic drugs (five studies), glucocorticoids (four studies), acetaminophen (two studies), and sex hormones (estrogens, one study; 17-alpha hydroxyprogesterone caproate, one study). The main neonatal phenotypic outcomes included perinatal complications, cardiovascular congenital anomalies, and neurodevelopmental outcomes. The review demonstrated that studies on genetic teratology are generally small, heterogeneous, and exhibit inconsistent results. The most convincing findings were genetic variants in SLC6A4, MTHFR, and NR3C1, which were associated with drug teratogenicity by antidepressants, antiepileptics, and glucocorticoids, respectively. Notably, this review demonstrated the large knowledge gap regarding genetic susceptibility to drug teratogenicity, emphasizing the need for further efforts in the field. 
Future studies may be improved by increasing the sample size and applying genome-wide approaches to promote the interpretation of results. Such studies could support the clinical implementation of genetic screening to provide safer drug use in pregnant women in need of drugs.
Affiliation(s)
- Julia do Amaral Gomes
  - Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Sistema Nacional de Informação sobre Agentes Teratogênicos (SIAT), Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
  - Laboratório de Medicina Genômica, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
  - Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Emilie Willoch Olstad
  - Pharmacoepidemiology and Drug Safety Research Group, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
  - PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
- Thayne Woycinck Kowalski
  - Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Laboratório de Medicina Genômica, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
  - Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
  - Complexo de Ensino Superior de Cachoeirinha (CESUCA), Cachoeirinha, Brazil
- Kristina Gervin
  - Pharmacoepidemiology and Drug Safety Research Group, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
  - PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
  - Division of Clinical Neuroscience, Department of Research and Innovation, Oslo University Hospital, Oslo, Norway
- Fernanda Sales Luiz Vianna
  - Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Sistema Nacional de Informação sobre Agentes Teratogênicos (SIAT), Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
  - Laboratório de Medicina Genômica, Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
  - Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Lavínia Schüler-Faccini
  - Programa de Pós-Graduação em Genética e Biologia Molecular (PPGBM), Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
  - Sistema Nacional de Informação sobre Agentes Teratogênicos (SIAT), Serviço de Genética Médica, Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
  - Instituto Nacional de Genética Médica Populacional (INAGEMP), Hospital de Clínicas de Porto Alegre (HCPA), Porto Alegre, Brazil
- Hedvig Marie Egeland Nordeng
  - Pharmacoepidemiology and Drug Safety Research Group, Department of Pharmacy, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
  - PharmaTox Strategic Research Initiative, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
  - Department of Child Health and Development, Norwegian Institute of Public Health, Oslo, Norway
|
22
|
Oskar S, Stingone JA. Machine Learning Within Studies of Early-Life Environmental Exposures and Child Health: Review of the Current Literature and Discussion of Next Steps. Curr Environ Health Rep 2021; 7:170-184. [PMID: 32578067 DOI: 10.1007/s40572-020-00282-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
PURPOSE OF REVIEW The goal of this article is to review the use of machine learning (ML) within studies of environmental exposures and children's health, identify common themes across studies, and provide recommendations to advance their use in research and practice. RECENT FINDINGS We identified 42 articles reporting upon the use of ML within studies of environmental exposures and children's health between 2017 and 2019. The common themes among the articles were analysis of mixture data, exposure prediction, disease prediction and forecasting, analysis of complex data, and causal inference. With the increasing complexity of environmental health data, we anticipate greater use of ML to address the challenges that cannot be handled by traditional analytics. In order for these methods to beneficially impact public health, the ML techniques we use need to be appropriate for our study questions, rigorously evaluated and reported in a way that can be critically assessed by the scientific community.
Affiliation(s)
- Sabine Oskar
  - Department of Epidemiology, Columbia University Mailman School of Public Health, 722 West 168th St, Room 1608, New York, NY, 10032, USA
- Jeanette A Stingone
  - Department of Epidemiology, Columbia University Mailman School of Public Health, 722 West 168th St, Room 1608, New York, NY, 10032, USA
|
23
|
Davidson L, Boland MR. Towards deep phenotyping pregnancy: a systematic review on artificial intelligence and machine learning methods to improve pregnancy outcomes. Brief Bioinform 2021; 22:6065792. [PMID: 33406530 PMCID: PMC8424395 DOI: 10.1093/bib/bbaa369] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/22/2020] [Revised: 10/13/2020] [Accepted: 11/18/2020] [Indexed: 12/16/2022]
Abstract
OBJECTIVE Development of novel informatics methods focused on improving pregnancy outcomes remains an active area of research. The purpose of this study is to systematically review the ways that artificial intelligence (AI) and machine learning (ML), including deep learning (DL), methodologies can inform patient care during pregnancy and improve outcomes. MATERIALS AND METHODS We searched English articles on EMBASE, PubMed and SCOPUS. Search terms included ML, AI, pregnancy and informatics. We included research articles and book chapters, excluding conference papers, editorials and notes. RESULTS We identified 127 distinct studies from our queries that were relevant to our topic and included in the review. We found that supervised learning methods were more popular (n = 69) than unsupervised methods (n = 9). Popular methods included support vector machines (n = 30), artificial neural networks (n = 22), regression analysis (n = 17) and random forests (n = 16). Methods such as DL are beginning to gain traction (n = 13). Common areas within the pregnancy domain where AI and ML methods were used the most include prenatal care (e.g. fetal anomalies, placental functioning) (n = 73); perinatal care, birth and delivery (n = 20); and preterm birth (n = 13). Efforts to translate AI into clinical care include clinical decision support systems (n = 24) and mobile health applications (n = 9). CONCLUSIONS Overall, we found that ML and AI methods are being employed to optimize pregnancy outcomes, including modern DL methods (n = 13). Future research should focus on less-studied pregnancy domain areas, including postnatal and postpartum care (n = 2). Also, more work on clinical adoption of AI methods and the ethical implications of such adoption is needed.
Affiliation(s)
- Lena Davidson
  - MS degree at College of St. Scholastica, Duluth, MN, USA
- Mary Regina Boland
  - Department of Biostatistics, Epidemiology, and Informatics at the University of Pennsylvania
|
24
|
Hosni M, Carrillo de Gea JM, Idri A, El Bajta M, Fernández Alemán JL, García-Mateos G, Abnane I. A systematic mapping study for ensemble classification methods in cardiovascular disease. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09914-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
|
25
|
Stroke Prediction with Machine Learning Methods among Older Chinese. Int J Environ Res Public Health 2020; 17:ijerph17061828. [PMID: 32178250 PMCID: PMC7142983 DOI: 10.3390/ijerph17061828] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 02/18/2020] [Revised: 03/10/2020] [Accepted: 03/10/2020] [Indexed: 12/21/2022]
Abstract
Timely stroke diagnosis and intervention are necessary considering its high prevalence. Previous studies have mainly focused on stroke prediction with balanced data. Thus, this study aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort that included 1131 participants (56 stroke patients and 1075 non-stroke participants) in 2012 and 2014, respectively. Data balancing techniques including random over-sampling (ROS), random under-sampling (RUS), and synthetic minority over-sampling technique (SMOTE) were used to process the imbalanced data in this study. Machine learning methods such as regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) were used to predict stroke with demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The total prevalence of stroke was high in 2014 (4.95%), with men experiencing much higher prevalence than women (6.76% vs. 3.25%). The three machine learning methods performed poorly in the imbalanced data set with extremely low sensitivity (approximately 0.00) and AUC (approximately 0.50). After using data balancing techniques, the sensitivity and AUC considerably improved with moderate accuracy and specificity, and the maximum values for sensitivity and AUC reached 0.78 (95% CI, 0.73–0.83) for RF and 0.72 (95% CI, 0.71–0.73) for RLR. Using AUCs for RLR, SVM, and RF in the imbalanced data set as references, a significant improvement was observed in the AUCs of all three machine learning methods (p < 0.05) in the balanced data sets. 
Considering RLR in each data set as a reference, only RF in the imbalanced data set and SVM in the ROS-balanced data set were superior to RLR in terms of AUC. Sex, hypertension, and uric acid were common predictors in all three machine learning methods. Blood glucose level was included in both RLR and RF. Drinking, age and high-sensitivity C-reactive protein level, and low-density lipoprotein cholesterol level were also included in RLR, SVM, and RF, respectively. Our study suggests that machine learning methods with data balancing techniques are effective tools for stroke prediction with imbalanced data.
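The simplest of the balancing techniques named in this abstract, random over-sampling (ROS), can be illustrated with a minimal sketch. The function name `random_oversample` and the placeholder feature rows are hypothetical and not taken from the study; only the class sizes (56 stroke, 1075 non-stroke) come from the abstract. Resampling of this kind is typically applied to the training split only, so that evaluation data keep the original class ratio.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all
    classes are as large as the largest one (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Class sizes from the abstract; the feature vectors are placeholders.
labels = [1] * 56 + [0] * 1075
rows = [[i] for i in range(len(labels))]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Random under-sampling would instead discard majority-class rows down to the minority count, and SMOTE would synthesize new minority rows by interpolating between neighbors rather than duplicating existing ones.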
|
26
|
Dworzynski P, Aasbrenn M, Rostgaard K, Melbye M, Gerds TA, Hjalgrim H, Pers TH. Nationwide prediction of type 2 diabetes comorbidities. Sci Rep 2020; 10:1776. [PMID: 32019971 PMCID: PMC7000818 DOI: 10.1038/s41598-020-58601-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/19/2019] [Accepted: 01/16/2020] [Indexed: 02/06/2023]
Abstract
Identification of individuals at risk of developing disease comorbidities represents an important task in tackling the growing personal and societal burdens associated with chronic diseases. We employed machine learning techniques to investigate to what extent data from longitudinal, nationwide Danish health registers can be used to predict individuals at high risk of developing type 2 diabetes (T2D) comorbidities. Leveraging logistic regression-, random forest- and gradient boosting models and register data spanning hospitalizations, drug prescriptions and contacts with primary care contractors from >200,000 individuals newly diagnosed with T2D, we predicted five-year risk of heart failure (HF), myocardial infarction (MI), stroke (ST), cardiovascular disease (CVD) and chronic kidney disease (CKD). For HF, MI, CVD, and CKD, register-based models outperformed a reference model leveraging canonical individual characteristics by achieving area under the receiver operating characteristic curve improvements of 0.06, 0.03, 0.04, and 0.07, respectively. The top 1,000 patients predicted to be at highest risk exhibited observed incidence ratios exceeding 4.99, 3.52, 1.97 and 4.71 respectively. In summary, prediction of T2D comorbidities utilizing Danish registers led to consistent albeit modest performance improvements over reference models, suggesting that register data could be leveraged to systematically identify individuals at risk of developing disease comorbidities.
Affiliation(s)
- Piotr Dworzynski
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
- Martin Aasbrenn
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Geriatrics and Internal Medicine, Bispebjerg and Frederiksberg Hospital, Copenhagen, Denmark
- Klaus Rostgaard
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
- Mads Melbye
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Henrik Hjalgrim
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
  - Department of Haematology, Rigshospitalet, Copenhagen, Denmark
- Tune H Pers
  - The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
  - Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
|
27
|
Tiwari P, Colborn KL, Smith DE, Xing F, Ghosh D, Rosenberg MA. Assessment of a Machine Learning Model Applied to Harmonized Electronic Health Record Data for the Prediction of Incident Atrial Fibrillation. JAMA Netw Open 2020; 3:e1919396. [PMID: 31951272 PMCID: PMC6991266 DOI: 10.1001/jamanetworkopen.2019.19396] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Indexed: 01/17/2023]
Abstract
IMPORTANCE Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, and its early detection could lead to significant improvements in outcomes through the appropriate prescription of anticoagulation medication. Although a variety of methods exist for screening for AF, a targeted approach, which requires an efficient method for identifying patients at risk, would be preferred. OBJECTIVE To examine machine learning approaches applied to electronic health record data that have been harmonized to the Observational Medical Outcomes Partnership Common Data Model for identifying risk of AF. DESIGN, SETTING, AND PARTICIPANTS This diagnostic study used data from 2 252 219 individuals cared for in the UCHealth hospital system, which comprises 3 large hospitals in Colorado, from January 1, 2011, to October 1, 2018. Initial analysis was performed in December 2018; follow-up analysis was performed in July 2019. EXPOSURES All Observational Medical Outcomes Partnership Common Data Model-harmonized electronic health record features, including diagnoses, procedures, medications, age, and sex. MAIN OUTCOMES AND MEASURES Classification of incident AF in designated 6-month intervals, adjudicated retrospectively, based on area under the receiver operating characteristic curve and F1 statistic. RESULTS Of 2 252 219 individuals (1 225 533 [54.4%] women; mean [SD] age, 42.9 [22.3] years), 28 036 (1.2%) developed incident AF during a designated 6-month interval. The machine learning model that used the 200 most common electronic health record features, including age and sex, and random oversampling with a single-layer, fully connected neural network provided the optimal prediction of 6-month incident AF, with an area under the receiver operating characteristic curve of 0.800 and an F1 score of 0.110. 
This model performed only slightly better than a more basic logistic regression model composed of known clinical risk factors for AF, which had an area under the receiver operating characteristic curve of 0.794 and an F1 score of 0.079. CONCLUSIONS AND RELEVANCE Machine learning approaches to electronic health record data offer a promising method for improving risk prediction for incident AF, but more work is needed to show improvement beyond standard risk factors.
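An AUROC near 0.80 alongside an F1 score near 0.11 is not contradictory when the outcome is rare (1.2% here): F1 depends on precision, which false positives from the large negative class pull down. The sketch below computes F1 from confusion-matrix counts. The counts are hypothetical, chosen only to show how a low F1 can arise, and are not the study's results.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for a rare outcome: the model catches a third of the
# true cases (recall 50/150) but most of its alerts are false (precision 50/750).
tp, fp, fn = 50, 700, 100
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = f1_score(tp, fp, fn)  # 100 / 900, roughly 0.11
```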
Affiliation(s)
- Premanand Tiwari
  - Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora
- Kathryn L. Colborn
  - Colorado School of Public Health, Department of Biostatistics and Informatics, University of Colorado Denver, Aurora
- Derek E. Smith
  - Children’s Hospital Colorado, Cancer Center Biostatistics Core, Department of Pediatrics, University of Colorado, Aurora
- Fuyong Xing
  - Colorado School of Public Health, Department of Biostatistics and Informatics, University of Colorado Denver, Aurora
- Debashis Ghosh
  - Colorado School of Public Health, Department of Biostatistics and Informatics, University of Colorado Denver, Aurora
- Michael A. Rosenberg
  - Colorado Center for Personalized Medicine, University of Colorado School of Medicine, Aurora
  - Division of Cardiology and Cardiac Electrophysiology, University of Colorado School of Medicine, Aurora
|
28
|
Shi P, Li G, Yuan Y, Kuang L. Outlier Detection Using Improved Support Vector Data Description in Wireless Sensor Networks. Sensors (Basel) 2019; 19:E4712. [PMID: 31671540 PMCID: PMC6864849 DOI: 10.3390/s19214712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/23/2019] [Revised: 10/24/2019] [Accepted: 10/27/2019] [Indexed: 11/22/2022]
Abstract
Wireless sensor networks (WSNs) are susceptible to faults in sensor data. Outlier detection is crucial for ensuring the quality of data analysis in WSNs. This paper proposes a novel improved support vector data description method (ID-SVDD) to effectively detect outliers of sensor data. ID-SVDD utilizes the density distribution of data to compensate SVDD. The Parzen-window algorithm is applied to calculate the relative density for each data point in a data set. Meanwhile, we use Mahalanobis distance (MD) to improve the Gaussian function in Parzen-window density estimation. Through combining new relative density weight with SVDD, this approach can efficiently map the data points from sparse space to high-density space. In order to assess the outlier detection performance, the ID-SVDD algorithm was implemented on several datasets. The experimental results demonstrated that ID-SVDD achieved high performance, and could be applied in real water quality monitoring.
Affiliation(s)
- Pei Shi
  - School of IoT Engineering, Jiangnan University, Wuxi 214122, China
  - Freshwater Fisheries Research Center of Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Guanghui Li
  - School of IoT Engineering, Jiangnan University, Wuxi 214122, China
- Yongming Yuan
  - Freshwater Fisheries Research Center of Chinese Academy of Fishery Sciences, Wuxi 214081, China
- Liang Kuang
  - School of IoT Engineering, Jiangsu Vocational College of Information Technology, Wuxi 214153, China
|
29
|
Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019; 110:12-22. [PMID: 30763612 DOI: 10.1016/j.jclinepi.2019.02.004] [Citation(s) in RCA: 976] [Impact Index Per Article: 162.7] [Received: 12/05/2018] [Revised: 01/18/2019] [Accepted: 02/05/2019] [Indexed: 02/06/2023]
Abstract
OBJECTIVES The objective of this study was to compare performance of logistic regression (LR) with machine learning (ML) for clinical prediction modeling in the literature. STUDY DESIGN AND SETTING We conducted a Medline literature search (1/2016 to 8/2017) and extracted comparisons between LR and ML models for binary outcomes. RESULTS We included 71 of 927 studies. The median sample size was 1,250 (range 72-3,994,872), with 19 predictors considered (range 5-563) and eight events per predictor (range 0.3-6,697). The most common ML methods were classification trees, random forests, artificial neural networks, and support vector machines. In 48 (68%) studies, we observed potential bias in the validation procedures. Sixty-four (90%) studies used the area under the receiver operating characteristic curve (AUC) to assess discrimination. Calibration was not addressed in 56 (79%) studies. We identified 282 comparisons between an LR and ML model (AUC range, 0.52-0.99). For 145 comparisons at low risk of bias, the difference in logit(AUC) between LR and ML was 0.00 (95% confidence interval, -0.18 to 0.18). For 137 comparisons at high risk of bias, logit(AUC) was 0.34 (0.20-0.47) higher for ML. CONCLUSION We found no evidence of superior performance of ML over LR. Improvements in methodology and reporting are needed for studies that compare modeling algorithms.
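The differences in this abstract are reported on the logit(AUC) scale, where logit(p) = ln(p / (1 - p)). A short sketch may help translate the 0.34 figure back to the AUC scale. The baseline AUC of 0.75 is an assumption chosen purely for illustration; the resulting shift depends on the baseline and is not a number reported by the review.

```python
import math


def logit(p):
    """Log-odds transform, defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-x))


# Hypothetical baseline AUC for a logistic regression model.
baseline_auc = 0.75
# Apply the 0.34 logit(AUC) difference reported for high-risk-of-bias comparisons.
shifted_auc = inv_logit(logit(baseline_auc) + 0.34)  # about 0.81
```

Under this assumed baseline, a 0.34 difference in logit(AUC) corresponds to roughly 0.06 on the AUC scale; the same logit difference would translate to a smaller AUC gain near 0.95 and a larger one near 0.5.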
Affiliation(s)
- Evangelia Christodoulou
  - Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium
- Jie Ma
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK
- Gary S Collins
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Windmill Road, Oxford, OX3 7LD UK; Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Ewout W Steyerberg
  - Department of Biomedical Data Sciences, Leiden University Medical Centre, Albinusdreef 2, Leiden, 2333 ZA The Netherlands
- Jan Y Verbakel
  - Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium; Department of Public Health & Primary Care, KU Leuven, Kapucijnenvoer 33J box 7001, Leuven, 3000 Belgium; Nuffield Department of Primary Care Health Sciences, University of Oxford, Woodstock Road, Oxford, OX2 6GG UK
- Ben Van Calster
  - Department of Development & Regeneration, KU Leuven, Herestraat 49 box 805, Leuven, 3000 Belgium; Department of Biomedical Data Sciences, Leiden University Medical Centre, Albinusdreef 2, Leiden, 2333 ZA The Netherlands
|
30
|
Li Y, Pu Q, Li S, Zhang H, Wang X, Yao H, Zhao L. Machine learning methods for research highlight prediction in biomedical effects of nanomaterial application. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2018.11.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/20/2023]
|
31
|
Huang Z, Huang C, Xie J, Ma J, Cao G, Huang Q, Shen B, Byers Kraus V, Pei F. Analysis of a large data set to identify predictors of blood transfusion in primary total hip and knee arthroplasty. Transfusion 2018; 58:1855-1862. [PMID: 30145838 DOI: 10.1111/trf.14783] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 10/16/2017] [Revised: 03/05/2018] [Accepted: 03/05/2018] [Indexed: 02/05/2023]
Abstract
BACKGROUND The aim of this study was to identify the predictors of need for allogenic blood transfusion (ALBT) in primary lower limb total joint arthroplasty (TJA). STUDY DESIGN AND METHODS This study utilized a large dataset of 15,187 patients undergoing primary unilateral TJA. Risk factors and demographic information were extracted from the electronic health record. A predictive model was developed by both a random forest (RF) algorithm and logistic regression (LR). The area under the receiver operating characteristic curve (AUC-ROC) was used to compare the accuracy of the two methods. RESULTS The rate of ALBT was 18.9% in total. Patient-related factors associated with higher risk of an ALBT included female sex, American Society of Anesthesiologists (ASA) II, ASA III, and ASA IV. Surgery-related risk factors for ALBT were operative time, drain use, and amount of intraoperative blood loss. Higher preoperative hemoglobin and tranexamic acid use were associated with decreased risk for ALBT. The RF model had a better predictive accuracy (area under the curve [AUC] 0.84) than the LR model (AUC, 0.77; p < 0.001). CONCLUSION The risk factors identified in the current study can provide specific, personalized perioperative ALBT risk assessment for a patient considering lower limb TJA. Furthermore, the predictive accuracy of the RF algorithm was significantly higher than that of LR, making it a potential tool for future personalized preoperative prediction of risk for perioperative ALBT.
Affiliation(s)
- ZeYu Huang
  - Department of Orthopedic Surgery, West China Hospital, West China Medical School, Sichuan University
- Cheng Huang
  - College of Cybersecurity, Chengdu, Sichuan Province, People's Republic of China
- JinWei Xie
  - Department of Orthopedic Surgery, West China Hospital, West China Medical School, Sichuan University
- Jun Ma
  - Department of Orthopedic Surgery, West China Hospital, West China Medical School, Sichuan University
- GuoRui Cao
  - Department of Orthopedic Surgery, West China Hospital, West China Medical School, Sichuan University
- Qiang Huang
  - Department of Orthopedic Surgery, West China Hospital, West China Medical School, Sichuan University
- Bin Shen
  - Department of Orthopedic Surgery, West China Hospital, West China Medical School, Sichuan University
- Virginia Byers Kraus
  - Duke Molecular Physiology Institute, Durham, North Carolina; Division of Rheumatology, Department of Medicine, Duke University School of Medicine, Duke University, Durham, North Carolina
- FuXing Pei
  - Department of Orthopedic Surgery, West China Hospital, West China Medical School, Sichuan University
|