1
Niceta M, Ciolfi A, Ferilli M, Pedace L, Cappelletti C, Nardini C, Hildonen M, Chiriatti L, Miele E, Dentici ML, Gnazzo M, Cesario C, Pisaneschi E, Baban A, Novelli A, Maitz S, Selicorni A, Squeo GM, Merla G, Dallapiccola B, Tumer Z, Digilio MC, Priolo M, Tartaglia M. DNA methylation profiling in Kabuki syndrome: reclassification of germline KMT2D VUS and sensitivity in validating postzygotic mosaicism. Eur J Hum Genet 2024:10.1038/s41431-024-01597-9. [PMID: 38528056 DOI: 10.1038/s41431-024-01597-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 03/05/2024] [Accepted: 03/13/2024] [Indexed: 03/27/2024] Open
Abstract
Autosomal dominant Kabuki syndrome (KS) is a rare multiple congenital anomalies/neurodevelopmental disorder caused by heterozygous inactivating variants or structural rearrangements of the lysine-specific methyltransferase 2D (KMT2D) gene. While it is often recognizable due to a distinctive gestalt, the disorder is clinically variable, and a phenotypic scoring system has been introduced to help clinicians to reach a clinical diagnosis. The phenotype, however, can be less pronounced in some patients, including those carrying postzygotic mutations. The full spectrum of pathogenic variation in KMT2D has not fully been characterized, which may hamper the clinical classification of a portion of these variants. DNA methylation (DNAm) profiling has successfully been used as a tool to classify variants in genes associated with several neurodevelopmental disorders, including KS. In this work, we applied a KS-specific DNAm signature in a cohort of 13 individuals with KMT2D VUS and clinical features suggestive or overlapping with KS. We succeeded in correctly classifying all the tested individuals, confirming diagnosis for three subjects and rejecting the pathogenic role of 10 VUS in the context of KS. In the latter group, exome sequencing allowed to identify the genetic cause underlying the disorder in three subjects. By testing five individuals with postzygotic pathogenic KMT2D variants, we also provide evidence that DNAm profiling has power to recognize pathogenic variants at different levels of mosaicism, identifying 15% as the minimum threshold for which DNAm profiling can be applied as an informative diagnostic tool in KS mosaics.
Affiliation(s)
- Marcello Niceta
  - Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Andrea Ciolfi
  - Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Marco Ferilli
  - Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
  - Department of Computer, Control and Management Engineering, Sapienza University, 00185, Rome, Italy
- Lucia Pedace
  - Department of Pediatric Hematology/Oncology, Cell and Gene Therapy, Bambino Gesù Children's Hospital, IRCCS, 00165, Rome, Italy
- Camilla Cappelletti
  - Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Claudia Nardini
  - Department of Pediatric Hematology/Oncology, Cell and Gene Therapy, Bambino Gesù Children's Hospital, IRCCS, 00165, Rome, Italy
- Mathis Hildonen
  - Department of Clinical Genetics, Kennedy Center, Copenhagen University Hospital, Rigshospitalet, 2600, Glostrup, Denmark
- Luigi Chiriatti
  - Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Evelina Miele
  - Department of Pediatric Hematology/Oncology, Cell and Gene Therapy, Bambino Gesù Children's Hospital, IRCCS, 00165, Rome, Italy
- Maria Lisa Dentici
  - Medical Genetics Unit, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Maria Gnazzo
  - Laboratory of Medical Genetics, Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Claudia Cesario
  - Laboratory of Medical Genetics, Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Elisa Pisaneschi
  - Laboratory of Medical Genetics, Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Anwar Baban
  - Pediatric Cardiology and Cardiac Arrhythmias Unit, Department of Pediatric Cardiology and Cardiac Surgery, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Antonio Novelli
  - Laboratory of Medical Genetics, Translational Cytogenomics Research Unit, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Silvia Maitz
  - Genetica Clinica Pediatrica, Fondazione MBBM, ASST Monza Ospedale San Gerardo, 20900, Monza, Italy
- Gabriella Maria Squeo
  - Laboratory of Regulatory and Functional Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, 71013, Foggia, Italy
- Giuseppe Merla
  - Laboratory of Regulatory and Functional Genomics, Fondazione IRCCS Casa Sollievo della Sofferenza, San Giovanni Rotondo, 71013, Foggia, Italy
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, 80131, Naples, Italy
- Bruno Dallapiccola
  - Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy
- Zeynep Tumer
  - Department of Clinical Genetics, Kennedy Center, Copenhagen University Hospital, Rigshospitalet, 2600, Glostrup, Denmark
  - Department of Clinical Medicine, Faculty of Medicine and Health Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
- Manuela Priolo
  - Medical and Laboratory Genetics, Antonio Cardarelli Hospital, 80131, Naples, Italy
- Marco Tartaglia
  - Molecular Genetics and Functional Genomics, Bambino Gesù Children's Hospital, IRCCS, 00146, Rome, Italy.
2
Aeberhard JL, Radan AP, Delgado-Gonzalo R, Strahm KM, Sigurthorsdottir HB, Schneider S, Surbek D. Artificial intelligence and machine learning in cardiotocography: A scoping review. Eur J Obstet Gynecol Reprod Biol 2023; 281:54-62. [PMID: 36535071 DOI: 10.1016/j.ejogrb.2022.12.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/14/2022] [Revised: 10/19/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]
Abstract
INTRODUCTION Artificial intelligence (AI) is gaining more interest in the field of medicine due to its capacity to learn patterns directly from data. This becomes interesting for the field of cardiotocography (CTG) interpretation, since it promises to remove existing biases and improve the well-known issues of inter- and intra-observer variability. MATERIAL AND METHODS The objective of this study was to map current knowledge in AI-assisted interpretation of CTG tracings and thus, to present different approaches with their strengths, gaps, and limitations. The search was performed on Ovid Medline and PubMed databases. The Preferred Reporting Items for Systematic Reviews and meta-Analysis for Scoping Reviews (PRISMA-ScR) guidelines were followed. RESULTS We summarized 40 different studies investigating at least one algorithm or system to classify CTG tracings. In addition, the Oxford Sonicaid system is presented because of its wide use in clinical practice. CONCLUSIONS There are several promising approaches in this area, but none of them has gained big acceptance in clinical practice. Further investigation and refinement of the algorithms and features are needed to achieve a validated decision-support system. For this purpose, larger quantities of curated and labeled data may be necessary.
Affiliation(s)
- Anda-Petronela Radan
  - Department of Obstetrics and Feto-maternal Medicine, University Hospital of Bern, Switzerland
- Karin Maya Strahm
  - Department of Obstetrics and Feto-maternal Medicine, University Hospital of Bern, Switzerland
- Sophie Schneider
  - Department of Obstetrics and Feto-maternal Medicine, University Hospital of Bern, Switzerland
- Daniel Surbek
  - Department of Obstetrics and Feto-maternal Medicine, University Hospital of Bern, Switzerland
3
Li X, Yan L, Wang X, Ouyang C, Wang C, Chao J, Zhang J, Lian G. Predictive models for endoscopic disease activity in patients with ulcerative colitis: Practical machine learning-based modeling and interpretation. Front Med (Lausanne) 2022; 9:1043412. [PMID: 36619650 PMCID: PMC9810755 DOI: 10.3389/fmed.2022.1043412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 12/07/2022] [Indexed: 12/24/2022] Open
Abstract
Background Endoscopic disease activity monitoring is important for the long-term management of patients with ulcerative colitis (UC), but there is currently no widely accepted non-invasive method that can effectively predict endoscopic disease activity. We aimed to develop and validate machine learning (ML) models for predicting it, with the goal of reducing the frequency of endoscopic examinations and related costs. Methods Patients with a diagnosis of UC in two hospitals from January 2016 to January 2021 were enrolled in this study. Thirty-nine clinical and laboratory variables were collected. All patients were divided into four groups based on MES or UCEIS scores. Logistic regression (LR) and four ML algorithms were applied to construct the prediction models. The performance of the models was evaluated in terms of accuracy, sensitivity, precision, F1 score, and area under the receiver-operating characteristic curve (AUC). Then Shapley additive explanations (SHAP) was applied to determine the importance of the selected variables and interpret the ML models. Results A total of 420 patients were entered into the study. Twenty-four variables showed statistical differences among the groups. After synthetic minority oversampling technique (SMOTE) oversampling and recursive feature elimination (RFE) variable selection, the random forests (RF) model with 23 variables in MES and the extreme gradient boosting (XGBoost) model with 21 variables in UCEIS had the greatest discriminatory ability (AUC = 0.8192 in MES and 0.8006 in UCEIS in the test set). The results obtained from SHAP showed that albumin, rectal bleeding, and CRP/ALB contributed the most to the overall model. In addition, the above three variables had a more balanced contribution to each classification under the MES than the UCEIS according to the SHAP values. Conclusion This proof-of-concept study demonstrated that the ML model could serve as an effective non-invasive approach to predicting endoscopic disease activity for patients with UC. RF and XGBoost, which were first introduced into data-based endoscopic disease activity prediction, are suitable for the present prediction modeling.
Affiliation(s)
- Xiaojun Li
  - Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Research Center of Digestive Disease, Central South University, Changsha, China
- Lamei Yan
  - Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Research Center of Digestive Disease, Central South University, Changsha, China
  - Department of Gastroenterology, The First Affiliated Hospital of Shaoyang College, Shaoyang, Hunan, China
- Xuehong Wang
  - Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Research Center of Digestive Disease, Central South University, Changsha, China
- Chunhui Ouyang
  - Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Research Center of Digestive Disease, Central South University, Changsha, China
- Chunlian Wang
  - Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Research Center of Digestive Disease, Central South University, Changsha, China
- Jun Chao
  - Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Research Center of Digestive Disease, Central South University, Changsha, China
  - Hunan Aicortech Intelligent Research Institute Co., Changsha, Hunan, China
- Jie Zhang
  - Department of Gastroenterology, The Second Xiangya Hospital of Central South University, Research Center of Digestive Disease, Central South University, Changsha, China
  - Correspondence: Jie Zhang
- Guanghui Lian
  - Department of Gastroenterology, Xiangya Hospital of Central South University, Changsha, Hunan, China
  - Correspondence: Guanghui Lian
4
Syrowatka A, Song W, Amato MG, Foer D, Edrees H, Co Z, Kuznetsova M, Dulgarian S, Seger DL, Simona A, Bain PA, Purcell Jackson G, Rhee K, Bates DW. Key use cases for artificial intelligence to reduce the frequency of adverse drug events: a scoping review. Lancet Digit Health 2021; 4:e137-e148. [PMID: 34836823 DOI: 10.1016/s2589-7500(21)00229-6] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 03/06/2021] [Revised: 08/13/2021] [Accepted: 09/10/2021] [Indexed: 12/31/2022]
Abstract
Adverse drug events (ADEs) represent one of the most prevalent types of health-care-related harm, and there is substantial room for improvement in the way that they are currently predicted and detected. We conducted a scoping review to identify key use cases in which artificial intelligence (AI) could be leveraged to reduce the frequency of ADEs. We focused on modern machine learning techniques and natural language processing. 78 articles were included in the scoping review. Studies were heterogeneous and applied various AI techniques covering a wide range of medications and ADEs. We identified several key use cases in which AI could contribute to reducing the frequency and consequences of ADEs, through prediction to prevent ADEs and early detection to mitigate the effects. Most studies (73 [94%] of 78) assessed technical algorithm performance, and few studies evaluated the use of AI in clinical settings. Most articles (58 [74%] of 78) were published within the past 5 years, highlighting an emerging area of study. Availability of new types of data, such as genetic information, and access to unstructured clinical notes might further advance the field.
Affiliation(s)
- Ania Syrowatka
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Wenyu Song
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Mary G Amato
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Massachusetts College of Pharmacy and Health Sciences, Boston, MA, USA
- Dinah Foer
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Division of Allergy and Clinical Immunology, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Heba Edrees
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Massachusetts College of Pharmacy and Health Sciences, Boston, MA, USA
- Zoe Co
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Sevan Dulgarian
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Diane L Seger
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Aurélien Simona
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Paul A Bain
  - Countway Library of Medicine, Harvard Medical School, Boston, MA, USA
- Gretchen Purcell Jackson
  - IBM Watson Health, Cambridge, MA, USA; Department of Pediatric Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
- Kyu Rhee
  - IBM Watson Health, Cambridge, MA, USA; CVS Health, Wellesley Hills, MA, USA
- David W Bates
  - Division of General Internal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Harvard T H Chan School of Public Health, Boston, MA, USA
5
Cheng Y, Chen C, Yang J, Yang H, Fu M, Zhong X, Wang B, He M, Hu Z, Zhang Z, Jin X, Kang Y, Wu Q. Using Machine Learning Algorithms to Predict Hospital Acquired Thrombocytopenia after Operation in the Intensive Care Unit: A Retrospective Cohort Study. Diagnostics (Basel) 2021; 11:1614. [PMID: 34573956 PMCID: PMC8466367 DOI: 10.3390/diagnostics11091614] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/06/2021] [Revised: 08/25/2021] [Accepted: 09/01/2021] [Indexed: 02/05/2023] Open
Abstract
Hospital acquired thrombocytopenia (HAT) is a common hematological complication after surgery. This research aimed to develop and compare the performance of seven machine learning (ML) algorithms for predicting patients that are at risk of HAT after surgery. We conducted a retrospective cohort study which enrolled adult patients transferred to the intensive care unit (ICU) after surgery in West China Hospital of Sichuan University from January 2016 to December 2018. All subjects were randomly divided into a derivation set (70%) and test set (30%). Ten-fold cross-validation was used to estimate the hyperparameters of ML algorithms during the training process in the derivation set. After ML models were developed, the sensitivity, specificity, area under the curve (AUC), and net benefit (decision curve analysis, DCA) were calculated to evaluate the performances of ML models in the test set. A total of 10,369 patients were included and in 1354 (13.1%) HAT occurred. The AUC of all seven ML models exceeded 0.7; the two highest were Gradient Boosting (GB) (0.834, 0.814-0.853, p < 0.001) and Random Forest (RF) (0.828, 0.807-0.848, p < 0.001). There was no difference between GB and RF (0.834 vs. 0.828, p = 0.293); however, these two were better than the remaining five models (p < 0.001). The DCA revealed that all ML models had high net benefits with a threshold probability approximately less than 0.6. In conclusion, we found that ML models constructed by multiple preoperative variables can predict HAT in patients transferred to ICU after surgery, which can improve risk stratification and guide management in clinical practice.
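As an illustration of two steps named in the abstract above, the random 70%/30% derivation/test split and the AUC used for evaluation, here is a minimal Python sketch. It is not the authors' code: the function names `split_derivation_test` and `roc_auc`, the seed, and the toy scores are all assumptions made for the example, and the AUC is computed from its pairwise definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) rather than with any particular library.

```python
import random


def split_derivation_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of `records` and hold out `test_fraction` as the test set.

    Hypothetical helper: the seed and the rounding rule are illustrative choices.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """AUC from its pairwise definition; tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    derivation, test = split_derivation_test(range(10))
    print(len(derivation), len(test))
    # Toy labels and scores, invented for the example.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of cases, which is fine for a toy example; on a cohort of the size reported above, a rank-based implementation would be the more practical choice.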
Affiliation(s)
- Yisong Cheng
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Chaoyue Chen
  - Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu 610041, China
- Jie Yang
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Hao Yang
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Min Fu
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Xi Zhong
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Bo Wang
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Min He
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Zhi Hu
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Zhongwei Zhang
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Xiaodong Jin
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Yan Kang
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Qin Wu
  - Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
  - Correspondence: Tel.: +86-028-8542-2506
6
Vepa A, Saleem A, Rakhshan K, Daneshkhah A, Sedighi T, Shohaimi S, Omar A, Salari N, Chatrabgoun O, Dharmaraj D, Sami J, Parekh S, Ibrahim M, Raza M, Kapila P, Chakrabarti P. Using Machine Learning Algorithms to Develop a Clinical Decision-Making Tool for COVID-19 Inpatients. Int J Environ Res Public Health 2021; 18:6228. [PMID: 34207560 DOI: 10.3390/ijerph18126228] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/18/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/21/2022]
Abstract
Background: Within the UK, COVID-19 has contributed towards over 103,000 deaths. Although multiple risk factors for COVID-19 have been identified, using this data to improve clinical care has proven challenging. The main aim of this study is to develop a reliable, multivariable predictive model for COVID-19 in-patient outcomes, thus enabling risk-stratification and earlier clinical decision-making. Methods: Anonymised data consisting of 44 independent predictor variables from 355 adults diagnosed with COVID-19, at a UK hospital, was manually extracted from electronic patient records for retrospective, case–control analysis. Primary outcomes included inpatient mortality, required ventilatory support, and duration of inpatient treatment. Pulmonary embolism sequela was the only secondary outcome. After balancing data, key variables were feature selected for each outcome using random forests. Predictive models were then learned and constructed using Bayesian networks. Results: The proposed probabilistic models were able to predict, using feature selected risk factors, the probability of the mentioned outcomes. Overall, our findings demonstrate reliable, multivariable, quantitative predictive models for four outcomes, which utilise readily available clinical information for COVID-19 adult inpatients. Further research is required to externally validate our models and demonstrate their utility as risk stratification and clinical decision-making tools.
7
Karabulut OC, Karpuzcu BA, Türk E, Ibrahim AH, Süzek BE. ML-AdVInfect: A Machine-Learning Based Adenoviral Infection Predictor. Front Mol Biosci 2021; 8:647424. [PMID: 34026828 PMCID: PMC8139618 DOI: 10.3389/fmolb.2021.647424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/29/2020] [Accepted: 04/22/2021] [Indexed: 01/08/2023] Open
Abstract
Adenoviruses (AdVs) constitute a diverse family with many pathogenic types that infect a broad range of hosts. Understanding the pathogenesis of adenoviral infections is not only clinically relevant but also important to elucidate the potential use of AdVs as vectors in therapeutic applications. For an adenoviral infection to occur, attachment of the viral ligand to a cellular receptor on the host organism is a prerequisite and, in this sense, it is a criterion to decide whether an adenoviral infection can potentially happen. The interaction between any virus and its corresponding host organism is a specific kind of protein-protein interaction (PPI) and several experimental techniques, including high-throughput methods are being used in exploring such interactions. As a result, there has been accumulating data on virus-host interactions including a significant portion reported at publicly available bioinformatics resources. There is not, however, a computational model to integrate and interpret the existing data to draw out concise decisions, such as whether an infection happens or not. In this study, accepting the cellular entry of AdV as a decisive parameter for infectivity, we have developed a machine learning, more precisely support vector machine (SVM), based methodology to predict whether adenoviral infection can take place in a given host. For this purpose, we used the sequence data of the known receptors of AdVs, we identified sets of adenoviral ligands and their respective host species, and eventually, we have constructed a comprehensive adenovirus–host interaction dataset. Then, we committed interaction predictions through publicly available virus-host PPI tools and constructed an AdV infection predictor model using SVM with RBF kernel, with the overall sensitivity, specificity, and AUC of 0.88 ± 0.011, 0.83 ± 0.064, and 0.86 ± 0.030, respectively. 
ML-AdVInfect is the first of its kind as an effective predictor to screen the infection capacity along with anticipating any cross-species shifts. We anticipate that the approach that led to ML-AdVInfect can be adapted to make predictions for other viral infections.
Affiliation(s)
- Onur Can Karabulut
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Betül Asiye Karpuzcu
  - Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Ahmad Hassan Ibrahim
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
  - Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
  - Georgetown University Medical Center, Biochemistry and Molecular and Cellular Biology, Washington, DC, United States
8
Tan TH, Hsu CC, Chen CJ, Hsu SL, Liu TL, Lin HJ, Wang JJ, Liu CF, Huang CC. Predicting outcomes in older ED patients with influenza in real time using a big data-driven and machine learning approach to the hospital information system. BMC Geriatr 2021; 21:280. [PMID: 33902485 PMCID: PMC8077903 DOI: 10.1186/s12877-021-02229-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/05/2020] [Accepted: 04/19/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Predicting outcomes in older patients with influenza in the emergency department (ED) by machine learning (ML) has never been implemented. Therefore, we conducted this study to clarify the clinical utility of implementing ML. METHODS We recruited 5508 older ED patients (≥65 years old) in three hospitals between 2009 and 2018. Patients were randomized into a 70%/30% split for model training and testing. Using 10 clinical variables from their electronic health records, a prediction model using the synthetic minority oversampling technique preprocessing algorithm was constructed to predict five outcomes. RESULTS The best areas under the curves of predicting outcomes were: random forest model for hospitalization (0.840), pneumonia (0.765), and sepsis or septic shock (0.857), XGBoost for intensive care unit admission (0.902), and logistic regression for in-hospital mortality (0.889) in the testing data. The predictive model was further applied in the hospital information system to assist physicians' decisions in real time. CONCLUSIONS ML is a promising way to assist physicians in predicting outcomes in older ED patients with influenza in real time. Evaluations of the effectiveness and impact are needed in the future.
Affiliation(s)
- Tian-Hoe Tan
  - Department of Emergency Medicine, Chi Mei Medical Center, 901 Zhonghua Road, Yongkang District, Tainan City, 710, Taiwan
- Chien-Chin Hsu
  - Department of Emergency Medicine, Chi Mei Medical Center, 901 Zhonghua Road, Yongkang District, Tainan City, 710, Taiwan
  - Department of Biotechnology, Southern Taiwan University of Science and Technology, Tainan, Taiwan
- Chia-Jung Chen
  - Information Systems, Chi Mei Medical Center, Tainan, Taiwan
- Shu-Lien Hsu
  - Department of Nursing, Chi Mei Medical Center, Tainan, Taiwan
- Tzu-Lan Liu
  - Information Systems, Chi Mei Medical Center, Tainan, Taiwan
- Hung-Jung Lin
  - Department of Emergency Medicine, Chi Mei Medical Center, 901 Zhonghua Road, Yongkang District, Tainan City, 710, Taiwan
  - Department of Emergency Medicine, Taipei Medical University, Taipei, Taiwan
- Jhi-Joung Wang
  - Department of Medical Research, Chi Mei Medical Center, Tainan, Taiwan
  - Allied AI Biomed Center, Southern Taiwan University of Science and Technology, Tainan, Taiwan
- Chung-Feng Liu
  - Department of Medical Research, Chi Mei Medical Center, Tainan, Taiwan
- Chien-Cheng Huang
  - Department of Emergency Medicine, Chi Mei Medical Center, 901 Zhonghua Road, Yongkang District, Tainan City, 710, Taiwan.
  - Department of Senior Services, Southern Taiwan University of Science and Technology, Tainan, Taiwan.
  - Department of Environmental and Occupational Health, College of Medicine, National Cheng Kung University, Tainan, Taiwan.
9
Vijayvargiya A, Prakash C, Kumar R, Bansal S, Tavares JMRS. Human knee abnormality detection from imbalanced sEMG data. Biomed Signal Process Control 2021; 66:102406. [DOI: 10.1016/j.bspc.2021.102406] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 11/21/2022]
10
Antaki F, Kahwati G, Sebag J, Coussa RG, Fanous A, Duval R, Sebag M. Predictive modeling of proliferative vitreoretinopathy using automated machine learning by ophthalmologists without coding experience. Sci Rep 2020; 10:19528. [PMID: 33177614 PMCID: PMC7658348 DOI: 10.1038/s41598-020-76665-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/21/2020] [Accepted: 11/01/2020] [Indexed: 11/23/2022] Open
Abstract
We aimed to assess the feasibility of machine learning (ML) algorithm design to predict proliferative vitreoretinopathy (PVR) by ophthalmologists without coding experience using automated ML (AutoML). The study was a retrospective cohort study of 506 eyes that underwent pars plana vitrectomy for rhegmatogenous retinal detachment (RRD) by a single surgeon at a tertiary-care hospital between 2012 and 2019. Two ophthalmologists without coding experience used an interactive application in MATLAB to build and evaluate ML algorithms for the prediction of postoperative PVR using clinical data from the electronic health records. The clinical features associated with postoperative PVR were determined by univariate feature selection. The area under the curve (AUC) for predicting postoperative PVR was better for models that included pre-existing PVR as an input. The quadratic support vector machine (SVM) model built using all selected clinical features had an AUC of 0.90, a sensitivity of 63.0%, and a specificity of 97.8%. An optimized Naïve Bayes algorithm that did not include pre-existing PVR as an input feature had an AUC of 0.81, a sensitivity of 54.3%, and a specificity of 92.4%. In conclusion, the development of ML models for the prediction of PVR by ophthalmologists without coding experience is feasible. Input from a data scientist might still be needed to tackle class imbalance, a common challenge in ML classification using real-world clinical data.
Affiliation(s)
- Fares Antaki
- Department of Ophthalmology, Université de Montréal, Montreal, QC, Canada
- Department of Ophthalmology, Centre Hospitalier de l'Université de Montréal (CHUM), Montreal, QC, Canada
- Centre Universitaire d'Ophtalmologie (CUO), Hôpital Maisonneuve-Rosemont, CIUSSS de l'Est-de-l'Île-de-Montréal, Montreal, QC, Canada
| | - Ghofril Kahwati
- Institut National des Sciences Appliquées de Toulouse (INSA Toulouse), Toulouse, France
- École de Technologie Supérieure (ÉTS), Montreal, QC, Canada
| | - Julia Sebag
- Department of Ophthalmology, Université de Montréal, Montreal, QC, Canada
| | - Razek Georges Coussa
- Department of Ophthalmology and Visual Sciences, Carver College of Medicine, University of Iowa, Iowa City, IA, USA
| | - Anthony Fanous
- Faculty of Medicine, McGill University, Montreal, QC, Canada
| | - Renaud Duval
- Department of Ophthalmology, Université de Montréal, Montreal, QC, Canada
- Centre Universitaire d'Ophtalmologie (CUO), Hôpital Maisonneuve-Rosemont, CIUSSS de l'Est-de-l'Île-de-Montréal, Montreal, QC, Canada
| | - Mikael Sebag
- Department of Ophthalmology, Université de Montréal, Montreal, QC, Canada.
- Department of Ophthalmology, Centre Hospitalier de l'Université de Montréal (CHUM), Montreal, QC, Canada.
|
11
|
Abstract
OBJECTIVE This study aims to build a predictive model for "return to work" (RTW) after sick leave by using a machine-learning algorithm. METHODS Panel data of 2000 participants (1686 males and 314 females) from the Labor Welfare Research Institute of the Korea Workers' Compensation & Welfare Service were used. A gradient boosting machine (GBM) was used to build the predictive model. RESULTS The GBM showed excellent performance in a binary classification (returned to work vs not working). However, the model of the three-group classification showed suboptimal performance. CONCLUSIONS Although machine-learning algorithms using common predictive factors can accurately predict whether one can work after sick leave, they cannot differentiate the form of returning to work. Future research with detailed information based on the injury or disease is warranted.
|
12
|
Zhang PI, Hsu CC, Kao Y, Chen CJ, Kuo YW, Hsu SL, Liu TL, Lin HJ, Wang JJ, Liu CF, Huang CC. Real-time AI prediction for major adverse cardiac events in emergency department patients with chest pain. Scand J Trauma Resusc Emerg Med 2020; 28:93. [PMID: 32917261 DOI: 10.1186/s13049-020-00786-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/28/2020] [Accepted: 09/02/2020] [Indexed: 02/07/2023] Open
Abstract
Background A big-data-driven and artificial intelligence (AI) with machine learning (ML) approach has never been integrated with the hospital information system (HIS) for predicting major adverse cardiac events (MACE) in patients with chest pain in the emergency department (ED). Therefore, we conducted the present study to clarify it. Methods In total, 85,254 ED patients with chest pain in three hospitals between 2009 and 2018 were identified. We randomized the patients into a 70%/30% split for ML model training and testing. We used 14 clinical variables from their electronic health records to construct a random forest model with the synthetic minority oversampling technique preprocessing algorithm to predict acute myocardial infarction (AMI) < 1 month and all-cause mortality < 1 month. Comparisons of the predictive accuracies among random forest, logistic regression, support-vector clustering (SVC), and K-nearest neighbor (KNN) models were also performed. Results Predicting MACE using the random forest model produced areas under the curves (AUC) of 0.915 for AMI < 1 month and 0.999 for all-cause mortality < 1 month. The random forest model had better predictive accuracy than logistic regression, SVC, and KNN. We further integrated the AI prediction model with the HIS to assist physicians with decision-making in real time. Validation of the AI prediction model by new patients showed AUCs of 0.907 for AMI < 1 month and 0.888 for all-cause mortality < 1 month. Conclusions An AI real-time prediction model is a promising method for assisting physicians in predicting MACE in ED patients with chest pain. Further studies to evaluate the impact on clinical practice are warranted.
|
13
|
Park YW, Choi D, Lee J, Ahn SS, Lee SK, Lee SH, Bang M. Differentiating patients with schizophrenia from healthy controls by hippocampal subfields using radiomics. Schizophr Res 2020; 223:337-44. [PMID: 32988740 DOI: 10.1016/j.schres.2020.09.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/19/2020] [Revised: 08/11/2020] [Accepted: 09/14/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND Accurately diagnosing schizophrenia is still challenging due to the lack of validated biomarkers. Here, we aimed to investigate whether radiomic features in bilateral hippocampal subfields from magnetic resonance images (MRIs) can differentiate patients with schizophrenia from healthy controls (HCs). METHODS A total of 152 participants with MRI (86 schizophrenia and 66 HCs) were allocated to training (n = 106) and test (n = 46) sets. Radiomic features (n = 642) from the bilateral hippocampal subfields processed with automatic segmentation techniques were extracted from T1-weighted MRIs. After feature selection, various combinations of classifiers (logistic regression, extra-trees, AdaBoost, XGBoost, or support vector machine) and subsampling were trained. The performance of the classifier was validated in the test set by determining the area under the curve (AUC). Furthermore, the association between selected radiomic features and clinical symptoms in schizophrenia was assessed. RESULTS Thirty radiomic features were identified to differentiate participants with schizophrenia from HCs. In the training set, the AUC exhibited poor to good performance (range: 0.683-0.861). The best performing radiomics model in the test set was achieved by the mutual information feature selection and logistic regression with an AUC, accuracy, sensitivity, and specificity of 0.821 (95% confidence interval 0.681-0.961), 82.1%, 76.9%, and 70%, respectively. Greater maximum values in the left cornu ammonis 1-3 subfield were associated with a higher severity of positive symptoms and general psychopathology in participants with schizophrenia. CONCLUSION Radiomic features from hippocampal subfields may be useful biomarkers for identifying schizophrenia.
|
14
|
Mishra S, Mallick PK, Jena L, Chae GS. Optimization of Skewed Data Using Sampling-Based Preprocessing Approach. Front Public Health 2020; 8:274. [PMID: 32766193 PMCID: PMC7378392 DOI: 10.3389/fpubh.2020.00274] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 04/26/2020] [Accepted: 05/26/2020] [Indexed: 11/26/2022] Open
Abstract
In the past few years, classification has evolved considerably. With the constant surge in the amount of data gathered from different sources, efficient processing and analysis of data are becoming difficult. Because data are unevenly distributed among classes, classification with machine-learning techniques has become more tedious. Most algorithms focus on the majority-class samples and ignore the minority-class data. Thus, data skew is a critical problem that needs researchers' attention. The paper emphasizes data preprocessing with sampling techniques to overcome the data-skewing problem. Three sampling techniques (Resampling, SpreadSubSampling, and SMOTE) are implemented to reduce this uneven data distribution, and the resampled data are classified with the K-nearest neighbor algorithm. Classification performance is evaluated with various metrics to determine the efficiency of classification.
Affiliation(s)
- Sushruta Mishra
- School of Computer Engineering, Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India
| | - Pradeep Kumar Mallick
- School of Computer Engineering, Kalinga Institute of Industrial Technology, Deemed to be University, Bhubaneswar, India
| | - Lambodar Jena
- Department of Computer Science and Engineering, Siksha 'O' Anusandhan Deemed to be University, Bhubaneswar, India
| | - Gyoo-Soo Chae
- Division of Information & Communication, Baekseok University, Cheonan-si, South Korea
|
15
|
Gao L, Wu S. Response score of deep learning for out-of-distribution sample detection of medical images. J Biomed Inform 2020; 107:103442. [PMID: 32450299 DOI: 10.1016/j.jbi.2020.103442] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/10/2019] [Revised: 05/02/2020] [Accepted: 05/05/2020] [Indexed: 02/07/2023]
Abstract
Deep learning Convolutional Neural Networks have achieved remarkable performance in a variety of classification tasks. The data-driven nature of deep learning indicates that a model behaves in response to the data used to train the model, and the quality of datasets may lead to substantial influence on the model's performance, especially when dealing with complicated clinical images. In this paper, we propose a simple and novel method to investigate and quantify a deep learning model's response with respect to a given sample, allowing us to detect out-of-distribution samples based on a newly proposed metric, Response Score. The key idea is that samples belonging to different classes may have different degrees of influence on a model. We quantify the resulting consequence of a single sample to a trained-model and relate the quantitative measure of the consequence (by the Response Score) to detect the out-of-distribution samples. The proposed method can find multiple applications such as (1) recognizing abnormal samples, (2) detecting mixed-domain data, and (3) identifying mislabeled data. We present extensive experiments on the three different applications using four biomedical imaging datasets. Experimental results show that our method exhibits remarkable performance and outperforms the compared methods.
|
16
|
Rahman R, Kodesh A, Levine SZ, Sandin S, Reichenberg A, Schlessinger A. Identification of newborns at risk for autism using electronic medical records and machine learning. Eur Psychiatry 2020; 63:e22. [PMID: 32100657 PMCID: PMC7315872 DOI: 10.1192/j.eurpsy.2020.17] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Current approaches for early identification of individuals at high risk for autism spectrum disorder (ASD) in the general population are limited, and most ASD patients are not identified until after the age of 4. This is despite substantial evidence suggesting that early diagnosis and intervention improves developmental course and outcome. The aim of the current study was to test the ability of machine learning (ML) models applied to electronic medical records (EMRs) to predict ASD early in life, in a general population sample. METHODS We used EMR data from a single Israeli Health Maintenance Organization, including EMR information for parents of 1,397 ASD children (ICD-9/10) and 94,741 non-ASD children born between January 1st, 1997 and December 31st, 2008. Routinely available parental sociodemographic information, parental medical histories, and prescribed medications data were used to generate features to train various ML algorithms, including multivariate logistic regression, artificial neural networks, and random forest. Prediction performance was evaluated with 10-fold cross-validation by computing the area under the receiver operating characteristic curve (AUC; C-statistic), sensitivity, specificity, accuracy, false positive rate, and precision (positive predictive value [PPV]). RESULTS All ML models tested had similar performance. The average performance across all models had C-statistic of 0.709, sensitivity of 29.93%, specificity of 98.18%, accuracy of 95.62%, false positive rate of 1.81%, and PPV of 43.35% for predicting ASD in this dataset. CONCLUSIONS We conclude that ML algorithms combined with EMR capture early life ASD risk as well as reveal previously unknown features to be associated with ASD-risk. Such approaches may be able to enhance the ability for accurate and efficient early detection of ASD in large populations of children.
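The metrics listed in the abstract above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' code; the `Confusion` class, the `binary_metrics` function, and the counts are hypothetical and chosen only to mimic a rare-outcome setting, where accuracy can stay high while sensitivity is low.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def binary_metrics(c: Confusion) -> dict:
    """Standard definitions of the metrics named in the abstract."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "false_positive_rate": c.fp / (c.fp + c.tn),
        "ppv": c.tp / (c.tp + c.fp),
    }


# Illustrative counts only (not taken from the study): 100 cases, 2000 controls.
example = Confusion(tp=30, fp=40, tn=1960, fn=70)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, most cases are missed even though nearly 95% of all predictions are correct, which is why accuracy alone is a weak summary for imbalanced outcomes.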
Affiliation(s)
- Rayees Rahman
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Arad Kodesh
- Department of Mental Health, Meuhedet Health Services, Tel Aviv, Israel.,Department of Community Health, University of Haifa, Haifa, Israel
| | - Stephen Z Levine
- Department of Community Health, University of Haifa, Haifa, Israel
| | - Sven Sandin
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA.,Seaver Center for Autism Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Abraham Reichenberg
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA.,Seaver Center for Autism Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, New York, USA.,MINDICH Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA.,Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, New York, USA
| | - Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
|
17
|
Abstract
Autism spectrum disorder (ASD) emerges during early childhood and is marked by a relatively narrow window in which infants transition from exhibiting normative behavioral profiles to displaying the defining features of the ASD phenotype in toddlerhood. Prospective brain imaging studies in infants at high familial risk for autism have revealed important insights into the neurobiology and developmental unfolding of ASD. In this article, we review neuroimaging studies of brain development in ASD from birth through toddlerhood, relate these findings to candidate neurobiological mechanisms, and discuss implications for future research and translation to clinical practice.
Affiliation(s)
- Jessica B Girault
- Carolina Institute for Developmental Disabilities, The University of North Carolina at Chapel Hill School of Medicine, 101 Renee Lynne Court, Chapel Hill, NC 27599, USA.
| | - Joseph Piven
- Carolina Institute for Developmental Disabilities, The University of North Carolina at Chapel Hill School of Medicine, 101 Renee Lynne Court, Chapel Hill, NC 27599, USA
|
18
|
|
19
|
Atallah DM, Badawy M, El-sayed A. A new proposed feature selection method to predict kidney transplantation outcome. Health Technol 2019; 9:847-856. [DOI: 10.1007/s12553-019-00369-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
20
|
Li J, Ogrodnik M, Kolachalama VB, Lin H, Au R. Assessment of the Mid-Life Demographic and Lifestyle Risk Factors of Dementia Using Data from the Framingham Heart Study Offspring Cohort. J Alzheimers Dis 2019; 63:1119-1127. [PMID: 29710704 DOI: 10.3233/jad-170917] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/15/2022]
Abstract
BACKGROUND Dementia is the leading cause of dependence and disability in the elderly population worldwide. However, currently there is no effective medication for dementia treatment. Therefore, identifying lifestyle-related risk factors including some that are modifiable may provide important strategies for reducing risk of dementia. OBJECTIVE This study aims to highlight associations between easily obtainable lifestyle risk factors in mid-life and dementia in later adulthood. METHODS Using data from the Framingham Heart Study Offspring cohort, we leveraged well-known classification models (decision tree classifier and random forests) to associate demographic and lifestyle behavioral data with dementia status. We then evaluated model performance by computing area under receiver operating characteristic (ROC) curve. RESULTS As expected, age was strongly associated with dementia. The analysis also identified 'widowed' marital status, lower BMI, and less sleep at mid-life as risk factors of dementia. The areas under the ROC curves were 0.79 for the decision tree, and 0.89 for the random forest model. CONCLUSION Demographic and lifestyle factors that are non-invasive and inexpensive to implement can be assessed in midlife and used to potentially modify the risk of dementia in late adulthood. Classification models can help identify associations between dementia and midlife lifestyle risk factors. These findings inform further research, in order to help public health officials develop targeted programs for dementia prevention.
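The area under the ROC curve used above to compare the two models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way, by pairwise comparison; it is an illustration under that standard definition, not the authors' evaluation code, and the function name `auc` and the scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking. Runs in O(n_pos * n_neg), which is
    adequate for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up scores: one positive (0.35) is ranked below one negative (0.6).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
result = auc(scores, labels)
print(round(result, 3))
```

In this toy example, 8 of the 9 positive-negative pairs are ordered correctly, so the AUC is 8/9.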
Affiliation(s)
- Jinlei Li
- School of Public Health, Peking Union Medical School, Beijing, China.,Framingham Heart Study, Boston University School of Medicine, Boston, MA, USA
| | - Matthew Ogrodnik
- Division of Graduate Medical Sciences, Boston University School of Medicine, Boston, MA, USA
| | - Vijaya B Kolachalama
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA.,Whitaker Cardiovascular Institute, Boston University School of Medicine, Boston, MA, USA
| | - Honghuang Lin
- Framingham Heart Study, Boston University School of Medicine, Boston, MA, USA.,Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA, USA.,National Heart Lung and Blood Institute Framingham Heart Study, Framingham, MA, USA
| | - Rhoda Au
- Framingham Heart Study, Boston University School of Medicine, Boston, MA, USA.,Department of Anatomy & Neurobiology, Neurology and Epidemiology, Boston University Schools of Medicine and Public Health, Boston, MA, USA
|
21
|
Abstract
In this study, an attempt has been made to identify the origin of multifractality in uterine electromyography signals and to differentiate term (gestational age > 37 weeks) and preterm (gestational age ≤ 37 weeks) conditions by multifractal detrended moving average technique. The signals obtained from a publicly available database, recorded from the abdominal surface during the second trimester, are used in this study. The signals are preprocessed and converted to shuffle and surrogate series to examine the source of multifractality. Multifractal detrended moving average algorithm is applied on all the signals. The presence of multifractality is verified using scaling exponents, and multifractal spectral features are extracted from the spectrum. The variation of multifractal features in term and preterm conditions is analyzed statistically using Student's t-test. The results of scaling exponents show that the uterine electromyography or electrohysterography signals reveal multifractal characteristics in term and preterm conditions. Further investigation indicates the existence of long-range correlation as the primary source of multifractality. Among all extracted features, strength of multifractality, exponent index, and maximum and peak singularity exponents are statistically significant (p < 0.05) in differentiating term and preterm conditions. The coefficient of variation is found to be lower for strength of multifractality and peak singularity exponent, indicating that these features exhibit less inter-subject variance. Hence, it appears that multifractal analysis can aid in the diagnosis of preterm or term delivery of pregnant women.
Affiliation(s)
- N Punitha
- Non-Invasive Imaging and Diagnostic (NIID) Laboratory, Biomedical Engineering Group, Department of Applied Mechanics, Indian Institute of Technology Madras, Chennai, India
| | - S Ramakrishnan
- Non-Invasive Imaging and Diagnostic (NIID) Laboratory, Biomedical Engineering Group, Department of Applied Mechanics, Indian Institute of Technology Madras, Chennai, India
|
22
|
Tan X, Su S, Huang Z, Guo X, Zuo Z, Sun X, Li L. Wireless Sensor Networks Intrusion Detection Based on SMOTE and the Random Forest Algorithm. Sensors (Basel) 2019; 19:E203. [PMID: 30626020 DOI: 10.3390/s19010203] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 12/13/2018] [Revised: 12/27/2018] [Accepted: 01/04/2019] [Indexed: 11/21/2022]
Abstract
With the wide application of wireless sensor networks in military and environmental monitoring, security issues have become increasingly prominent. Data exchanged over wireless sensor networks is vulnerable to malicious attacks due to the lack of physical defense equipment. Therefore, corresponding schemes of intrusion detection are urgently needed to defend against such attacks. Considering the serious class imbalance of the intrusion dataset, this paper proposes a method of using the synthetic minority oversampling technique (SMOTE) to balance the dataset and then uses the random forest algorithm to train the classifier for intrusion detection. The simulations are conducted on a benchmark intrusion dataset, and the accuracy of the random forest algorithm has reached 92.39%, which is higher than other comparison algorithms. After oversampling the minority samples, the accuracy of the random forest combined with the SMOTE has increased to 92.57%. This shows that the proposed algorithm provides an effective solution to solve the problem of class imbalance and improves the performance of intrusion detection.
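The core idea of SMOTE, as used above to balance the dataset before training, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified sketch of that idea using only the standard library; it is not the paper's implementation, and the function name `smote`, its parameters, and the sample points are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class; a classifier such as a random forest would then be trained on the original plus synthetic samples.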
|
23
|
Abstract
Objective Clinical research literature focuses primarily on the most common causes of maternal morbidity and mortality (MMM). We explore sections of the discharge summaries of pregnant or postpartum women admitted to an intensive care unit (ICU) to identify associated disorders and mine the literature to identify knowledge gaps in clinical research. Methods Data for the study were discharge summaries in the MIMIC (Medical Information Mart for Intensive Care) database. We extracted a control cohort to study whether comorbidities differ between pregnant and nonpregnant patients with similar reasons for admission. We identified comorbidities of the Unified Medical Language System (UMLS) semantic types Disease or Syndrome, Mental or Behavioral Dysfunction, and Injury or Poisoning. We used Entrez programming utilities (E-utilities) to query PubMed®. Results We identified 246 pregnant and postpartum patients. A control group of 587 non-pregnancy-related admissions was matched on age and admission diagnosis. We found an overlap of 24.3% in discharge diagnoses between the two groups, with 7.5% of the codes found exclusively in the pregnancy group. We identified 33 disease mentions not included in the most commonly reported causes of MMM. Conclusion Our results demonstrate that clinical text provides additional comorbidities associated with maternal complications that need further clinical research.
Affiliation(s)
- Laritza M Rodriguez
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
| | - Dina Demner Fushman
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, Bethesda, Maryland
|
24
|
Fergus P, Selvaraj M, Chalmers C. Machine learning ensemble modelling to classify caesarean section and vaginal delivery types using Cardiotocography traces. Comput Biol Med 2017; 93:7-16. [PMID: 29248699 DOI: 10.1016/j.compbiomed.2017.12.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 06/18/2017] [Revised: 12/06/2017] [Accepted: 12/07/2017] [Indexed: 10/18/2022]
Abstract
Human visual inspection of Cardiotocography traces is used to monitor the foetus during labour and avoid neonatal mortality and morbidity. The problem, however, is that visual interpretation of Cardiotocography traces is subject to high inter- and intra-observer variability. Incorrect decisions, caused by misinterpretation, can lead to adverse perinatal outcomes and in severe cases death. This study presents a review of human Cardiotocography trace interpretation and argues that machine learning, used as a decision support system by obstetricians and midwives, may provide an objective measure alongside normal practices. This will help to increase predictive capacity and reduce negative outcomes. A robust methodology is presented for feature set engineering using an open database comprising 552 intrapartum recordings. State-of-the-art signal processing techniques are applied to raw Cardiotocography foetal heart rate traces to extract 13 features. Those with low discriminative capacity are removed using Recursive Feature Elimination. The dataset is imbalanced with significant differences between the prior probabilities of both normal deliveries and those delivered by caesarean section. This issue is addressed by oversampling the training instances using a synthetic minority oversampling technique to provide a balanced class distribution. Several simple, yet powerful, machine-learning algorithms are trained, using the feature set, and their performance is evaluated with real test data. The results are encouraging using an ensemble classifier comprising Fisher's Linear Discriminant Analysis, Random Forest and Support Vector Machine classifiers, with 87% (95% Confidence Interval: 86%, 88%) for Sensitivity, 90% (95% CI: 89%, 91%) for Specificity, and 96% (95% CI: 96%, 97%) for the Area Under the Curve, with a 9% (95% CI: 9%, 10%) Mean Square Error.
Affiliation(s)
- Paul Fergus
- Liverpool John Moores University, Faculty of Engineering and Technology, Data Science Research Centre, Department of Computer Science, Byron Street, Liverpool, L3 3AF, United Kingdom.
| | - Malarvizhi Selvaraj
- Liverpool John Moores University, Faculty of Engineering and Technology, Data Science Research Centre, Department of Computer Science, Byron Street, Liverpool, L3 3AF, United Kingdom.
| | - Carl Chalmers
- Liverpool John Moores University, Faculty of Engineering and Technology, Data Science Research Centre, Department of Computer Science, Byron Street, Liverpool, L3 3AF, United Kingdom.
|
25
|
Fergus P, Hussain A, Al-Jumeily D, Huang DS, Bouguila N. Classification of caesarean section and normal vaginal deliveries using foetal heart rate signals and advanced machine learning algorithms. Biomed Eng Online 2017; 16:89. [PMID: 28679415 PMCID: PMC5498914 DOI: 10.1186/s12938-017-0378-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 02/11/2017] [Accepted: 06/26/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Visual inspection of cardiotocography traces by obstetricians and midwives is the gold standard for monitoring the wellbeing of the foetus during antenatal care. However, inter- and intra-observer variability is high with only a 30% positive predictive value for the classification of pathological outcomes. This has a significant negative impact on the perinatal foetus and often results in cardio-pulmonary arrest, brain and vital organ damage, cerebral palsy, hearing, visual and cognitive defects and in severe cases, death. This paper shows that using machine learning and foetal heart rate signals provides direct information about the foetal state and helps to filter the subjective opinions of medical practitioners when used as a decision support tool. The primary aim is to provide a proof-of-concept that demonstrates how machine learning can be used to objectively determine when medical intervention, such as caesarean section, is required and help avoid preventable perinatal deaths. METHODS This is evidenced using an open dataset that comprises 506 controls (normal vaginal deliveries) and 46 cases (caesarean due to pH ≤ 7.20 [acidosis], n = 18; pH > 7.20 and pH < 7.25 [foetal deterioration], n = 4; or clinical decision without evidence of pathological outcome measures, n = 24). Several machine-learning algorithms are trained, and validated, using binary classifier performance measures. RESULTS The findings show that deep learning classification achieves sensitivity = 94%, specificity = 91%, Area under the curve = 99%, F-score = 100%, and mean square error = 1%. CONCLUSIONS The results demonstrate that machine learning significantly improves the efficiency for the detection of caesarean section and normal vaginal deliveries using foetal heart rate signals compared with obstetrician and midwife predictions and systems reported in previous studies.
Affiliation(s)
- Paul Fergus
- Applied Computing Research Group, Department of Computer Science, Faculty of Engineering and Technology, Liverpool John Moores University, Byron Street, Liverpool, L3 3AF, UK.
| | - Abir Hussain
- Applied Computing Research Group, Department of Computer Science, Faculty of Engineering and Technology, Liverpool John Moores University, Byron Street, Liverpool, L3 3AF, UK
| | - Dhiya Al-Jumeily
- Applied Computing Research Group, Department of Computer Science, Faculty of Engineering and Technology, Liverpool John Moores University, Byron Street, Liverpool, L3 3AF, UK
| | - De-Shuang Huang
- Institute of Machine Learning and Systems Biology, Tongji University, No. 4800 Caoan Road, Shanghai, 201804, China
| | - Nizar Bouguila
- Concordia Institute for Information Systems Engineering, Concordia University, 1455 de Maisonneuve Blvd West, EV7.632, Montreal, QC, HJ3G 2W1, Canada
| |
|
26
|
Affiliation(s)
- Rok Blagus
- Univerza v Ljubljani Medicinska Fakulteta, Institute for Biostatistics and Medical Informatics, Leiden, The Netherlands
| | - Jelle J Goeman
- Leiden University Medical Center, Department of Medical Statistics and Bioinformatics, Leiden, The Netherlands
| |
|
27
|
Blagus R, Lusa L. Joint use of over- and under-sampling techniques and cross-validation for the development and assessment of prediction models. BMC Bioinformatics 2015; 16:363. [PMID: 26537827 PMCID: PMC4634915 DOI: 10.1186/s12859-015-0784-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 06/02/2015] [Accepted: 10/17/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prediction models are used in clinical research to develop rules that can be used to accurately predict the outcome of the patients based on some of their characteristics. They represent a valuable tool in the decision making process of clinicians and health policy makers, as they enable them to estimate the probability that patients have or will develop a disease, will respond to a treatment, or that their disease will recur. The interest devoted to prediction models in the biomedical community has been growing in the last few years. Often the data used to develop the prediction models are class-imbalanced as only few patients experience the event (and therefore belong to minority class). RESULTS Prediction models developed using class-imbalanced data tend to achieve sub-optimal predictive accuracy in the minority class. This problem can be diminished by using sampling techniques aimed at balancing the class distribution. These techniques include under- and oversampling, where a fraction of the majority class samples are retained in the analysis or new samples from the minority class are generated. The correct assessment of how the prediction model is likely to perform on independent data is of crucial importance; in the absence of an independent data set, cross-validation is normally used. While the importance of correct cross-validation is well documented in the biomedical literature, the challenges posed by the joint use of sampling techniques and cross-validation have not been addressed. CONCLUSIONS We show that care must be taken to ensure that cross-validation is performed correctly on sampled data, and that the risk of overestimating the predictive accuracy is greater when oversampling techniques are used. Examples based on the re-analysis of real datasets and simulation studies are provided. We identify some results from the biomedical literature where incorrect cross-validation was performed and where we expect that the performance of oversampling techniques was heavily overestimated.
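The cross-validation pitfall described in this abstract can be illustrated with a small sketch. It is not the authors' code: it assumes plain random oversampling by duplication as a stand-in for the sampling techniques discussed, and the helper names (`k_fold_indices`, `oversample`, `cv_with_oversampling`) are hypothetical. The point it shows is that the data are split into folds first and only the training part of each split is oversampled, so no copy of a test sample can leak into training.

```python
import random
from collections import defaultdict

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def oversample(indices, labels, seed=0):
    """Duplicate randomly chosen indices of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced

def cv_with_oversampling(labels, k=5, seed=0):
    """Yield (train, test) index lists; only the training part is oversampled."""
    folds = k_fold_indices(len(labels), k, seed)
    for f, test in enumerate(folds):
        # Training indices are everything outside the held-out fold.
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield oversample(train, labels, seed + f), test
```

Oversampling the whole data set before splitting would instead place duplicates of the same minority sample on both sides of a split, which is one way the overestimation described above can arise.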
Affiliation(s)
- Rok Blagus
- Institute for Biostatistics and Medical Informatics, University of Ljubljana, Vrazov trg 2, Ljubljana, Slovenia.
| | - Lara Lusa
- Institute for Biostatistics and Medical Informatics, University of Ljubljana, Vrazov trg 2, Ljubljana, Slovenia.
| |
|
28
|
|
29
|
Fergus P, Hignett D, Hussain A, Al-Jumeily D, Abdel-Aziz K. Automatic epileptic seizure detection using scalp EEG and advanced artificial intelligence techniques. Biomed Res Int 2015; 2015:986736. [PMID: 25710040 DOI: 10.1155/2015/986736] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 07/21/2014] [Revised: 12/09/2014] [Accepted: 12/23/2014] [Indexed: 11/17/2022]
Abstract
The epilepsies are a heterogeneous group of neurological disorders and syndromes characterised by recurrent, involuntary, paroxysmal seizure activity, which is often associated with a clinicoelectrical correlate on the electroencephalogram. The diagnosis of epilepsy is usually made by a neurologist but can be difficult to make in the early stages. Supporting paraclinical evidence obtained from magnetic resonance imaging and electroencephalography may enable clinicians to make a diagnosis of epilepsy and investigate treatment earlier. However, electroencephalogram capture and interpretation are time consuming and can be expensive due to the need for trained specialists to perform the interpretation. Automated detection of correlates of seizure activity may be a solution. In this paper, we present a supervised machine learning approach that classifies seizure and nonseizure records using an open dataset containing 342 records. Our results show an improvement on existing studies by as much as 10% in most cases with a sensitivity of 93%, specificity of 94%, and area under the curve of 98% with a 6% global error using a k-class nearest neighbour classifier. We propose that such an approach could have clinical applications in the investigation of patients with suspected seizure disorders.
|
30
|
Ramezankhani A, Pournik O, Shahrabi J, Azizi F, Hadaegh F, Khalili D. The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes. Med Decis Making 2014; 36:137-44. [PMID: 25449060 DOI: 10.1177/0272989x14560647] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 12/15/2013] [Accepted: 10/23/2014] [Indexed: 11/15/2022]
Abstract
OBJECTIVE To evaluate the impact of the synthetic minority oversampling technique (SMOTE) on the performance of probabilistic neural network (PNN), naïve Bayes (NB), and decision tree (DT) classifiers for predicting diabetes in a prospective cohort of the Tehran Lipid and Glucose Study (TLGS). METHODS Data of the 6647 nondiabetic participants, aged 20 years or older with more than 10 years of follow-up, were used to develop prediction models based on 21 common risk factors. The minority class in the training dataset was oversampled using the SMOTE technique, at 100%, 200%, 300%, 400%, 500%, 600%, and 700% of its original size. The original and the oversampled training datasets were used to establish the classification models. Accuracy, sensitivity, specificity, precision, F-measure, and Youden's index were used to evaluate the performance of classifiers in the test dataset. To compare the performance of the 3 classification models, we used the ROC convex hull (ROCCH). RESULTS Oversampling the minority class at 700% (completely balanced) increased the sensitivity of the PNN, DT, and NB by 64%, 51%, and 5%, respectively, but decreased the accuracy and specificity of the 3 classification methods. NB had the best Youden's index before and after oversampling. The ROCCH showed that PNN is suboptimal for any class and cost conditions. CONCLUSIONS To determine a classifier with a machine learning algorithm like the PNN and DT, class skew in data should be considered. The NB and DT were optimal classifiers in a prediction task in an imbalanced medical database.
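For readers unfamiliar with the performance measures listed in this abstract, the following minimal sketch computes them from confusion-matrix counts using their standard definitions. The function name `binary_metrics` and the example counts are illustrative assumptions, not values from the study, and the sketch assumes every denominator is non-zero.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return common binary-classifier measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # F-measure: harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Youden's index J = sensitivity + specificity - 1.
    youden = sensitivity + specificity - 1
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "youden": youden,
    }

# Made-up counts for an imbalanced test set: 10 positives, 90 negatives.
metrics = binary_metrics(tp=8, fn=2, tn=85, fp=5)
```

With imbalanced data, accuracy is dominated by the majority class, which is why measures such as Youden's index, which weigh sensitivity and specificity equally, are reported alongside it.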
Affiliation(s)
- Azra Ramezankhani
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran (AR, FH, DK)
| | - Omid Pournik
- Department of Community Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, Iran (OP)
| | - Jamal Shahrabi
- Industrial Engineering Department, Amirkabir University of Technology, Tehran, Iran (JS)
| | - Fereidoun Azizi
- Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran (FA)
| | - Farzad Hadaegh
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran (AR, FH, DK)
| | - Davood Khalili
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran (AR, FH, DK),Department of Epidemiology, School of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran (DK)
| |
|
31
|
Breathett K, Muhlestein D, Foraker R, Gulati M. Differences in Preeclampsia Rates Between African American and Caucasian Women: Trends from the National Hospital Discharge Survey. J Womens Health (Larchmt) 2014; 23:886-93. [DOI: 10.1089/jwh.2014.4749] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Indexed: 11/12/2022] Open
Affiliation(s)
- Khadijah Breathett
- Department of Internal Medicine, Division of Cardiology, The Ohio State University Wexner Medical Center, Columbus, Ohio
| | | | - Randi Foraker
- College of Public Health, Division of Epidemiology, The Ohio State University, Columbus, Ohio
| | - Martha Gulati
- Department of Internal Medicine, Division of Cardiology, The Ohio State University Wexner Medical Center, Columbus, Ohio
- College of Public Health, Division of Epidemiology, The Ohio State University, Columbus, Ohio
| |
|
32
|
Lee PH. Resampling methods improve the predictive power of modeling in class-imbalanced datasets. Int J Environ Res Public Health 2014; 11:9776-89. [PMID: 25238271 PMCID: PMC4199049 DOI: 10.3390/ijerph110909776] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 06/20/2014] [Revised: 09/04/2014] [Accepted: 09/12/2014] [Indexed: 11/20/2022]
Abstract
In the medical field, many outcome variables are dichotomized, and the two possible values of a dichotomized variable are referred to as classes. A dichotomized dataset is class-imbalanced if it consists mostly of one class, and performance of common classification models on this type of dataset tends to be suboptimal. To tackle such a problem, resampling methods, including oversampling and undersampling can be used. This paper aims at illustrating the effect of resampling methods using the National Health and Nutrition Examination Survey (NHANES) wave 2009–2010 dataset. A total of 4677 participants aged ≥20 without self-reported diabetes and with valid blood test results were analyzed. The Classification and Regression Tree (CART) procedure was used to build a classification model on undiagnosed diabetes. A participant demonstrated evidence of diabetes according to WHO diabetes criteria. Exposure variables included demographics and socio-economic status. CART models were fitted using a randomly selected 70% of the data (training dataset), and area under the receiver operating characteristic curve (AUC) was computed using the remaining 30% of the sample for evaluation (testing dataset). CART models were fitted using the training dataset, the oversampled training dataset, the weighted training dataset, and the undersampled training dataset. In addition, resampling case-to-control ratio of 1:1, 1:2, and 1:4 were examined. Resampling methods on the performance of other extensions of CART (random forests and generalized boosted trees) were also examined. CARTs fitted on the oversampled (AUC = 0.70) and undersampled training data (AUC = 0.74) yielded a better classification power than that on the training data (AUC = 0.65). Resampling could also improve the classification power of random forests and generalized boosted trees. To conclude, applying resampling methods in a class-imbalanced dataset improved the classification power of CART, random forests, and generalized boosted trees.
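As a rough illustration of undersampling to a fixed case-to-control ratio such as the 1:1, 1:2, and 1:4 settings mentioned in this abstract, the sketch below keeps every case and draws a random subset of controls. The function name `undersample_controls` and its parameters are hypothetical, and the paper's actual procedure may differ. In line with the 70%/30% design described above, it would be applied to the training records only.

```python
import random

def undersample_controls(cases, controls, controls_per_case=1, seed=0):
    """Keep all cases and a random subset of controls at the requested ratio."""
    rng = random.Random(seed)
    # Never request more controls than are available.
    n_keep = min(len(controls), controls_per_case * len(cases))
    return cases + rng.sample(controls, n_keep)

# Illustrative record identifiers: 10 cases and 100 controls.
cases = list(range(10))
controls = list(range(100, 200))
ratio_1_to_2 = undersample_controls(cases, controls, controls_per_case=2)
```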
Affiliation(s)
- Paul H Lee
- School of Nursing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong.
| |
|
33
|
Fergus P, Cheung P, Hussain A, Al-Jumeily D, Dobbins C, Iram S. Prediction of preterm deliveries from EHG signals using machine learning. PLoS One 2013; 8:e77154. [PMID: 24204760 PMCID: PMC3810473 DOI: 10.1371/journal.pone.0077154] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.9] [Received: 03/19/2013] [Accepted: 08/30/2013] [Indexed: 12/16/2022] Open
Abstract
There has been some improvement in the treatment of preterm infants, which has helped to increase their chance of survival. However, the rate of premature births is still globally increasing. As a result, this group of infants are most at risk of developing severe medical conditions that can affect the respiratory, gastrointestinal, immune, central nervous, auditory and visual systems. In extreme cases, this can also lead to long-term conditions, such as cerebral palsy, mental retardation, learning difficulties, including poor health and growth. In the US alone, the societal and economic cost of preterm births, in 2005, was estimated to be $26.2 billion, per annum. In the UK, this value was close to £2.95 billion, in 2009. Many believe that a better understanding of why preterm births occur, and a strategic focus on prevention, will help to improve the health of children and reduce healthcare costs. At present, most methods of preterm birth prediction are subjective. However, a strong body of evidence suggests the analysis of uterine electrical signals (Electrohysterography) could provide a viable way of diagnosing true labour and predicting preterm deliveries. Most Electrohysterography studies focus on true labour detection during the final seven days before labour. The challenge is to utilise Electrohysterography techniques to predict preterm delivery earlier in the pregnancy. This paper explores this idea further and presents a supervised machine learning approach that classifies term and preterm records, using an open source dataset containing 300 records (38 preterm and 262 term). The synthetic minority oversampling technique is used to oversample the minority preterm class, and cross-validation techniques are used to evaluate the dataset against other similar studies. Our approach shows an improvement on existing studies with 96% sensitivity, 90% specificity, and a 95% area under the curve value with 8% global error using the polynomial classifier.
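The synthetic minority oversampling technique named in this abstract creates new minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows that core idea only. It is a simplification, not the implementation used in the paper: base samples are drawn at random rather than visited in turn, the function name `smote` and its parameters are assumptions, and at least two minority samples are required.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest minority neighbours of x, excluding x itself.
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nn
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic

# Four made-up minority samples with two features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class.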
Affiliation(s)
- Paul Fergus
- Applied Computing Research Group, Liverpool John Moores University, Liverpool, Merseyside, United Kingdom
| | - Pauline Cheung
- Applied Computing Research Group, Liverpool John Moores University, Liverpool, Merseyside, United Kingdom
| | - Abir Hussain
- Applied Computing Research Group, Liverpool John Moores University, Liverpool, Merseyside, United Kingdom
| | - Dhiya Al-Jumeily
- Applied Computing Research Group, Liverpool John Moores University, Liverpool, Merseyside, United Kingdom
| | - Chelsea Dobbins
- Applied Computing Research Group, Liverpool John Moores University, Liverpool, Merseyside, United Kingdom
| | - Shamaila Iram
- Applied Computing Research Group, Liverpool John Moores University, Liverpool, Merseyside, United Kingdom
| |
|
34
|
Afzal Z, Schuemie MJ, van Blijderveen JC, Sen EF, Sturkenboom MCJM, Kors JA. Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records. BMC Med Inform Decis Mak 2013; 13:30. [PMID: 23452306 PMCID: PMC3602667 DOI: 10.1186/1472-6947-13-30] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 10/18/2012] [Accepted: 02/27/2013] [Indexed: 01/18/2023] Open
Abstract
Background Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to develop an automatic case identification system with high sensitivity to assist manual annotators. Methods We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive cases and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs. Results For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, as compared to 0.91 for a baseline classifier) with specificity 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with specificity 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with imbalance ratio below 10. Conclusions We were able to achieve high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a high-sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation.
Affiliation(s)
- Zubair Afzal
- Department of Medical Informatics, Erasmus Medical Center, P.O. Box 2040, Rotterdam 3000CA, Netherlands.
| | | | | | | | | | | |
|
35
|
Yajuan Wang, Simon M, Bonde P, Harris BU, Teuteberg JJ, Kormos RL, Antaki JF. Prognosis of Right Ventricular Failure in Patients With Left Ventricular Assist Device Based on Decision Tree With SMOTE. IEEE Trans Inf Technol Biomed 2012; 16:383-90. [DOI: 10.1109/titb.2012.2187458] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/06/2022]
|
36
|
|
37
|
Wang Y, Simon MA, Bonde P, Harris BU, Teuteberg JJ, Kormos RL, Antaki JF. Decision tree for adjuvant right ventricular support in patients receiving a left ventricular assist device. J Heart Lung Transplant 2011; 31:140-9. [PMID: 22168963 DOI: 10.1016/j.healun.2011.11.003] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Received: 07/23/2011] [Revised: 10/12/2011] [Accepted: 11/07/2011] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Right ventricular (RV) failure is a significant complication after implantation of a left ventricular assist device (LVAD). It is therefore important to identify patients at risk a priori. However, prognostic models derived from multivariate analyses have had limited predictive power. METHODS This study retrospectively analyzed the records of 183 LVAD recipients between May 1996 and October 2009; of these, 27 later required a RVAD (RVAD(+)) and 156 remained on LVAD only (RVAD(-)) until transplant or death. A decision tree model was constructed to represent combinatorial non-linear relationships of the pre-operative data that are predictive of the need for RVAD support. RESULTS An optimal set of 8 pre-operative variables were identified: transpulmonary gradient, age, right atrial pressure, international normalized ratio, heart rate, white blood cell count, alanine aminotransferase, and the number of inotropic agents. The resultant decision tree, which consisted of 28 branches and 15 leaves, identified RVAD(+) patients with 85% sensitivity, RVAD(-) patients with 83% specificity, and exhibited an area under the receiver operating characteristic curve of 0.87. CONCLUSIONS The decision tree model developed in this study exhibited several advantages compared with existing risk scores. Quantitatively, it provided improved prognosis of RV support by encoding the non-linear, synergic interactions among pre-operative variables. Because of its intuitive structure, it more closely mimics clinical reasoning and therefore can be more readily interpreted. Further development with additional multicenter, longitudinal data may provide a valuable prognostic tool for triage of LVAD therapy and, potentially, improve outcomes.
Affiliation(s)
- Yajuan Wang
- Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15219, USA
| | | | | | | | | | | | | |
|
38
|
Current awareness: Pharmacoepidemiology and drug safety. Pharmacoepidemiol Drug Saf 2009; 18:i-x. [DOI: 10.1002/pds.1653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|