1
Monthatip K, Boonnag C, Muangmool T, Charoenkwan K. A machine learning-based prediction model of pelvic lymph node metastasis in women with early-stage cervical cancer. J Gynecol Oncol 2024; 35:e17. [PMID: 37921601 PMCID: PMC10948976 DOI: 10.3802/jgo.2024.35.e17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 09/03/2023] [Accepted: 10/03/2023] [Indexed: 11/04/2023] Open
Abstract
OBJECTIVE To develop a novel machine learning-based preoperative prediction model for pelvic lymph node metastasis (PLNM) in early-stage cervical cancer by combining the clinical findings and preoperative computerized tomography (CT) of the whole abdomen and pelvis. METHODS Patients diagnosed with International Federation of Gynecology and Obstetrics stage IA2-IIA1 squamous cell carcinoma, adenocarcinoma, and adenosquamous carcinoma of the cervix who had primary radical surgery with bilateral pelvic lymphadenectomy from January 1, 2003 to December 31, 2020, were included. Seven supervised machine learning algorithms, including logistic regression, random forest, support vector machine, adaptive boosting, gradient boosting, extreme gradient boosting, and category boosting, were used to evaluate the risk of PLNM. RESULTS PLNM was found in 199 (23.9%) of 832 patients included. Younger age, larger tumor size, higher stage, no prior conization, tumor appearance, adenosquamous histology, and vaginal metastasis as well as the CT findings of larger tumor size, parametrial metastasis, pelvic lymph node enlargement, and vaginal metastasis, were significantly associated with PLNM. The models' predictive performance, including accuracy (89.1%-90.6%), area under the receiver operating characteristics curve (86.9%-91.0%), sensitivity (77.4%-82.4%), specificity (92.1%-94.3%), positive predictive value (77.0%-81.7%), and negative predictive value (93.0%-94.4%), appeared satisfactory and comparable among all the algorithms. After optimizing the model's decision threshold to enhance the sensitivity to at least 95%, the 'highly sensitive' model was obtained with a 2.5%-4.4% false-negative rate of PLNM prediction. CONCLUSION We developed prediction models for PLNM in early-stage cervical cancer with promising prediction performance in our setting. Further external validation in other populations is needed with potential clinical applications.
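The threshold step described in the abstract (lowering the decision cut-off until sensitivity reaches a target) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy scores, and the 0.75 target used for the toy data are all assumptions chosen for illustration.

```python
import math


def threshold_for_sensitivity(scores, labels, target=0.95):
    """Return the highest cut-off whose sensitivity is at least `target`.

    A case is called positive when its score is >= the cut-off.
    """
    positive_scores = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positive_scores:
        raise ValueError("at least one positive case is required")
    # Smallest number of true positives that satisfies the target.
    needed = math.ceil(target * len(positive_scores))
    return positive_scores[needed - 1]


def sensitivity_specificity(scores, labels, threshold):
    """Compute sensitivity and specificity at a given cut-off."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): predicted probabilities and true labels.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

cutoff = threshold_for_sensitivity(toy_scores, toy_labels, target=0.75)
sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff)
print(f"cut-off={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In practice the cut-off would be chosen on held-out data, since raising sensitivity this way generally costs specificity.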
Affiliation(s)
- Kamonrat Monthatip
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Chiraphat Boonnag
  - Biomedical Informatics Center, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Tanarat Muangmool
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Kittipat Charoenkwan
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
2
Teza H, Pattanateepapon A, Lertpimonchai A, Vathesatogkit P, McKay GJ, Attia J, Thakkinstian A. Development of Risk Prediction Models for Severe Periodontitis in a Thai Population: Statistical and Machine Learning Approaches. JMIR Form Res 2023; 7:e48351. [PMID: 38096008 PMCID: PMC10755655 DOI: 10.2196/48351] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/20/2023] [Revised: 10/31/2023] [Accepted: 11/01/2023] [Indexed: 12/31/2023] Open
Abstract
BACKGROUND Severe periodontitis affects 26% of Thai adults and 11.2% of adults globally and is characterized by the loss of alveolar bone height. Full-mouth examination by periodontal probing is the gold standard for diagnosis but is time- and resource-intensive. A screening model to identify those at high risk of severe periodontitis would offer a targeted approach and aid in reducing the workload for dentists. While statistical modelling by a logistic regression is commonly applied, optimal performance depends on feature selections and engineering. Machine learning has been recently gaining favor given its potential discriminatory power and ability to deal with multiway interactions without the requirements of linear assumptions. OBJECTIVE We aim to compare the performance of screening models developed using statistical and machine learning approaches for the risk prediction of severe periodontitis. METHODS This study used data from the prospective Electricity Generating Authority of Thailand cohort. Dental examinations were performed for the 2008 and 2013 surveys. Oral examinations (ie, number of teeth and oral hygiene index and plaque scores), periodontal pocket depth, and gingival recession were performed by dentists. The outcome of interest was severe periodontitis diagnosed by the Centre for Disease Control-American Academy of Periodontology, defined as 2 or more interproximal sites with a clinical attachment level ≥6 mm (on different teeth) and 1 or more interproximal sites with a periodontal pocket depth ≥5 mm. Risk prediction models were developed using mixed-effects logistic regression (MELR), recurrent neural network, mixed-effects support vector machine, and mixed-effects decision tree models. A total of 21 features were considered as predictive features, including 4 demographic characteristics, 2 physical examinations, 4 underlying diseases, 1 medication, 2 risk behaviors, 2 oral features, and 6 laboratory features. 
RESULTS A total of 3883 observations from 2086 participants were split into development (n=3112, 80.1%) and validation (n=771, 19.9%) sets with prevalences of periodontitis of 34.4% (n=1070) and 34.1% (n=263), respectively. The final MELR model contained 6 features (gender, education, smoking, diabetes mellitus, number of teeth, and plaque score) with an area under the curve (AUC) of 0.983 (95% CI 0.977-0.989) and positive likelihood ratio (LR+) of 11.9 (95% CI 8.8-16.3). Machine learning yielded lower performance than the MELR model, with AUC (95% CI) and LR+ (95% CI) values of 0.712 (0.669-0.754) and 2.1 (1.8-2.6), respectively, for the recurrent neural network model; 0.698 (0.681-0.734) and 2.1 (1.7-2.6), respectively, for the mixed-effects support vector machine model; and 0.662 (0.621-0.702) and 2.4 (1.9-3.0), respectively, for the mixed-effects decision tree model. CONCLUSIONS The MELR model might be more useful than machine learning for large-scale screening to identify those at high risk of severe periodontitis for periodontal evaluation. External validation using data from other centers is required to evaluate the generalizability of the model.
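For readers unfamiliar with the positive likelihood ratio reported above, LR+ is sensitivity divided by (1 - specificity), so an LR+ near 12 means a positive screen is about 12 times as likely in a person with the condition as in one without it. The sketch below shows the arithmetic; the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def positive_likelihood_ratio(tp, fn, fp, tn):
    """LR+ = sensitivity / (1 - specificity), from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity / false_positive_rate


# Hypothetical counts, for illustration only.
lr_plus = positive_likelihood_ratio(tp=80, fn=20, fp=10, tn=90)
print(round(lr_plus, 2))
```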
Affiliation(s)
- Htun Teza
  - Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Anuchate Pattanateepapon
  - Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Attawood Lertpimonchai
  - Department of Periodontology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence in Periodontal Disease and Dental Implant, Chulalongkorn University, Bangkok, Thailand
- Prin Vathesatogkit
  - Department of Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Gareth J McKay
  - Centre for Public Health, School of Medicine, Dentistry and Biomedical Sciences, Queen's University Belfast, Belfast, United Kingdom
- John Attia
  - School of Medicine and Public Health, Hunter Medical Research Institute, University of Newcastle, New Lambton, NSW, Australia
- Ammarin Thakkinstian
  - Department of Clinical Epidemiology and Biostatistics, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
3
Roschewitz M, Khara G, Yearsley J, Sharma N, James JJ, Ambrózay É, Heroux A, Kecskemethy P, Rijken T, Glocker B. Automatic correction of performance drift under acquisition shift in medical image classification. Nat Commun 2023; 14:6608. [PMID: 37857643 PMCID: PMC10587231 DOI: 10.1038/s41467-023-42396-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 10/10/2023] [Indexed: 10/21/2023] Open
Abstract
Image-based prediction models for disease detection are sensitive to changes in data acquisition such as the replacement of scanner hardware or updates to the image processing software. The resulting differences in image characteristics may lead to drifts in clinically relevant performance metrics which could cause harm in clinical decision making, even for models that generalise in terms of area under the receiver-operating characteristic curve. We propose Unsupervised Prediction Alignment, a generic automatic recalibration method that requires no ground truth annotations and only limited amounts of unlabelled example images from the shifted data distribution. We illustrate the effectiveness of the proposed method to detect and correct performance drift in mammography-based breast cancer screening and on publicly available histopathology data. We show that the proposed method can preserve the expected performance in terms of sensitivity/specificity under various realistic scenarios of image acquisition shift, thus offering an important safeguard for clinical deployment.
Affiliation(s)
- Mélanie Roschewitz
  - Kheiron Medical Technologies, London, UK
  - Imperial College London, Department of Computing, London, UK
- Nisha Sharma
  - Leeds Teaching Hospital NHS Trust, Department of Radiology, Leeds, UK
- Jonathan J James
  - Nottingham University Hospitals NHS Trust, Nottingham City Hospital, Nottingham Breast Institute, Nottingham, UK
- Ben Glocker
  - Kheiron Medical Technologies, London, UK
  - Imperial College London, Department of Computing, London, UK
4
Gantenbein J, Ahmadizadeh C, Heeb O, Lambercy O, Menon C. Feasibility of force myography for the direct control of an assistive robotic hand orthosis in non-impaired individuals. J Neuroeng Rehabil 2023; 20:101. [PMID: 37537602 PMCID: PMC10399035 DOI: 10.1186/s12984-023-01222-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 07/21/2023] [Indexed: 08/05/2023] Open
Abstract
BACKGROUND Assistive robotic hand orthoses can support people with sensorimotor hand impairment in many activities of daily living and therefore help to regain independence. However, in order for the users to fully benefit from the functionalities of such devices, a safe and reliable way to detect their movement intention for device control is crucial. Gesture recognition based on force myography measuring volumetric changes in the muscles during contraction has been previously shown to be a viable and easy to implement strategy to control hand prostheses. Whether this approach could be efficiently applied to intuitively control an assistive robotic hand orthosis remains to be investigated. METHODS In this work, we assessed the feasibility of using force myography measured from the forearm to control a robotic hand orthosis worn on the hand ipsilateral to the measurement site. In ten neurologically-intact participants wearing a robotic hand orthosis, we collected data for four gestures trained in nine arm configurations, i.e., seven static positions and two dynamic movements, corresponding to typical activities of daily living conditions. In an offline analysis, we determined classification accuracies for two binary classifiers (one for opening and one for closing) and further assessed the impact of individual training arm configurations on the overall performance. RESULTS We achieved an overall classification accuracy of 92.9% (averaged over two binary classifiers, individual accuracies 95.5% and 90.3%, respectively) but found a large variation in performance between participants, ranging from 75.4 up to 100%. Averaged inference times per sample were measured below 0.15 ms. Further, we found that the number of training arm configurations could be reduced from nine to six without notably decreasing classification performance. 
CONCLUSION The results of this work support the general feasibility of using force myography as an intuitive intention detection strategy for a robotic hand orthosis. Further, the findings also generated valuable insights into challenges and potential ways to overcome them in view of applying such technologies for assisting people with sensorimotor hand impairment during activities of daily living.
Affiliation(s)
- Jessica Gantenbein
  - Rehabilitation Engineering Laboratory, Department of Health Sciences and Technology, ETH Zurich, Lengghalde 5, 8008, Zurich, Switzerland
- Chakaveh Ahmadizadeh
  - Biomedical and Mobile Health Technology Lab, Department of Health Sciences and Technology, ETH Zurich, Lengghalde 5, 8008, Zurich, Switzerland
- Oliver Heeb
  - Rehabilitation Engineering Laboratory, Department of Health Sciences and Technology, ETH Zurich, Lengghalde 5, 8008, Zurich, Switzerland
  - Biomedical and Mobile Health Technology Lab, Department of Health Sciences and Technology, ETH Zurich, Lengghalde 5, 8008, Zurich, Switzerland
- Olivier Lambercy
  - Rehabilitation Engineering Laboratory, Department of Health Sciences and Technology, ETH Zurich, Lengghalde 5, 8008, Zurich, Switzerland
  - Future Health Technologies, Singapore-ETH Centre, Campus for Research Excellence And Technological Enterprise (CREATE), 1 Create Way, Singapore, 138602, Singapore
- Carlo Menon
  - Biomedical and Mobile Health Technology Lab, Department of Health Sciences and Technology, ETH Zurich, Lengghalde 5, 8008, Zurich, Switzerland
5
Stolbov LA, Filimonov DA, Poroikov VV. SAR based on self consistent classifier. SAR and QSAR in Environmental Research 2022; 33:793-804. [PMID: 36369710 DOI: 10.1080/1062936x.2022.2139751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]
Abstract
The accuracy and performance of (Q)SAR models depend significantly on the data used for training. Datasets prepared on the basis of publicly available databases contain structures belonging to different chemical classes and have a highly imbalanced actives/inactives ratio. Currently, hundreds of structural descriptors are used in (Q)SAR studies. The abundance of structural descriptors raises the problem of the stability of the constructed (Q)SAR models. The methods frequently used to select a small fraction of the 'best' descriptors usually do not have sufficient mathematical justification. We propose a new approach to a self-consistent classifier for SAR analysis in order to overcome these problems. Logistic (SCLC) and extreme (SCEC) extensions of self-consistent regression (SCR) were implemented to enhance the classification capabilities of SCR. The approach was applied to the development of classification models for inhibiting activity endpoints in HIV-1-related data and toxicity endpoints, with subsequent fivefold cross-validation to estimate the models' performance. Comparison of the proposed SCLC and SCEC models with those developed using the original SCR and support vector machine demonstrated comparable accuracy. Advantages in feature selection using our approach provide more generalizable (Q)SAR models. In particular, the crucial factors responsible for the observed value are determined unambiguously.
Affiliation(s)
- L A Stolbov
  - Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
- D A Filimonov
  - Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
- V V Poroikov
  - Laboratory of Structure-Function Based Drug Design, Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow, Russian Federation
6
Threshold prediction for detecting rare positive samples using a meta-learner. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01103-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
7
Vijithananda SM, Jayatilake ML, Hewavithana B, Gonçalves T, Rato LM, Weerakoon BS, Kalupahana TD, Silva AD, Dissanayake KD. Feature extraction from MRI ADC images for brain tumor classification using machine learning techniques. Biomed Eng Online 2022; 21:52. [PMID: 35915448 PMCID: PMC9344709 DOI: 10.1186/s12938-022-01022-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2022] [Accepted: 07/13/2022] [Indexed: 11/10/2022] Open
Abstract
Background Diffusion-weighted (DW) imaging is a well-recognized magnetic resonance imaging (MRI) technique that is routinely used in brain examinations in modern clinical radiology practice. This study focuses on extracting demographic and texture features from MRI Apparent Diffusion Coefficient (ADC) images of human brain tumors, identifying the distribution patterns of each feature, and applying Machine Learning (ML) techniques to differentiate malignant from benign brain tumors. Methods This prospective study was carried out using 1599 labeled MRI brain ADC image slices (995 malignant, 604 benign) from 195 patients who were radiologically diagnosed and histopathologically confirmed as brain tumor patients. The demographics, mean pixel values, skewness, kurtosis, and Grey Level Co-occurrence Matrix (GLCM) features (mean, variance, energy, entropy, contrast, homogeneity, correlation, prominence and shade) were extracted from the MRI ADC images of each patient. At the feature selection phase, the validity of the extracted features was measured using the ANOVA F-test. Then, these features were used as input to several Machine Learning classification algorithms and the respective models were assessed. Results According to the ANOVA F-test feature selection process, two attributes, skewness (3.34) and GLCM homogeneity (3.45), had the lowest F-test scores. Therefore, both features were excluded from the rest of the experiment. Of the ML algorithms tested, the Random Forest classifier was chosen to build the final ML model, since it presented the highest accuracy. The final model was able to predict malignant and benign neoplasms with 90.41% accuracy after hyperparameter tuning. Conclusions This study concludes that the above-mentioned features (except skewness and GLCM homogeneity) are informative for identifying and differentiating malignant from benign brain tumors. Moreover, they enable the development of a high-performance ML model that can assist in the decision-making steps of the brain tumor diagnosis process, prior to attempting invasive diagnostic procedures such as brain biopsies.
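The ANOVA F-test scoring used for feature selection above can be sketched as follows: for one feature, the F statistic compares the variance between class means with the variance within each class, and low scores mark features that separate the classes poorly. This is a minimal illustration rather than the study's pipeline; the function name and the toy feature values are hypothetical.

```python
def anova_f_score(groups):
    """One-way ANOVA F statistic for one feature, split into per-class groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the class means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual values around their own class mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical feature values for benign and malignant slices.
f_weak = anova_f_score([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])    # overlapping classes
f_strong = anova_f_score([[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]])  # well separated
print(f_weak, f_strong)
```

A feature like the second one would be kept, while one like the first would be a candidate for exclusion.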
Affiliation(s)
- Sahan M Vijithananda
  - Department of Radiology, Faculty of Medicine, University of Peradeniya, Peradeniya, Sri Lanka
- Mohan L Jayatilake
  - Department of Radiography and Radiotherapy, University of Peradeniya, Peradeniya, Sri Lanka
- Badra Hewavithana
  - Department of Radiology, Faculty of Medicine, University of Peradeniya, Peradeniya, Sri Lanka
- Luis M Rato
  - Department of Informatics, University of Évora, Évora, Portugal
- Bimali S Weerakoon
  - Department of Radiography and Radiotherapy, University of Peradeniya, Peradeniya, Sri Lanka
- Tharindu D Kalupahana
  - Department of Computer Engineering, University of Sri Jayawardhanapura, Dehiwala-Mount Lavinia, Sri Lanka
- Anil D Silva
  - Epilepsy Unit, National Hospital of Sri Lanka, Colombo 10, Sri Lanka
- Karuna D Dissanayake
  - Department of Histopathology, National Hospital of Sri Lanka, Colombo 10, Sri Lanka
8
Huynh T, Nibali A, He Z. Semi-supervised learning for medical image classification using imbalanced training data. Computer Methods and Programs in Biomedicine 2022; 216:106628. [PMID: 35101700 DOI: 10.1016/j.cmpb.2022.106628] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/01/2021] [Revised: 12/20/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND AND OBJECTIVE Medical image classification is often challenging for two reasons: a lack of labelled examples due to expensive and time-consuming annotation protocols, and imbalanced class labels due to the relative scarcity of disease-positive individuals in the wider population. Semi-supervised learning methods exist for dealing with a lack of labels, but they generally do not address the problem of class imbalance. Hence, the purpose of this study is to explore a new approach to perturbation-based semi-supervised learning which tackles the problem of applying semi-supervised learning to medical image classification with imbalanced training data. METHODS In this study we propose Adaptive Blended Consistency Loss (ABCL), a simple yet effective drop-in replacement for consistency loss in perturbation-based semi-supervised learning methods. ABCL counteracts data skew by adaptively mixing the target class distribution of the consistency loss in accordance with class frequency. Our proposed method is evaluated and compared with existing methods on two different imbalanced medical image classification datasets. An ablation study is also provided to analyse the properties and effectiveness of our proposed method. RESULTS Our experiments with ABCL reveal improvements to unweighted average recall (UAR) when compared with existing consistency losses that are not designed to counteract class imbalance and other existing methods. Our proposed ABCL method is able to improve the performance of the baseline consistency loss approach from 0.59 to 0.67 UAR and outperforms methods that address the class imbalance problem for labelled data (between 0.51 and 0.59 UAR) and for unlabelled data (0.61 UAR) on the imbalanced skin cancer dataset. On the imbalanced retinal fundus glaucoma dataset, ABCL (combined with Weighted Cross Entropy loss) achieves 0.67 UAR, which is an improvement over the best existing approach (0.57 UAR). 
CONCLUSIONS Overall the results show the effectiveness of ABCL to alleviate the class imbalance problem for semi-supervised classification for medical images.
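Unweighted average recall (UAR), the metric reported above, is the plain mean of per-class recall, so every class counts equally regardless of how many samples it has. The sketch below contrasts it with accuracy on an imbalanced toy set; the labels and predictions are hypothetical and unrelated to the study's datasets.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, with each class weighted equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced example: 8 negatives, 2 positives.
toy_true = [0] * 8 + [1] * 2
toy_pred = [0] * 8 + [1, 0]  # one of the two positives is missed

accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
uar = unweighted_average_recall(toy_true, toy_pred)
print(f"accuracy={accuracy:.2f}, UAR={uar:.2f}")
```

Accuracy looks high here because the majority class dominates, while UAR exposes the missed minority-class case.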
Affiliation(s)
- Tri Huynh
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Aiden Nibali
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Zhen He
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
9
Zimmerman J, Soler RE, Lavinder J, Murphy S, Atkins C, Hulbert L, Lusk R, Ng BP. Iterative guided machine learning-assisted systematic literature reviews: a diabetes case study. Syst Rev 2021; 10:97. [PMID: 33810798 PMCID: PMC8017891 DOI: 10.1186/s13643-021-01640-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2020] [Accepted: 03/19/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Systematic Reviews (SR), studies of studies, use a formal process to evaluate the quality of scientific literature and determine ensuing effectiveness from qualifying articles to establish consensus findings around a hypothesis. Their value is increasing as the conduct and publication of research and evaluation have expanded and the process of identifying key insights becomes more time-consuming. Text analytics and machine learning (ML) techniques may help overcome this problem of scale while still maintaining the level of rigor expected of SRs. METHODS In this article, we discuss an approach that uses existing examples of SRs to build and test a method for assisting the SR title and abstract pre-screening by reducing the initial pool of potential articles down to articles that meet inclusion criteria. Our approach differs from previous approaches to using ML as an SR tool in that it incorporates ML configurations guided by previously conducted SRs, and human confirmation of ML predictions of relevant articles during multiple iterative reviews on smaller tranches of citations. We applied the tailored method to a new SR effort to validate performance. RESULTS The case study test of the approach demonstrated a sensitivity (recall) in finding relevant articles during down-selection that may rival many traditional processes and showed the ability to overcome most type II errors. The study achieved a sensitivity of 99.5% (213 out of 214) of total relevant articles while only conducting a human review of 31% of total articles available for review. CONCLUSIONS We believe this iterative method can help overcome bias in initial ML model training by having humans reinforce ML models with new and relevant information, and is an applied step towards transfer learning for ML in SR.
Affiliation(s)
- John Zimmerman
  - Deloitte Consulting, LLP, 191 Peachtree Street, Atlanta, GA, 30303, USA
- Robin E Soler
  - Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, 1600 Clifton Rd, Atlanta, GA, USA
- James Lavinder
  - Deloitte Consulting, LLP, 191 Peachtree Street, Atlanta, GA, 30303, USA
- Sarah Murphy
  - Deloitte Consulting, LLP, 191 Peachtree Street, Atlanta, GA, 30303, USA
- Charisma Atkins
  - Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, 1600 Clifton Rd, Atlanta, GA, USA
- LaShonda Hulbert
  - Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, 1600 Clifton Rd, Atlanta, GA, USA
- Richard Lusk
  - Deloitte Consulting, LLP, 191 Peachtree Street, Atlanta, GA, 30303, USA
- Boon Peng Ng
  - Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Diabetes Translation, 1600 Clifton Rd, Atlanta, GA, USA
  - College of Nursing & Disability, Aging and Technology Cluster, University of Central Florida, 12201 Research Pkwy Suite 300, Orlando, FL, USA
10
Rácz A, Bajusz D, Héberger K. Effect of Dataset Size and Train/Test Split Ratios in QSAR/QSPR Multiclass Classification. Molecules 2021; 26:1111. [PMID: 33669834 PMCID: PMC7922354 DOI: 10.3390/molecules26041111] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 12/23/2020] [Revised: 02/04/2021] [Accepted: 02/16/2021] [Indexed: 01/04/2023] Open
Abstract
Applied datasets can vary from a few hundred to thousands of samples in typical quantitative structure-activity/property (QSAR/QSPR) relationships and classification. However, the size of the datasets and the train/test split ratios can greatly affect the outcome of the models, and thus the classification performance itself. We compared several combinations of dataset sizes and split ratios with five different machine learning algorithms to find the differences or similarities and to select the best parameter settings in nonbinary (multiclass) classification. It is also known that the models are ranked differently according to the performance merit(s) used. Here, 25 performance parameters were calculated for each model, then factorial ANOVA was applied to compare the results. The results clearly show the differences not just between the applied machine learning algorithms but also between the dataset sizes and to a lesser extent the train/test split ratios. The XGBoost algorithm could outperform the others, even in multiclass modeling. The performance parameters reacted differently to the change of the sample set size; some of them were much more sensitive to this factor than the others. Moreover, significant differences could be detected between train/test split ratios as well, exerting a great effect on the test validation of our models.
Affiliation(s)
- Anita Rácz
  - Department of Plasma Chemistry, Institute of Materials and Environmental Chemistry, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary
- Dávid Bajusz
  - Medicinal Chemistry Research Group, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary
- Károly Héberger
  - Department of Plasma Chemistry, Institute of Materials and Environmental Chemistry, ELKH Research Centre for Natural Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary
11
Jing XY, Zhang X, Zhu X, Wu F, You X, Gao Y, Shan S, Yang JY. Multiset Feature Learning for Highly Imbalanced Data Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021; 43:139-156. [PMID: 31331881 DOI: 10.1109/tpami.2019.2929166] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 06/10/2023]
Abstract
With the expansion of data, imbalanced data have become increasingly common. When the imbalance ratio (IR) of data is high, most existing imbalanced learning methods suffer a serious decline in classification performance. In this paper, we systematically investigate the highly imbalanced data classification problem, and propose an uncorrelated cost-sensitive multiset learning (UCML) approach for it. Specifically, UCML first constructs multiple balanced subsets through random partition, and then employs multiset feature learning (MFL) to learn discriminant features from the constructed multiset. To enhance the usability of each subset and deal with the non-linearity issue present in each subset, we further propose a deep metric based UCML (DM-UCML) approach. DM-UCML introduces the generative adversarial network technique into the multiset construction process, so that each subset has a distribution similar to that of the original dataset. To cope with the non-linearity issue, DM-UCML integrates deep metric learning with MFL, so that more favorable performance can be achieved. In addition, DM-UCML designs a new discriminant term to enhance the discriminability of learned metrics. Experiments on eight traditional highly class-imbalanced datasets and two large-scale datasets indicate that the proposed approaches outperform state-of-the-art highly imbalanced learning methods and are more robust to high IR.
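The first step mentioned above, building several balanced subsets by randomly partitioning the data, might look roughly like the sketch below. The abstract does not give the details, so this assumes one common reading: the majority class is shuffled and split into chunks of about the minority-class size, and each chunk is paired with the full minority class. The function name, the seed parameter, and the toy samples are hypothetical.

```python
import random


def balanced_subsets(majority, minority, seed=0):
    """Split the majority class into chunks, each paired with all minority samples.

    Assumption: chunks are roughly the size of the minority class, and the
    last chunk absorbs any leftover majority samples.
    """
    if not minority:
        raise ValueError("minority class must not be empty")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)

    size = len(minority)
    n_subsets = max(1, len(shuffled) // size)
    subsets = []
    for i in range(n_subsets):
        start = i * size
        end = (i + 1) * size if i < n_subsets - 1 else len(shuffled)
        subsets.append((shuffled[start:end], list(minority)))
    return subsets


# Hypothetical toy data: 10 majority samples, 3 minority samples.
toy_subsets = balanced_subsets(list(range(10)), ["a", "b", "c"], seed=1)
for majority_part, minority_part in toy_subsets:
    print(majority_part, minority_part)
```

Each resulting subset could then be passed to a separate learner or to a joint feature-learning step, which is beyond what this sketch covers.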
12
Raj A, Dehingia N, Singh A, McDougal L, McAuley J. Application of machine learning to understand child marriage in India. SSM Popul Health 2020; 12:100687. [PMID: 33335970 PMCID: PMC7732880 DOI: 10.1016/j.ssmph.2020.100687] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/15/2020] [Revised: 10/29/2020] [Accepted: 10/30/2020] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Prior research documents that India has the greatest number of girls married as minors of any nation in the world, increasing social and health risks for both these young wives and their children. While the prevalence of child marriage has declined in the nation, more work is needed to accelerate this decline and reduce the negative consequences of the practice. Expanded targets for intervention require greater identification of these targets. Machine learning can offer insight into identification of novel factors associated with child marriage that can serve as targets for intervention. METHODS We applied machine learning methods to retrospective cross-sectional survey data from India on demographics and health, the nationally representative National Family Health Survey, conducted in 2015-16. We analyzed data using a traditional regression model, with child marriage as the dependent variable and 4000+ variables from the survey as the independent variables. We also used three commonly used machine learning algorithms: Least Absolute Shrinkage and Selection Operator (lasso), or L1-regularized logistic regression, models; L2-regularized logistic regression, or ridge, models; and neural network models. Finally, we developed and applied a novel and rigorous approach involving expert qualitative review and coding of variables generated from an iterative series of regularized models to assess thematically key variable groupings associated with child marriage. FINDINGS Analyses revealed that regularized logistic and neural network applications demonstrated better accuracy and lower error rates than traditional logistic regression, with a greater number of features and variables generated. Regularized models highlight higher fertility and contraception, longer duration of marriage, geographic, and socioeconomic vulnerabilities as key correlates, findings also shown in prior research. However, our novel method involving expert qualitative coding of variables generated from iterative regularized models and resultant thematic generation offered clarity on variables not focused upon in prior research, specifically non-utilization of health system benefits related to nutrition for mothers and infants. INTERPRETATION Machine learning appears to be a valid means of identifying key correlates of child marriage in India and, via our innovative iterative thematic approach, can be useful to identify novel variables associated with this outcome. Findings related to low nutritional service uptake also demonstrate the need for more focus on public health outreach for nutritional programs tailored to this population.
Affiliation(s)
- Anita Raj
- Center on Gender Equity and Health, Department of Medicine, University of California San Diego, San Diego, CA, USA
- Department of Education Studies, Division of Social Sciences, University of California San Diego, San Diego, CA, USA
| | - Nabamallika Dehingia
- Center on Gender Equity and Health, Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - Abhishek Singh
- International Institute of Population Sciences, Mumbai, India
| | - Lotus McDougal
- Center on Gender Equity and Health, Department of Medicine, University of California San Diego, San Diego, CA, USA
| | - Julian McAuley
- Department of Computer Science, School of Engineering, University of California San Diego, San Diego, CA, USA
| |
|
13
|
Wang K, Zhou Z, Wang R, Chen L, Zhang Q, Sher D, Wang J. A multi‐objective radiomics model for the prediction of locoregional recurrence in head and neck squamous cell cancer. Med Phys 2020; 47:5392-5400. [DOI: 10.1002/mp.14388] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/18/2020] [Revised: 05/11/2020] [Accepted: 07/02/2020] [Indexed: 02/05/2023] Open
Affiliation(s)
- Kai Wang
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, USA
| | - Zhiguo Zhou
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, USA
- School of Computer Science and Mathematics, University of Central Missouri, Warrensburg, MO 64093, USA
| | - Rongfang Wang
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, USA
- School of Artificial Intelligence, Xidian University, Xi'an 710071, China
| | - Liyuan Chen
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, USA
| | - Qiongwen Zhang
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, USA
- State Key Laboratory of Biotherapy and Cancer Center, Sichuan University and Collaborative Innovation Center, Chengdu 610041, China
- Department of Head and Neck Cancer, West China Hospital, Chengdu 610041, China
| | - David Sher
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, USA
| | - Jing Wang
- Department of Radiation Oncology, UT Southwestern Medical Center, Dallas, TX 75390, USA
| |
|
14
|
Almilaji O, Smith C, Surgenor S, Clegg A, Williams E, Thomas P, Snook J. Refinement and validation of the IDIOM score for predicting the risk of gastrointestinal cancer in iron deficiency anaemia. BMJ Open Gastroenterol 2020; 7:e000403. [PMID: 32444424 PMCID: PMC7247388 DOI: 10.1136/bmjgast-2020-000403] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/15/2020] [Revised: 03/30/2020] [Accepted: 04/08/2020] [Indexed: 01/27/2023] Open
Abstract
OBJECTIVE To refine and validate a model for predicting the risk of gastrointestinal (GI) cancer in iron deficiency anaemia (IDA) and to develop an app to facilitate use in clinical practice. DESIGN Three elements: (1) analysis of a dataset of 2390 cases of IDA to validate the predictive value of age, sex, blood haemoglobin concentration (Hb), mean cell volume (MCV) and iron studies on the probability of underlying GI cancer; (2) a pilot study of the benefit of adding faecal immunochemical testing (FIT) into the model; and (3) development of an app based on the model. RESULTS Age, sex and Hb were all strong, independent predictors of the risk of GI cancer, with ORs (95% CI) of 1.05 per year (1.03 to 1.07, p<0.00001), 2.86 for men (2.03 to 4.06, p<0.00001) and 1.03 for each g/L reduction in Hb (1.01 to 1.04, p<0.0001) respectively. An association with MCV was also revealed, with an OR of 1.03 for each fl reduction (1.01 to 1.05, p<0.02). The model was confirmed to be robust by an internal validation exercise. In the pilot study of high-risk cases, FIT was also predictive of GI cancer (OR 6.6, 95% CI 1.6 to 51.8), but the sensitivity was low at 23.5% (95% CI 6.8% to 49.9%). An app based on the model was developed. CONCLUSION This predictive model may help rationalise the use of investigational resources in IDA, by fast-tracking high-risk cases and, with appropriate safeguards, avoiding invasive investigation altogether in those at ultra-low predicted risk.
Affiliation(s)
- Orouba Almilaji
- Department of Gastroenterology, Poole Hospital NHS Foundation Trust, Poole, UK
- Clinical Research Unit, Bournemouth University, Bournemouth, Dorset, UK
| | - Carla Smith
- Department of Gastroenterology, Poole Hospital NHS Foundation Trust, Poole, UK
| | - Sue Surgenor
- Department of Gastroenterology, Poole Hospital NHS Foundation Trust, Poole, UK
| | - Andrew Clegg
- Health Technology Assessment Group, University of Central Lancashire, Preston, Lancashire, UK
| | - Elizabeth Williams
- Department of Gastroenterology, Poole Hospital NHS Foundation Trust, Poole, UK
| | - Peter Thomas
- Clinical Research Unit, Bournemouth University, Bournemouth, Dorset, UK
| | - Jonathon Snook
- Department of Gastroenterology, Poole Hospital NHS Foundation Trust, Poole, UK
| |
|
16
|
Féré M, Gobinet C, Liu LH, Beljebbar A, Untereiner V, Gheldof D, Chollat M, Klossa J, Chatelain B, Piot O. Implementation of a classification strategy of Raman data collected in different clinical conditions: application to the diagnosis of chronic lymphocytic leukemia. Anal Bioanal Chem 2019; 412:949-962. [PMID: 31853604 DOI: 10.1007/s00216-019-02321-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/16/2019] [Revised: 10/31/2019] [Accepted: 12/03/2019] [Indexed: 02/06/2023]
Abstract
The literature is rich in proof-of-concept studies demonstrating the potential of Raman spectroscopy for disease diagnosis. However, few studies are conducted in a clinical context to demonstrate its applicability in current clinical practice and workflow. Indeed, this translational research remains far from the patient's bedside for several reasons. First, samples are often cultured cell lines. Second, they are prepared on substrates that are non-standard in clinical routine. Third, a unique supervised classification model is usually constructed using an inadequate cross-validation strategy. Finally, the implemented models maximize classification accuracy without taking into account the clinician's needs. In this paper, we address these issues through a diagnosis problem in real clinical conditions, i.e., the diagnosis of chronic lymphocytic leukemia from fresh unstained blood smears spread on glass slides. From Raman data acquired in different experimental conditions, a repeated double cross-validation strategy was combined with different cross-validation approaches, a consensus label strategy and adaptive thresholds able to adapt to the clinician's needs. Combined with validation at the patient level, classification results were improved compared to traditional strategies.
Affiliation(s)
- M Féré
- BioSpecT EA 7506, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096, Reims, France
| | - C Gobinet
- BioSpecT EA 7506, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096, Reims, France.
| | - L H Liu
- BioSpecT EA 7506, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096, Reims, France
| | - A Beljebbar
- BioSpecT EA 7506, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096, Reims, France
| | - V Untereiner
- Cellular and Tissular Imaging Platform PICT, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096, Reims, France
| | - D Gheldof
- CHU UCL Namur, Namur Thrombosis and Hemostasis Center, Hematology Laboratory, Rue Dr Gaston Therasse, Catholic University of Louvain, 5530, Yvoir, Belgium
| | - M Chollat
- TRIBVN, 39 Rue Louveau, 92320, Châtillon, France
| | - J Klossa
- TRIBVN, 39 Rue Louveau, 92320, Châtillon, France
| | - B Chatelain
- CHU UCL Namur, Namur Thrombosis and Hemostasis Center, Hematology Laboratory, Rue Dr Gaston Therasse, Catholic University of Louvain, 5530, Yvoir, Belgium
| | - O Piot
- BioSpecT EA 7506, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096, Reims, France
- Cellular and Tissular Imaging Platform PICT, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096, Reims, France
| |
|
17
|
Dong Q, Gong S, Zhu X. Imbalanced Deep Learning by Minority Class Incremental Rectification. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2019; 41:1367-1381. [PMID: 29993438 DOI: 10.1109/tpami.2018.2832629] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Indexed: 06/08/2023]
Abstract
Model learning from class imbalanced training data is a long-standing and significant challenge for machine learning. In particular, existing deep learning methods consider mostly either class balanced data or moderately imbalanced data in model training, and ignore the challenge of learning from significantly imbalanced training data. To address this problem, we formulate a class imbalanced deep learning model based on batch-wise incremental minority (sparsely sampled) class rectification by hard sample mining in majority (frequently sampled) classes during model training. This model is designed to minimise the dominant effect of majority classes by discovering sparsely sampled boundaries of minority classes in an iterative batch-wise learning process. To that end, we introduce a Class Rectification Loss (CRL) function that can be deployed readily in deep network architectures. Extensive experimental evaluations are conducted on three imbalanced person attribute benchmark datasets (CelebA, X-Domain, DeepFashion) and one balanced object category benchmark dataset (CIFAR-100). These experimental results demonstrate the performance advantages and model scalability of the proposed batch-wise incremental minority class rectification model over the existing state-of-the-art models for addressing the problem of imbalanced data learning.
|
18
|
Abstract
Network traffic exhibits a high level of variability over short periods of time. This variability negatively affects the accuracy of anomaly-based network intrusion detection systems (IDS) that are built using predictive models in a batch learning setup. This work investigates how adapting the discriminating threshold of model predictions, specifically to the evaluated traffic, improves the detection rates of these intrusion detection models. Specifically, this research studied the adaptability features of three well-known machine learning algorithms: C5.0, Random Forest and Support Vector Machine. Each algorithm's ability to adapt its prediction threshold was assessed and analysed under different scenarios that simulated real-world settings using the prospective sampling approach. Multiple IDS datasets were used for the analysis, including a newly generated dataset (STA2018). This research demonstrated empirically the importance of threshold adaptation in improving the accuracy of detection models when training and evaluation traffic have different statistical properties. Tests were undertaken to analyse the effects of feature selection and data balancing on model accuracy when different significant features in traffic were used. The effects of threshold adaptation on improving accuracy were statistically analysed. Of the three compared algorithms, Random Forest was the most adaptable and had the highest detection rates.
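One simple way to picture threshold adaptation is to re-pick the decision threshold on a labeled sample of the traffic being evaluated so that a target detection rate is reached. The sketch below is an assumption-laden illustration, not the procedure used in the study: the function name `threshold_for_detection_rate`, the target value, and the toy scores are all hypothetical.

```python
import math

def threshold_for_detection_rate(scores, labels, target=0.95):
    """Return the highest threshold at which at least `target` of the positive
    (attack) samples score at or above the threshold.
    A sample is flagged when score >= threshold. (Hypothetical helper.)"""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("at least one positive (attack) sample is required")
    # Smallest number of positives that must be caught to reach the target rate.
    needed = max(1, math.ceil(target * len(positives)))
    return positives[needed - 1]

# Toy model scores for recent traffic; label 1 marks an attack.
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0]
threshold = threshold_for_detection_rate(scores, labels, target=0.75)
```

Lowering the threshold this way raises the detection rate at the cost of more false alarms, so in practice the target would be chosen with the false-positive budget in mind.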
|
20
|
Barzegar R, Asghari Moghaddam A, Adamowski J, Nazemi AH. Delimitation of groundwater zones under contamination risk using a bagged ensemble of optimized DRASTIC frameworks. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2019; 26:8325-8339. [PMID: 30706265 DOI: 10.1007/s11356-019-04252-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/24/2018] [Accepted: 01/14/2019] [Indexed: 06/09/2023]
Abstract
Developing a reliable groundwater vulnerability and contamination risk map is very important for groundwater management and protection. This study aims to compare various modified DRASTIC vulnerability frameworks based on rate calibration using the Wilcoxon rank-sum test (WRST), frequency ratio (FR) and weight optimization using the correlation coefficient (CC), the analytic hierarchy process (AHP), and genetic algorithms (GA), as well as to introduce, for the first time, an aggregated approach based on a bagging ensemble to develop a combined modified DRASTIC model. This research was conducted in the Khoy plain, NW Iran. To develop a typical DRASTIC map, seven DRASTIC data layers were generated, weighted, and then overlaid in ArcGIS. The nitrate (NO3) concentrations at 54 sites in the study area were used to validate the models by calculating the correlation coefficient (r) between the vulnerability/risk indices and NO3 concentrations. The calculated r value for the typical DRASTIC was 0.12. A sensitivity analysis reveals that the vadose zone and conductivity parameters, with mean variation indices of 22.2 and 7.5%, respectively, have the highest and lowest influence on aquifer vulnerability. The r values increased for all the optimized frameworks. The results show that the WRST and GA methods are the most effective methods for calibration and optimization of DRASTIC rates and weights, with the WRST-GA-DRASTIC model obtaining an r value of 0.64. A bagging ensemble model was employed to combine the advantages of each standalone model. The bagging ensemble model yields an r value of 0.67. The ensemble model has the potential to increase the r value further than both the standalone optimized frameworks and the typical DRASTIC approach. In terms of spatial distribution class area (%), the bagging ensemble-DRASTIC model demonstrates that the moderate and low contamination risk classes, with 16.4 and 23.1% of the total area, cover the lowest and highest parts of the plain.
Affiliation(s)
- Rahim Barzegar
- Department of Earth Sciences, Faculty of Natural Sciences, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran.
- Department of Bioresource Engineering, McGill University, 21111 Lakeshore, Ste Anne de Bellevue, Quebec, H9X3V9, Canada.
| | - Asghar Asghari Moghaddam
- Department of Earth Sciences, Faculty of Natural Sciences, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran
| | - Jan Adamowski
- Department of Bioresource Engineering, McGill University, 21111 Lakeshore, Ste Anne de Bellevue, Quebec, H9X3V9, Canada
| | - Amir Hossein Nazemi
- Department of Water Engineering, Faculty of Agriculture, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran
| |
|
21
|
van Wyk F, Khojandi A, Kamaleswaran R. Improving Prediction Performance Using Hierarchical Analysis of Real-Time Data: A Sepsis Case Study. IEEE J Biomed Health Inform 2019; 23:978-986. [PMID: 30676988 DOI: 10.1109/jbhi.2019.2894570] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 11/09/2022]
Abstract
This paper presents a novel method for hierarchical analysis of machine learning algorithms to improve predictions of at risk patients, thus further enabling prompt therapy. Specifically, we develop a multi-layer machine learning approach to analyze continuous, high-frequency data. We illustrate the capabilities of this approach for early identification of patients at risk of sepsis, a potentially life-threatening complication of an infection, using high-frequency (minute-by-minute) physiological data collected from bedside monitors. In our analysis of a cohort of 586 patients, the model obtained from analyzing the output of a previously developed sepsis prediction model resulted in improved outcomes. Specifically, the original model failed to predict 11.76 ± 4.26% of sepsis patients earlier than Systemic Inflammatory Response Syndrome (SIRS) criteria, commonly used to identify patients at risk for rapid physiological deterioration resulting from sepsis. In contrast, the multi-layer model only failed to predict 3.21 ± 3.11% of sepsis patients earlier than SIRS. In addition, sepsis patients were predicted on average 204.87 ± 7.90 minutes earlier than SIRS criteria using the multi-layer model, which can potentially help reduce mortality and morbidity if implemented in the ICU.
|
22
|
Guermazi R, Chaabane I, Hammami M. AECID: Asymmetric entropy for classifying imbalanced data. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.076] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 10/28/2022]
|
23
|
Zakeri V, Hodgson AJ. Classifying hard and soft bone tissues using drilling sounds. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2017:2855-2858. [PMID: 29060493 DOI: 10.1109/embc.2017.8037452] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Abstract
The purpose of this study was to investigate if the sounds generated during bone drilling could be used to classify between hard (cortical) and soft (cancellous) tissues. Bone drilling is performed in many surgical procedures throughout the world. Inadvertent deviation from the correct drill direction may result in injuries to sensitive anatomical structures such as nerve and vessels. Therefore, to increase the safety of such procedures, it is necessary to identify different bone tissues. The cortical and cancellous tissues of six bovine tibia pieces were drilled and the generated sounds were recorded. Each record was analyzed in different frequency regions based on the spectrograms. From each region, short-time Fourier transform (STFT) coefficients were computed and averaged accordingly to obtain n bins. The total bins of all frequency regions were chosen as the features. A support vector machine (SVM) algorithm was selected for classification and the performance was evaluated in two training/testing scenarios: leave one bone out (LOBO) and bone specific (BSP). The average total accuracy on the testing data was 70.9% and 83% for LOBO and BSP respectively. The results indicated that the drilling sounds obtained from various bone pieces could be used to develop a classification model that had promising performance on identifying hard and soft components of a new bone piece.
|
24
|
Baseer A, Weddell SJ, Jones RD. Prediction of microsleeps using pairwise joint entropy and mutual information between EEG channels. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2017:4495-4498. [PMID: 29060896 DOI: 10.1109/embc.2017.8037855] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Microsleeps are involuntary and brief instances of complete loss of responsiveness, typically of 0.5-15 s duration. They adversely affect performance in extended attention-driven jobs and can be fatal. Our aim was to predict microsleeps from 16 channel EEG signals. Two information theoretic concepts - pairwise joint entropy and mutual information - were independently used to continuously extract features from EEG signals. k-nearest neighbor (kNN) with k = 3 was used to calculate both joint entropy and mutual information. Highly correlated features were discarded and the rest were ranked using Fisher score followed by an average of 3-fold cross-validation area under the curve of the receiver operating characteristic (AUCROC). Leave-one-out method (LOOM) was performed to test the performance of microsleep prediction system on independent data. The best prediction for 0.25 s ahead was AUCROC, sensitivity, precision, geometric mean (GM), and φ of 0.93, 0.68, 0.33, 0.75, and 0.38 respectively with joint entropy using single linear discriminant analysis (LDA) classifier.
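The abstract above reports a geometric mean (GM) and φ without defining them. A common reading in imbalanced classification, assumed here and not confirmed by the abstract, is GM = sqrt(sensitivity × specificity) and φ = the Matthews correlation coefficient. Under that assumption, both can be computed from confusion-matrix counts as in this sketch; the function name `gm_and_phi` and the example counts are hypothetical.

```python
import math

def gm_and_phi(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity, and the phi
    (Matthews correlation) coefficient, from confusion-matrix counts.
    Assumes at least one actual positive and one actual negative."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    gm = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # phi is conventionally set to 0 when a predicted class is empty.
    phi = (tp * tn - fp * fn) / denom if denom else 0.0
    return gm, phi

# Illustrative counts only; these are not data from the study.
gm, phi = gm_and_phi(tp=8, fp=9, tn=81, fn=2)
```

Both measures stay informative when positives are rare, which is why they are often reported alongside AUC for event-detection problems of this kind.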
|
25
|
Mei N, Grossberg MD, Ng K, Navarro KT, Ellmore TM. Identifying sleep spindles with multichannel EEG and classification optimization. Comput Biol Med 2017; 89:441-453. [PMID: 28886481 PMCID: PMC5650544 DOI: 10.1016/j.compbiomed.2017.08.030] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 05/17/2017] [Revised: 08/28/2017] [Accepted: 08/29/2017] [Indexed: 11/18/2022]
Abstract
Researchers classify critical neural events during sleep called spindles that are related to memory consolidation using the method of scalp electroencephalography (EEG). Manual classification is time consuming and is susceptible to low inter-rater agreement. This could be improved using an automated approach. This study presents an optimized filter based and thresholding (FBT) model to set up a baseline for comparison to evaluate machine learning models using naïve features, such as raw signals, peak frequency, and dominant power. The FBT model allows us to formally define sleep spindles using signal processing but may miss examples most human scorers would agree are spindles. Machine learning methods in theory should be able to approach performance of human raters but they require a large quantity of scored data, proper feature representation, intensive feature engineering, and model selection. We evaluate both the FBT model and machine learning models with naïve features. We show that the machine learning models derived from the FBT model improve classification performance. An automated approach designed for the current data was applied to the DREAMS dataset [1]. With one of the expert's annotation as a gold standard, our pipeline yields an excellent sensitivity that is close to a second expert's scores and with the advantage that it can classify spindles based on multiple channels if more channels are available. More importantly, our pipeline could be modified as a guide to aid manual annotation of sleep spindles based on multiple channels quickly (6-10 s for processing a 40-min EEG recording), making spindle detection faster and more objective.
Affiliation(s)
- Ning Mei
- Department of Psychology, The City College of the City University of New York, USA
| | - Michael D Grossberg
- Department of Computer Science, The City College of the City University of New York, USA
| | - Kenneth Ng
- Department of Psychology, The City College of the City University of New York, USA
| | - Karen T Navarro
- Department of Psychology, The City College of the City University of New York, USA
| | - Timothy M Ellmore
- Department of Psychology, The City College of the City University of New York, USA.
| |
|
26
|
Ahn H. Discussion. Int Stat Rev 2014. [DOI: 10.1111/insr.12061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Affiliation(s)
- Hongshik Ahn
- State University of New York; Stony Brook, NY USA
| |
|
27
|
Hagar JC, Eskelson BNI, Haggerty PK, Nelson SK, Vesely DG. Modeling marbled murrelet (Brachyramphus marmoratus) habitat using LiDAR-derived canopy data. WILDLIFE SOC B 2014. [DOI: 10.1002/wsb.407] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Affiliation(s)
- Joan C. Hagar
- United States Geological Survey; Forest and Rangeland Ecosystem Science Center; Corvallis OR 97331 USA
| | - Bianca N. I. Eskelson
- Department of Forest Engineering; Resources and Management; Oregon State University; Corvallis OR 97331 USA
| | - Patricia K. Haggerty
- United States Geological Survey; Forest and Rangeland Ecosystem Science Center; Corvallis OR 97331 USA
| | - S. Kim Nelson
- Oregon Cooperative Wildlife Research Unit; Oregon State University; Corvallis OR 97331 USA
| |
|
28
|
Lin WJ, Chen JJ. Class-imbalanced classifiers for high-dimensional data. Brief Bioinform 2012; 14:13-26. [DOI: 10.1093/bib/bbs006] [Citation(s) in RCA: 178] [Impact Index Per Article: 14.8] [Indexed: 12/19/2022] Open
|
29
|
Lim N, Ahn H, Moon H, Chen JJ. Classification of High-Dimensional Data with Ensemble of Logistic Regression Models. J Biopharm Stat 2010; 20:160-71. [DOI: 10.1080/10543400903280639] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022]
Affiliation(s)
- Noha Lim
- Immune Tolerance Network, University of California-San Francisco, San Francisco, California, USA
| | - Hongshik Ahn
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
| | - Hojin Moon
- Department of Mathematics and Statistics, California State University, Long Beach, California, USA
| | - James J. Chen
- Division of Personalized Nutrition and Medicine, Biometry Branch, National Center for Toxicological Research, Jefferson, Arkansas, USA
| |
|
30
|
Wang Y, Li Y, Ding J, Wang Y, Chang Y. Prediction of binding affinity for estrogen receptor alpha modulators using statistical learning approaches. Mol Divers 2008; 12:93-102. [PMID: 18661245 DOI: 10.1007/s11030-008-9080-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 02/08/2008] [Accepted: 05/23/2008] [Indexed: 02/06/2023]
Abstract
The estrogen receptor (ER), an important drug target for the therapy of breast cancers, received a great deal of attention during recent years. This work aimed at finding more potent and selective ER modulators through the investigations of multiple ligand-receptor interactions by exploring the relationship between the experimental and predicted pIC50 values using in silico methods. A Bayesian-regularized neural network combined with principal component analysis has been conducted on a set of ERα modulators (127 molecules), resulting in the correlation coefficients of 0.91 ± 0.02, 0.87 ± 0.04 and 0.90 ± 0.02 for the training set (64 molecules), cross-validation set (32 molecules) and independent test (31 molecules), respectively. Meanwhile, a multiple linear regression (MLR) method has also been applied in order to explore the most important variables related to the biological activities. The proposed MLR model obtains a reasonable predictivity of pIC50 (R = 0.72, Q = 0.79) and makes use of four molecular descriptors, namely, Xvch6, nelem, SsssCH and SaaN. All these results prove the reliabilities of the in silico models, which should be useful not only for the screening but also for the rational design of novel ERα modulators with improved potency.
Affiliation(s)
- Yonghua Wang
- Key Lab of Mariculture and Biotechnology, Ministry of Agriculture, Dalian Fisheries University, Dalian, China.
| |
|
31
|
Cox LA. What's wrong with risk matrices? RISK ANALYSIS : AN OFFICIAL PUBLICATION OF THE SOCIETY FOR RISK ANALYSIS 2008; 28:497-512. [PMID: 18419665 DOI: 10.1111/j.1539-6924.2008.01030.x] [Citation(s) in RCA: 156] [Impact Index Per Article: 9.8] [Indexed: 05/26/2023]
Abstract
Risk matrices (tables mapping "frequency" and "severity" ratings to corresponding risk priority levels) are popular in applications as diverse as terrorism risk analysis, highway construction project management, office building risk analysis, climate change risk management, and enterprise risk management (ERM). National and international standards (e.g., Military Standard 882C and AS/NZS 4360:1999) have stimulated adoption of risk matrices by many organizations and risk consultants. However, little research rigorously validates their performance in actually improving risk management decisions. This article examines some mathematical properties of risk matrices and shows that they have the following limitations. (a) Poor Resolution. Typical risk matrices can correctly and unambiguously compare only a small fraction (e.g., less than 10%) of randomly selected pairs of hazards. They can assign identical ratings to quantitatively very different risks ("range compression"). (b) Errors. Risk matrices can mistakenly assign higher qualitative ratings to quantitatively smaller risks. For risks with negatively correlated frequencies and severities, they can be "worse than useless," leading to worse-than-random decisions. (c) Suboptimal Resource Allocation. Effective allocation of resources to risk-reducing countermeasures cannot be based on the categories provided by risk matrices. (d) Ambiguous Inputs and Outputs. Categorizations of severity cannot be made objectively for uncertain consequences. Inputs to risk matrices (e.g., frequency and severity categorizations) and resulting outputs (i.e., risk ratings) require subjective interpretation, and different users may obtain opposite ratings of the same quantitative risks. These limitations suggest that risk matrices should be used with caution, and only with careful explanations of embedded judgments.
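Limitations (a) and (b) in the abstract above can be made concrete with a toy example. The 3×3 matrix below is hypothetical and not taken from the article: frequency and severity are assumed to lie in [0, 1], each is cut into three equal bands, the rating is the sum of the two band indices, and quantitative risk is assumed to be frequency × severity.

```python
def band(x):
    """Map a value in [0, 1] to a band: 0 = low, 1 = medium, 2 = high."""
    return 0 if x < 1 / 3 else (1 if x < 2 / 3 else 2)

def rating(freq, sev):
    """Qualitative matrix rating (0 to 4): sum of the two band indices."""
    return band(freq) + band(sev)

def risk(freq, sev):
    """Quantitative risk, assumed here to be frequency times severity."""
    return freq * sev

# Range compression: same rating, roughly 3.7x difference in quantitative risk.
a, b = (0.34, 0.34), (0.65, 0.65)
same_rating = rating(*a) == rating(*b)

# Ranking error: c is rated higher than d although its quantitative risk is smaller.
c, d = (0.67, 0.05), (0.65, 0.30)
reversed_order = rating(*c) > rating(*d) and risk(*c) < risk(*d)
```

Both flags evaluate to true for these inputs, which shows how coarse banding alone can produce the two failure modes without any error in the underlying estimates.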
Affiliation(s)
- Louis Anthony Cox
- Cox Associates and University of Colorado, 503 Franklin St., Denver, CO 80218, USA.
|
32
|
Liu H, Papa E, Walker JD, Gramatica P. In silico screening of estrogen-like chemicals based on different nonlinear classification models. J Mol Graph Model 2007; 26:135-44. [PMID: 17293141 DOI: 10.1016/j.jmgm.2007.01.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Received: 10/13/2006] [Revised: 01/10/2007] [Accepted: 01/12/2007] [Indexed: 01/28/2023]
Abstract
The scientific community, government regulators, and the public are increasingly concerned about endocrine-disrupting chemicals that adversely affect human and wildlife health through a variety of mechanisms. There is a great need for an effective means of rapidly assessing endocrine-disrupting activity, especially estrogen-simulating activity, because of the large number of such chemicals in the environment. In this study, quantitative structure-activity relationship (QSAR) models were developed to quickly and effectively identify possible estrogen-like chemicals. The models were built from 232 structurally diverse chemicals (training set) using several nonlinear classification methods based on molecular structural descriptors: least-squares support vector machine (LS-SVM), counter-propagation artificial neural network (CP-ANN), and k nearest neighbour (kNN). The models were externally validated on 87 chemicals (prediction set) not included in the training set. All three methods gave satisfactory prediction results for both the training and prediction sets, and a comparison of performance showed that the LS-SVM approach yielded the most accurate model. In addition, our model was applied to about 58,000 discrete organic chemicals; about 76% were predicted not to bind to the estrogen receptor. The results indicate that the proposed QSAR models are robust, widely applicable and could provide a feasible and practical tool for the rapid screening of potential estrogens.
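The train-then-externally-validate workflow described above can be sketched with the simplest of the three classifiers, kNN. This is only an illustration under stated assumptions: the two-dimensional descriptor vectors, the labels, and the helper names `knn_predict` and `accuracy` are hypothetical toy values, not the study's descriptors, data, or implementation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (descriptor_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, held_out, k=3):
    """Fraction of held-out chemicals whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


# Toy descriptor vectors (hypothetical, for illustration only).
training_set = [
    ((0.9, 0.8), "binder"), ((0.8, 0.9), "binder"), ((0.7, 0.7), "binder"),
    ((0.1, 0.2), "non-binder"), ((0.2, 0.1), "non-binder"),
    ((0.3, 0.3), "non-binder"),
]

# External prediction set: never used when building the model.
prediction_set = [((0.85, 0.75), "binder"), ((0.15, 0.25), "non-binder")]

print(accuracy(training_set, prediction_set))
```

Keeping the prediction set entirely separate from the training set is what makes the reported accuracy an estimate of performance on unseen chemicals rather than a measure of fit.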
Affiliation(s)
- Huanxiang Liu
- Department of Structural and Functional Biology, QSAR Research Unit in Environmental Chemistry and Ecotoxicology, University of Insubria, via Dunant 3, 21100 Varese, Italy
|