Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 61 (from Reference Citation Analysis)
Article PDFs: 13
Cited by > 0: 39
Searched Name: Class imbalance

Xu J, Ruan X, Yang J, Hu B, Li S, Hu J. SME-MFP: A novel spatiotemporal neural network with multiangle initialization embedding toward multifunctional peptides prediction. Comput Biol Chem 2024;109:108033. [PMID: 38412804 DOI: 10.1016/j.compbiolchem.2024.108033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 01/09/2024] [Accepted: 02/17/2024] [Indexed: 02/29/2024]

Liu CL, Lee MH, Hsueh SN, Chung CC, Lin CJ, Chang PH, Luo AC, Weng HC, Lee YH, Dai MJ, Tsai MJ. A bagging approach for improved predictive accuracy of intradialytic hypotension during hemodialysis treatment. Comput Biol Med 2024;172:108244. [PMID: 38457931 DOI: 10.1016/j.compbiomed.2024.108244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 02/24/2024] [Accepted: 03/04/2024] [Indexed: 03/10/2024]

Cusworth S, Gkoutos GV, Acharjee A. A novel generative adversarial networks modelling for the class imbalance problem in high dimensional omics data. BMC Med Inform Decis Mak 2024;24:90. [PMID: 38549123 PMCID: PMC10979623 DOI: 10.1186/s12911-024-02487-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 03/22/2024] [Indexed: 04/01/2024]

Zhang S, Zhu C, Li H, Cai J, Yang L. Gradient-aware learning for joint biases: Label noise and class imbalance. Neural Netw 2024;171:374-382. [PMID: 38134600 DOI: 10.1016/j.neunet.2023.12.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 12/04/2023] [Accepted: 12/16/2023] [Indexed: 12/24/2023]

Li Y, Wang Y, Lin G, Huang Y, Liu J, Lin Y, Wei D, Zhang Q, Ma K, Zhang Z, Lu G, Zheng Y. Triplet-branch network with contrastive prior-knowledge embedding for disease grading. Artif Intell Med 2024;149:102801. [PMID: 38462290 DOI: 10.1016/j.artmed.2024.102801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 11/28/2023] [Accepted: 02/03/2024] [Indexed: 03/12/2024]

Lee SJ, Oh HJ, Son YD, Kim JH, Kwon IJ, Kim B, Lee JH, Kim HK. Enhancing deep learning classification performance of tongue lesions in imbalanced data: mosaic-based soft labeling with curriculum learning. BMC Oral Health 2024;24:161. [PMID: 38302981 PMCID: PMC10832072 DOI: 10.1186/s12903-024-03898-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 01/15/2024] [Indexed: 02/03/2024]

Abstract

BACKGROUND

Oral potentially malignant disorders (OPMDs) are associated with an increased risk of cancer of the oral cavity including the tongue. The early detection of oral cavity cancers and OPMDs is critical for reducing cancer-specific morbidity and mortality. Recently, there have been studies to apply the rapidly advancing technology of deep learning for diagnosing oral cavity cancer and OPMDs. However, several challenging issues such as class imbalance must be resolved to effectively train a deep learning model for medical imaging classification tasks. The aim of this study is to evaluate a new technique of artificial intelligence to improve the classification performance in an imbalanced tongue lesion dataset.

METHODS

A total of 1,810 tongue images were used for the classification. The class-imbalanced dataset consisted of 372 instances of cancer, 141 instances of OPMDs, and 1,297 instances of noncancerous lesions. The EfficientNet model was used as the feature extraction model for classification. Mosaic data augmentation, soft labeling, and curriculum learning (CL) were employed to improve the classification performance of the convolutional neural network.

RESULTS

Utilizing a mosaic-augmented dataset in conjunction with CL, the final model achieved an accuracy rate of 0.9444, surpassing conventional oversampling and weight balancing methods. The relative precision improvement rate for the minority class OPMD was 21.2%, while the relative [Formula: see text] score improvement rate of OPMD was 4.9%.

CONCLUSIONS

The present study demonstrates that the integration of mosaic-based soft labeling and curriculum learning improves the classification performance of tongue lesions compared to previous methods, establishing a foundation for future research on effectively learning from imbalanced data.
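
As a rough illustration of how a soft label for a mosaic image could be formed, the sketch below assigns each class a weight equal to the share of the mosaic area occupied by tiles of that class. This area-proportional rule, the function name `mosaic_soft_label`, and the class index order are assumptions made for the example; the abstract does not state the exact labeling rule used in the study.

```python
def mosaic_soft_label(tile_labels, tile_areas, num_classes):
    """Return a soft label whose entries are the area share of each class.

    Assumption: label mass is proportional to tile area (hypothetical rule).
    """
    total_area = sum(tile_areas)
    soft = [0.0] * num_classes
    for label, area in zip(tile_labels, tile_areas):
        soft[label] += area / total_area
    return soft


# Hypothetical class indices: 0 = noncancerous, 1 = OPMD, 2 = cancer.
# Four equally sized tiles: two noncancerous, one OPMD, one cancer.
soft = mosaic_soft_label([0, 0, 1, 2], [0.25, 0.25, 0.25, 0.25], num_classes=3)
print(soft)
```

A mosaic built this way exposes the model to minority-class content in more training images without duplicating whole minority images, which is one plausible reason such augmentation can help with imbalance.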

Hong C, Liu M, Wojdyla DM, Hickey J, Pencina M, Henao R. Trans-Balance: Reducing demographic disparity for prediction models in the presence of class imbalance. J Biomed Inform 2024;149:104532. [PMID: 38070817 PMCID: PMC10850917 DOI: 10.1016/j.jbi.2023.104532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Revised: 10/21/2023] [Accepted: 10/28/2023] [Indexed: 12/21/2023]

Li X, Wu Q, Wang M, Wu K. Uncertainty-aware network for fine-grained and imbalanced reflux esophagitis grading. Comput Biol Med 2024;168:107751. [PMID: 38016373 DOI: 10.1016/j.compbiomed.2023.107751] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Revised: 10/22/2023] [Accepted: 11/20/2023] [Indexed: 11/30/2023]

Alkhawaldeh IM, Albalkhi I, Naswhan AJ. Challenges and limitations of synthetic minority oversampling techniques in machine learning. World J Methodol 2023;13:373-378. [PMID: 38229946 PMCID: PMC10789107 DOI: 10.5662/wjm.v13.i5.373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 09/30/2023] [Accepted: 11/03/2023] [Indexed: 12/20/2023]

Wang X, Qiao Y, Cui Y, Ren H, Zhao Y, Linghu L, Ren J, Zhao Z, Chen L, Qiu L. An explainable artificial intelligence framework for risk prediction of COPD in smokers. BMC Public Health 2023;23:2164. [PMID: 37932692 PMCID: PMC10626705 DOI: 10.1186/s12889-023-17011-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 10/17/2023] [Indexed: 11/08/2023]

Abstract

BACKGROUND

Because the early signs of Chronic Obstructive Pulmonary Disease (COPD) are inconspicuous, affected individuals often remain unidentified, and opportunities for timely prevention and treatment are missed. The purpose of this study was to create an explainable artificial intelligence framework combining data preprocessing methods, machine learning methods, and model interpretability methods to identify people at high risk of COPD in the smoking population and to provide a reasonable interpretation of model predictions.

METHODS

The data comprised questionnaire information, physical examination data and results of pulmonary function tests before and after bronchodilatation. First, the factorial analysis for mixed data (FAMD), Boruta and NRSBoundary-SMOTE resampling methods were used to solve the missing data, high dimensionality and category imbalance problems. Then, seven classification models (CatBoost, NGBoost, XGBoost, LightGBM, random forest, SVM and logistic regression) were applied to model the risk level, and the best machine learning (ML) model's decisions were explained using the Shapley additive explanations (SHAP) method and partial dependence plot (PDP).

RESULTS

In the smoking population, age and 14 other variables were significant factors for predicting COPD. The CatBoost, random forest, and logistic regression models performed reasonably well in unbalanced datasets. CatBoost with NRSBoundary-SMOTE had the best classification performance in balanced datasets when composite indicators (the AUC, F1-score, and G-mean) were used as model comparison criteria. Age, COPD Assessment Test (CAT) score, gross annual income, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), anhelation, respiratory disease, central obesity, use of polluting fuel for household heating, region, use of polluting fuel for household cooking, and wheezing were important factors for predicting COPD in the smoking population.

CONCLUSION

This study combined feature screening methods, unbalanced data processing methods, and advanced machine learning methods to enable early identification of COPD risk groups in the smoking population. COPD risk factors in the smoking population were identified using SHAP and PDP, with the goal of providing theoretical support for targeted screening strategies and smoking population self-management strategies.
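
For readers unfamiliar with SMOTE-style resampling, the following minimal sketch shows the plain SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. This is not the NRSBoundary-SMOTE variant used in the study; the function name `smote`, the parameter `k`, and the toy data are assumptions for illustration only.

```python
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by neighbour interpolation.

    Plain SMOTE sketch (assumption: Euclidean distance, uniform random gap).
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region already covered by the minority class.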

Abbas Q, Malik KM, Saudagar AKJ, Khan MB. Context-aggregator: An approach of loss- and class imbalance-aware aggregation in federated learning. Comput Biol Med 2023;163:107167. [PMID: 37421740 DOI: 10.1016/j.compbiomed.2023.107167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 05/26/2023] [Accepted: 06/08/2023] [Indexed: 07/10/2023]

Abstract

Federated Learning (FL) is an emerging distributed learning paradigm that offers data privacy to contributing nodes in a collaborative environment. In an FL setting, the individual datasets of different hospitals could be used to develop reliable screening, diagnosis, and treatment predictive models to tackle major challenges such as pandemics. FL can enable the development of very diverse medical imaging datasets and thus provide more reliable models for all participating nodes, including those with low-quality data. However, the traditional FL paradigm suffers from degraded generalization power due to poorly trained local models at the client nodes. Simple aggregation of learning parameters in the standard FL model also faces a diversity issue and results in higher validation loss during the learning process. Both problems can be addressed by considering the relative learning contribution of each client node participating in the learning process. Class imbalance at each site is another significant challenge that greatly impacts the performance of the aggregated learning model. This work proposes Context Aggregator FL, which addresses the loss-factor and class-imbalance issues by incorporating the relative contribution of the collaborating nodes through a Validation-Loss based Context Aggregator (CAVL) and a Class Imbalance based Context Aggregator (CACI). The proposed Context Aggregator is evaluated on several different Covid-19 imaging classification datasets held on the participating nodes. The evaluation results show that Context Aggregator performs better than standard federated averaging learning algorithms and the FedProx algorithm for Covid-19 image classification problems.
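
To make the idea of contribution-aware aggregation concrete, the sketch below compares a plain unweighted parameter average with an average in which each client is weighted by the inverse of its validation loss. The inverse-loss rule, the helper names, and the toy numbers are assumptions for illustration; the abstract does not give the actual CAVL or CACI weighting formulas.

```python
def weighted_average(client_params, weights):
    """Average parameter vectors, one per client, using the given weights."""
    total = sum(weights)
    n_params = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params)) / total
        for i in range(n_params)
    ]


def inverse_loss_weights(val_losses, eps=1e-8):
    """Hypothetical weighting: clients with lower validation loss count more."""
    return [1.0 / (loss + eps) for loss in val_losses]


# Three clients; the third is poorly trained (high validation loss).
client_params = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
val_losses = [0.5, 0.5, 5.0]

plain = weighted_average(client_params, [1.0, 1.0, 1.0])
loss_aware = weighted_average(client_params, inverse_loss_weights(val_losses))
print(plain, loss_aware)
```

In this toy case the poorly trained client pulls the plain average toward its outlying parameters, while the loss-aware average stays closer to the two well-trained clients.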

Thölke P, Mantilla-Ramos YJ, Abdelhedi H, Maschke C, Dehgan A, Harel Y, Kemtur A, Mekki Berrada L, Sahraoui M, Young T, Bellemare Pépin A, El Khantour C, Landry M, Pascarella A, Hadid V, Combrisson E, O'Byrne J, Jerbi K. Class imbalance should not throw you off balance: Choosing the right classifiers and performance metrics for brain decoding with imbalanced data. Neuroimage 2023:120253. [PMID: 37385392 DOI: 10.1016/j.neuroimage.2023.120253] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/18/2023] [Revised: 06/05/2023] [Accepted: 06/26/2023] [Indexed: 07/01/2023]

Abstract

Machine learning (ML) is increasingly used in cognitive, computational and clinical neuroscience. The reliable and efficient application of ML requires a sound understanding of its subtleties and limitations. Training ML models on datasets with imbalanced classes is a particularly common problem, and it can have severe consequences if not adequately addressed. With the neuroscience ML user in mind, this paper provides a didactic assessment of the class imbalance problem and illustrates its impact through systematic manipulation of data imbalance ratios in (i) simulated data and (ii) brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI). Our results illustrate how the widely used Accuracy (Acc) metric, which measures the overall proportion of successful predictions, yields misleadingly high performance as class imbalance increases. Because Acc weights the per-class ratios of correct predictions proportionally to class size, it largely disregards the performance on the minority class. A binary classification model that learns to systematically vote for the majority class will yield an artificially high decoding accuracy that directly reflects the imbalance between the two classes, rather than any genuine generalizable ability to discriminate between them. We show that other evaluation metrics, such as the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and the less common Balanced Accuracy (BAcc) metric, defined as the arithmetic mean of sensitivity and specificity, provide more reliable performance evaluations for imbalanced data. Our findings also highlight the robustness of Random Forest (RF), and the benefits of using stratified cross-validation and hyperparameter optimization to tackle data imbalance. Critically, for neuroscience ML applications that seek to minimize overall classification error, we recommend the routine use of BAcc, which in the specific case of balanced data is equivalent to using standard Acc, and readily extends to multi-class settings. Importantly, we present a list of recommendations for dealing with imbalanced data, as well as open-source code to allow the neuroscience community to replicate and extend our observations and explore alternative approaches to coping with imbalanced data.
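
The contrast between Acc and BAcc described in this abstract can be reproduced with a few lines of code. The sketch below is not the authors' open-source code; it is a minimal standard-library example with a made-up 90/10 class split and a classifier that always votes for the majority class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Overall proportion of correct predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Made-up imbalanced labels: 90 majority (0) and 10 minority (1) samples.
y_true = [0] * 90 + [1] * 10
y_majority = [0] * 100  # classifier that always predicts the majority class

acc_majority = accuracy(y_true, y_majority)
bacc_majority = balanced_accuracy(y_true, y_majority)
print(acc_majority, bacc_majority)
```

Here Acc simply mirrors the 90% majority share, while BAcc falls to chance level because sensitivity on the minority class is zero.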

Affiliation(s)

Philipp Thölke Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institute of Cognitive Science, Osnabrück University, Neuer Graben 29/Schloss, Osnabrück, 49074, Lower Saxony, Germany.
Yorguin-Jose Mantilla-Ramos Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Neuropsychology and Behavior Group (GRUNECO), Faculty of Medicine, Universidad de Antioquia, 53-108, Medellin, Aranjuez, Medellin, 050010, Colombia
Hamza Abdelhedi Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Charlotte Maschke Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Integrated Program in Neuroscience, McGill University, 1033 Pine Ave, Montreal, H3A 0G4, Canada
Arthur Dehgan Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Institut de Neurosciences de la Timone (INT), CNRS, Aix Marseille University, Marseille, 13005, France
Yann Harel Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Anirudha Kemtur Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Loubna Mekki Berrada Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Myriam Sahraoui Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Tammy Young Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Department of Computing Science, University of Alberta, 116 St & 85 Ave, Edmonton, T6G 2R3, AB, Canada
Antoine Bellemare Pépin Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Department of Music, Concordia University, 1550 De Maisonneuve Blvd. W., Montreal, H3H 1G8, QC, Canada
Clara El Khantour Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Mathieu Landry Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Annalisa Pascarella Institute for Applied Mathematics Mauro Picone, National Research Council, Roma, Italy
Vanessa Hadid Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Etienne Combrisson Institut de Neurosciences de la Timone (INT), CNRS, Aix Marseille University, Marseille, 13005, France
Jordan O'Byrne Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada
Karim Jerbi Cognitive and Computational Neuroscience Laboratory (CoCo Lab), University of Montreal, 2900, boul. Edouard-Montpetit, Montreal, H3T 1J4, Quebec, Canada; Mila (Quebec Machine Learning Institute), 6666 Rue Saint-Urbain, Montreal, H2S 3H1, QC, Canada; UNIQUE Centre (Quebec Neuro-AI Research Centre), 3744 rue Jean-Brillant, Montreal, H3T 1P1, QC, Canada

Veturi YA, Woof W, Lazebnik T, Moghul I, Woodward-Court P, Wagner SK, Cabral de Guimarães TA, Daich Varela M, Liefers B, Patel PJ, Beck S, Webster AR, Mahroo O, Keane PA, Michaelides M, Balaskas K, Pontikos N. SynthEye: Investigating the Impact of Synthetic Data on Artificial Intelligence-assisted Gene Diagnosis of Inherited Retinal Disease. Ophthalmol Sci 2023;3:100258. [PMID: 36685715 PMCID: PMC9852957 DOI: 10.1016/j.xops.2022.100258] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/30/2022] [Revised: 11/08/2022] [Accepted: 11/09/2022] [Indexed: 11/23/2022]

Abstract

Purpose

Rare disease diagnosis is challenging in medical image-based artificial intelligence due to a natural class imbalance in datasets, leading to biased prediction models. Inherited retinal diseases (IRDs) are a research domain that particularly faces this issue. This study investigates the applicability of synthetic data in improving artificial intelligence-enabled diagnosis of IRDs using generative adversarial networks (GANs).

Design

Diagnostic study of gene-labeled fundus autofluorescence (FAF) IRD images using deep learning.

Participants

Moorfields Eye Hospital (MEH) dataset of 15 692 FAF images obtained from 1800 patients with confirmed genetic diagnosis of 1 of 36 IRD genes.

Methods

A StyleGAN2 model is trained on the IRD dataset to generate 512 × 512 resolution images. Convolutional neural networks are trained for classification using different synthetically augmented datasets, including real IRD images plus 1800 and 3600 synthetic images, and a fully rebalanced dataset. We also perform an experiment with only synthetic data. All models are compared against a baseline convolutional neural network trained only on real data.

Main Outcome Measures

We evaluated synthetic data quality using a Visual Turing Test conducted with 4 ophthalmologists from MEH. Synthetic and real images were compared using feature space visualization, similarity analysis to detect memorized images, and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) score for no-reference-based quality evaluation. Convolutional neural network diagnostic performance was determined on a held-out test set using the area under the receiver operating characteristic curve (AUROC) and Cohen's Kappa (κ).

Results

An average true recognition rate of 63% and fake recognition rate of 47% were obtained from the Visual Turing Test. Thus, a considerable proportion of the synthetic images were classified as real by clinical experts. Similarity analysis showed that the synthetic images were not copies of the real images, indicating that the GAN was able to generalize. However, BRISQUE score analysis indicated that synthetic images were of significantly lower quality overall than real images (P < 0.05). Comparing the rebalanced model (RB) with the baseline (R), no significant change in the average AUROC and κ was found (R-AUROC = 0.86[0.85-88], RB-AUROC = 0.88[0.86-0.89], R-κ = 0.51[0.49-0.53], and RB-κ = 0.52[0.50-0.54]). The synthetic data trained model (S) achieved similar performance to the baseline (S-AUROC = 0.86[0.85-87], S-κ = 0.48[0.46-0.50]).

Conclusions

Synthetic generation of realistic IRD FAF images is feasible. Synthetic data augmentation does not deliver improvements in classification performance. However, synthetic data alone deliver a similar performance as real data, and hence may be useful as a proxy to real data. Financial Disclosure(s): Proprietary or commercial disclosure may be found after the references.
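
Cohen's Kappa, one of the two outcome measures in this study, corrects observed agreement for the agreement expected by chance, which matters when classes are imbalanced. The sketch below is a generic standard-library implementation with made-up labels; it is not taken from the study, and the handling of the degenerate case is an assumption.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected) / (1 - expected) agreement."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined here; returning 0.0 is a convention chosen
        # for this sketch (assumption).
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels: gene "A" dominates, gene "B" is rare.
y_true = ["A"] * 8 + ["B"] * 2
always_a = ["A"] * 10  # predictor that ignores the rare class

kappa_majority = cohen_kappa(y_true, always_a)
kappa_perfect = cohen_kappa(y_true, y_true)
print(kappa_majority, kappa_perfect)
```

The always-"A" predictor agrees with the labels 80% of the time, yet its kappa is zero because that agreement is exactly what chance would produce given the label frequencies.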

Affiliation(s)

Yoga Advaith Veturi University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
William Woof University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Teddy Lazebnik University College London Cancer Institute, University College London, London, UK
Ismail Moghul Moorfields Eye Hospital, London, UK
Peter Woodward-Court University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Siegfried K. Wagner University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Thales Antonio Cabral de Guimarães University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Malena Daich Varela University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Bart Liefers Moorfields Eye Hospital, London, UK
Praveen J. Patel Moorfields Eye Hospital, London, UK
Stephan Beck University College London Cancer Institute, University College London, London, UK
Andrew R. Webster University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Omar Mahroo University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Pearse A. Keane University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Michel Michaelides University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Konstantinos Balaskas University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK
Nikolas Pontikos University College London Institute of Ophthalmology, University College London, London, UK; Moorfields Eye Hospital, London, UK

Junaid M, Ali S, Eid F, El-Sappagh S, Abuhmed T. Explainable machine learning models based on multimodal time-series data for the early detection of Parkinson's disease. Comput Methods Programs Biomed 2023;234:107495. [PMID: 37003039 DOI: 10.1016/j.cmpb.2023.107495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 02/23/2023] [Accepted: 03/17/2023] [Indexed: 06/19/2023]

Abstract

BACKGROUND AND OBJECTIVES

Parkinson's Disease (PD) is a devastating chronic neurological condition. Machine learning (ML) techniques have been used in the early prediction of PD progression. Fusion of heterogeneous data modalities proved its capability to improve the performance of ML models. Time series data fusion supports the tracking of the disease over time. In addition, the trustworthiness of the resulting models is improved by adding model explainability features. The literature on PD has not sufficiently explored these three points.

METHODS

In this work, we propose an ML pipeline for predicting the progression of PD that is both accurate and explainable. We explore the fusion of different combinations of five time series modalities from the Parkinson's Progression Markers Initiative (PPMI) real-world dataset, including patient characteristics, biosamples, medication history, motor, and non-motor function data. Each patient has six visits. The problem has been formulated in two ways: (1) a three-class progression prediction with 953 patients in each time series modality, and (2) a four-class progression prediction with 1,060 patients in each time series modality. The statistical features of these six visits were calculated from each modality, and diverse feature selection methods were applied to select the most informative feature sets. The extracted features were used to train a set of well-known ML models including support vector machines (SVM), random forests (RF), extra tree classifier (ETC), light gradient boosting machines (LGBM), and stochastic gradient descent (SGD). We examined a number of data-balancing strategies in the pipeline with different combinations of modalities. ML models have been optimized using the Bayesian optimizer. A comprehensive evaluation of various ML methods has been conducted, and the best models have been extended to provide different explainability features.

RESULTS

We compare the performance of ML models before and after optimization, and with and without feature selection. In the three-class experiment and with various modality fusions, the LGBM model produced the most accurate results with a 10-fold cross-validation (10-CV) accuracy of 90.73% using non-motor function modality. RF produced the best results in the four-class experiment with various modality fusions with a 10-CV accuracy of 94.57% using non-motor modality. With the fused dataset of non-motor and motor function modalities, the LGBM model outperformed the other ML models in both the 3-class and 4-class experiments (i.e., 10-CV accuracy of 94.89% and 93.73%, respectively). Using the Shapley Additive Explanations (SHAP) framework, we employed global and instance-based explanations to explain the behavior of each ML classifier. Moreover, we extended the explainability by implementing the LIME and SHAPASH local explainers. The consistency of these explainers has been explored. The resultant classifiers were accurate, explainable, and thus medically more relevant and applicable.

CONCLUSIONS

The selected modalities and feature sets were confirmed by the literature and medical experts. The various explainers suggest that the bradykinesia (NP3BRADY) feature was the most dominant and consistent. By providing thorough insights into the influence of multiple modalities on the disease risk, the suggested approach is expected to help improve the clinical knowledge of PD progression processes.

Iqbal S, Qureshi AN, Li J, Choudhry IA, Mahmood T. Dynamic learning for imbalanced data in learning chest X-ray and CT images. Heliyon 2023;9:e16807. [PMID: 37313141 PMCID: PMC10258426 DOI: 10.1016/j.heliyon.2023.e16807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 05/26/2023] [Accepted: 05/29/2023] [Indexed: 06/15/2023]

Zhang H, Zhong X, Li G, Liu W, Liu J, Ji D, Li X, Wu J. BCU-Net: Bridging ConvNeXt and U-Net for medical image segmentation. Comput Biol Med 2023;159:106960. [PMID: 37099973 DOI: 10.1016/j.compbiomed.2023.106960] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/18/2022] [Revised: 04/12/2023] [Accepted: 04/17/2023] [Indexed: 04/28/2023]

Lu H, Barrett A, Pierce A, Zheng J, Wang Y, Chiang C, Rakovski C. Predicting suicidal and self-injurious events in a correctional setting using AI algorithms on unstructured medical notes and structured data. J Psychiatr Res 2023;160:19-27. [PMID: 36773344 DOI: 10.1016/j.jpsychires.2023.01.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 01/23/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023]

Lin F, Xia Y, Song S, Ravikumar N, Frangi AF. High-throughput 3DRA segmentation of brain vasculature and aneurysms using deep learning. Comput Methods Programs Biomed 2023;230:107355. [PMID: 36709557 DOI: 10.1016/j.cmpb.2023.107355] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/01/2022] [Revised: 01/10/2023] [Accepted: 01/13/2023] [Indexed: 06/18/2023]

Abstract

BACKGROUND AND OBJECTIVES

Automatic segmentation of the cerebral vasculature and aneurysms facilitates incidental detection of aneurysms. The assessment of aneurysm rupture risk assists with pre-operative treatment planning and enables in-silico investigation of cerebral hemodynamics within and in the vicinity of aneurysms. However, ensuring precise and robust segmentation of cerebral vessels and aneurysms in neuroimaging modalities such as three-dimensional rotational angiography (3DRA) is challenging. The vasculature constitutes a small proportion of the image volume, resulting in a large class imbalance (relative to surrounding brain tissue). Additionally, aneurysms and vessels have similar image/appearance characteristics, making it challenging to distinguish the aneurysm sac from the vessel lumen.

METHODS

We propose a novel multi-class convolutional neural network to tackle these challenges and facilitate the automatic segmentation of cerebral vessels and aneurysms in 3DRA images. The proposed model is trained and evaluated on an internal multi-center dataset and an external publicly available challenge dataset.

RESULTS

On the internal clinical dataset, our method consistently outperformed several state-of-the-art approaches for vessel and aneurysm segmentation, achieving an average Dice score of 0.81 (0.15 higher than nnUNet) and an average surface-to-surface error of 0.20 mm (less than the in-plane resolution (0.35 mm/pixel)) for aneurysm segmentation; and an average Dice score of 0.91 and average surface-to-surface error of 0.25 mm for vessel segmentation. In 223 cases of a clinical dataset, our method accurately segmented 190 aneurysm cases.

CONCLUSIONS

The proposed approach can help address class imbalance problems and inter-class interference problems in multi-class segmentation. Besides, this method performs consistently on clinical datasets from four different sources and the generated results are qualified for hemodynamic simulation. Code available at https://github.com/cistib/vessel-aneurysm-segmentation.
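The Dice score reported above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `dice_score` and the toy masks are assumptions made for illustration, and the masks are flattened to plain 0/1 lists so that the example needs no imaging library.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 1-D "volume": 3 foreground voxels out of 20, mimicking a sparse structure.
target = [0] * 17 + [1, 1, 1]
pred = [0] * 16 + [1, 1, 1, 0]  # same size, shifted by one voxel

voxel_accuracy = sum(p == t for p, t in zip(pred, target)) / len(target)
print(f"Dice = {dice_score(pred, target):.3f}, voxel accuracy = {voxel_accuracy:.3f}")
```

In this toy case the prediction overlaps the target in two of three foreground voxels, so the Dice score is 2/3 while plain voxel accuracy is 0.9. That gap is why overlap measures are usually preferred when the foreground is a small fraction of the volume.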

Bigoulaeva I, Hangya V, Gurevych I, Fraser A. Label modification and bootstrapping for zero-shot cross-lingual hate speech detection. LANG RESOUR EVAL 2023;57:1515-1546. [PMID: 38021031 PMCID: PMC10656307 DOI: 10.1007/s10579-023-09637-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/13/2023] [Indexed: 02/21/2023]

Hu L, Fu C, Ren Z, Cai Y, Yang J, Xu S, Xu W, Tang D. SSELM-neg: spherical search-based extreme learning machine for drug-target interaction prediction. BMC Bioinformatics 2023;24:38. [PMID: 36737694 PMCID: PMC9896467 DOI: 10.1186/s12859-023-05153-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2022] [Accepted: 01/18/2023] [Indexed: 02/05/2023] Open

Affiliation(s)

Lingzhi Hu: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300)
Chengzhou Fu: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300); Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
Zhonglu Ren: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300)
Yongming Cai: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300); Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
Jin Yang: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300); Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China
Siwen Xu: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300)
Wenhua Xu: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300)
Deyu Tang: School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, People’s Republic of China (GRID grid.411847.f, ISNI 0000 0004 1804 4300); School of Computer Science and Engineering, South China University of Technology, Guangzhou, People’s Republic of China (GRID grid.79703.3a, ISNI 0000 0004 1764 3838); Guangdong Province Precise Medicine Big Data of Traditional Chinese Medicine Engineering Technology Research Center, Guangzhou, People’s Republic of China

Cartus AR, Samuels EA, Cerdá M, Marshall BDL. Outcome class imbalance and rare events: An underappreciated complication for overdose risk prediction modeling. Addiction 2023;118:1167-1176. [PMID: 36683137 PMCID: PMC10175167 DOI: 10.1111/add.16133] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/22/2022] [Accepted: 12/22/2022] [Indexed: 01/24/2023]

Abstract

BACKGROUND AND AIMS

Low outcome prevalence, often observed with opioid-related outcomes, poses an underappreciated challenge to accurate predictive modeling. Outcome class imbalance, where non-events (i.e. negative class observations) outnumber events (i.e. positive class observations) by a moderate to extreme degree, can distort measures of predictive accuracy in misleading ways, and make the overall predictive accuracy and the discriminatory ability of a predictive model appear spuriously high. We conducted a simulation study to measure the impact of outcome class imbalance on predictive performance of a simple SuperLearner ensemble model and suggest strategies for reducing that impact.

DESIGN, SETTING, PARTICIPANTS

Using a Monte Carlo design with 250 repetitions, we trained and evaluated these models on four simulated data sets with 100 000 observations each: one with perfect balance between events and non-events, and three where non-events outnumbered events by an approximate factor of 10:1, 100:1, and 1000:1, respectively.

MEASUREMENTS

We evaluated the performance of these models using a comprehensive suite of measures, including measures that are more appropriate for imbalanced data.

FINDINGS

Increasing imbalance tended to spuriously improve overall accuracy (using a high threshold to classify events vs non-events, overall accuracy improved from 0.45 with perfect balance to 0.99 with the most severe outcome class imbalance), but diminished predictive performance was evident using other metrics (corresponding positive predictive value decreased from 0.99 to 0.14).

CONCLUSION

Increasing reliance on algorithmic risk scores in consequential decision-making processes raises critical fairness and ethical concerns. This paper provides broad guidance for analytic strategies that clinical investigators can use to remedy the impacts of outcome class imbalance on risk prediction tools.
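The effect described in the findings, where accuracy rises as positive predictive value falls, can be reproduced with simple arithmetic. The sketch below does not replicate the paper's SuperLearner simulation. It assumes a hypothetical classifier with a fixed sensitivity of 0.80 and specificity of 0.99 (illustrative values, not taken from the study) and varies only the ratio of non-events to events.

```python
def accuracy_and_ppv(sensitivity, specificity, prevalence):
    """Expected accuracy and positive predictive value for a fixed classifier."""
    tp = sensitivity * prevalence              # true positives (as a fraction of all cases)
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    accuracy = tp + tn
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return accuracy, ppv


# Non-event:event ratios matching the four scenarios named in the abstract.
for ratio in (1, 10, 100, 1000):
    prevalence = 1 / (1 + ratio)
    acc, ppv = accuracy_and_ppv(0.80, 0.99, prevalence)
    print(f"{ratio:>4}:1  accuracy={acc:.3f}  PPV={ppv:.3f}")
```

With the classifier held constant, accuracy drifts toward the specificity as non-events dominate, while PPV collapses because the few true positives are outnumbered by false positives. This is the same qualitative pattern the abstract reports.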

Liang J (梁进), Zhou Q (周强), Li W (李婉). [Single-channel electroencephalogram signal used for sleep state recognition based on one-dimensional width kernel convolutional neural networks and long-short-term memory networks]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi 2022;39:1089-1096. [PMID: 36575077 PMCID: PMC9927194 DOI: 10.7507/1001-5515.202204021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Subscribe] [Scholar Register] [Received: 04/11/2022] [Revised: 10/26/2022] [Indexed: 12/29/2022]

Gökkan O, Kuntalp M. A new imbalance-aware loss function to be used in a deep neural network for colorectal polyp segmentation. Comput Biol Med 2022;151:106205. [PMID: 36370582 DOI: 10.1016/j.compbiomed.2022.106205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2022] [Revised: 09/14/2022] [Accepted: 10/09/2022] [Indexed: 12/27/2022]

Chatterjee S, Maity S, Bhattacharjee M, Banerjee S, Das AK, Ding W. Variational Autoencoder Based Imbalanced COVID-19 Detection Using Chest X-Ray Images. New Gener Comput 2022;41:25-60. [PMID: 36439303 PMCID: PMC9676807 DOI: 10.1007/s00354-022-00194-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/03/2021] [Accepted: 10/16/2022] [Indexed: 06/12/2023]

Kalsotra R, Arora S. Performance analysis of U-Net with hybrid loss for foreground detection. Multimed Syst 2022;29:771-786. [PMID: 36406901 PMCID: PMC9641683 DOI: 10.1007/s00530-022-01014-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/06/2022] [Accepted: 10/10/2022] [Indexed: 06/16/2023]

Qian S, Ren K, Zhang W, Ning H. Skin lesion classification using CNNs with grouping of multi-scale attention and class-specific loss weighting. Comput Methods Programs Biomed 2022;226:107166. [PMID: 36209623 DOI: 10.1016/j.cmpb.2022.107166] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/07/2022] [Revised: 09/05/2022] [Accepted: 09/29/2022] [Indexed: 06/16/2023]

Xing M, Zhang Y, Yu H, Yang Z, Li X, Li Q, Zhao Y, Zhao Z, Luo Y. Predict DLBCL patients' recurrence within two years with Gaussian mixture model cluster oversampling and multi-kernel learning. Comput Methods Programs Biomed 2022;226:107103. [PMID: 36088813 DOI: 10.1016/j.cmpb.2022.107103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/14/2021] [Revised: 08/05/2022] [Accepted: 08/30/2022] [Indexed: 06/15/2023]

Matharaarachchi S, Domaratzki M, Muthukumarana S. Minimizing features while maintaining performance in data classification problems. PeerJ Comput Sci 2022;8:e1081. [PMID: 36262135 PMCID: PMC9575878 DOI: 10.7717/peerj-cs.1081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2022] [Accepted: 08/10/2022] [Indexed: 06/16/2023]

Zhu S, Meng Q. What can we learn from autonomous vehicle collision data on crash severity? A cost-sensitive CART approach. Accid Anal Prev 2022;174:106769. [PMID: 35858521 DOI: 10.1016/j.aap.2022.106769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Revised: 04/17/2022] [Accepted: 07/02/2022] [Indexed: 06/15/2023]

Narwane SV, Sawarkar SD. Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction. Diabetes Metab Syndr 2022;16:102609. [PMID: 36099677 DOI: 10.1016/j.dsx.2022.102609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Revised: 08/21/2022] [Accepted: 08/23/2022] [Indexed: 11/30/2022]

Holste G, Wang S, Jiang Z, Shen TC, Shih G, Summers RM, Peng Y, Wang Z. Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study. Data Augment Label Imperfections (2022) 2022;13567:22-32. [PMID: 36318048 PMCID: PMC9618235 DOI: 10.1007/978-3-031-17027-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]

A Romero RA, Y Deypalan MN, Mehrotra S, Jungao JT, Sheils NE, Manduchi E, Moore JH. Benchmarking AutoML frameworks for disease prediction using medical claims. BioData Min 2022;15:15. [PMID: 35883154 PMCID: PMC9327416 DOI: 10.1186/s13040-022-00300-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Accepted: 06/27/2022] [Indexed: 11/10/2022] Open

Abstract

Objectives

Ascertain and compare the performances of Automated Machine Learning (AutoML) tools on large, highly imbalanced healthcare datasets.

Materials and Methods

We generated a large dataset using historical de-identified administrative claims including demographic information and flags for disease codes in four different time windows prior to 2019. We then trained three AutoML tools on this dataset to predict six different disease outcomes in 2019 and evaluated model performances on several metrics.

Results

The AutoML tools showed improvement from the baseline random forest model but did not differ significantly from each other. All models recorded low area under the precision-recall curve and failed to predict true positives while keeping the true negative rate high. Model performance was not directly related to prevalence. We provide a specific use-case to illustrate how to select a threshold that gives the best balance between true and false positive rates, as this is an important consideration in medical applications.

Discussion

Healthcare datasets present several challenges for AutoML tools, including large sample size, high imbalance, and limitations in the available features. Improvements in scalability, combinations of imbalance-learning resampling and ensemble approaches, and curated feature selection are possible next steps to achieve better performance.

Conclusion

Among the three explored, no AutoML tool consistently outperforms the rest in terms of predictive performance. The performances of the models in this study suggest that there may be room for improvement in handling medical claims data. Finally, selection of the optimal prediction threshold should be guided by the specific practical application.

Supplementary Information

The online version contains supplementary material available at (10.1186/s13040-022-00300-2).
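The abstract's point about choosing a threshold that balances true and false positive rates can be sketched as follows. The criterion used here, maximizing TPR minus FPR (Youden's J statistic), is one common choice and is an assumption; the abstract does not state which rule the authors applied. The function names, scores, and labels are hypothetical.

```python
def rates_at_threshold(scores, labels, threshold):
    """True and false positive rates when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def best_threshold(scores, labels):
    """Return (threshold, J) for the observed score that maximizes J = TPR - FPR."""
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tpr, fpr = rates_at_threshold(scores, labels, t)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t, best_j


# Hypothetical predicted risks for 10 patients, 3 of whom have the outcome.
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
threshold, j = best_threshold(scores, labels)
print(f"threshold={threshold}, J={j:.3f}")
```

As the abstract's conclusion notes, the appropriate threshold depends on the application. A setting where missed cases are costly might instead fix a minimum sensitivity and accept whatever false positive rate follows.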

Tappeiner E, Welk M, Schubert R. Tackling the class imbalance problem of deep learning-based head and neck organ segmentation. Int J Comput Assist Radiol Surg 2022;17:2103-2111. [PMID: 35578086 PMCID: PMC9515025 DOI: 10.1007/s11548-022-02649-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2022] [Accepted: 04/20/2022] [Indexed: 12/03/2022]

Abstract

Purpose

The segmentation of organs at risk (OAR) is a required precondition for cancer treatment with image-guided radiation therapy. The automation of the segmentation task is therefore of high clinical relevance. Deep learning (DL)-based medical image segmentation is currently the most successful approach, but suffers from the over-presence of the background class and the anatomically given organ size difference, which is most severe in the head and neck (HAN) area.

Methods

To tackle the HAN area-specific class imbalance problem, we first optimize the patch size of the currently best performing general-purpose segmentation framework, the nnU-Net, based on the introduced class imbalance measurement, and second introduce the class adaptive Dice loss to further compensate for the highly imbalanced setting.

Results

Both the patch size and the loss function are parameters with direct influence on the class imbalance, and their optimization leads to a 3% increase in the Dice score and 22% reduction in the 95% Hausdorff distance compared to the baseline, finally reaching 0.8 ± 0.15 and 3.17 ± 1.7 mm for the segmentation of seven HAN organs using a single and simple neural network.

Conclusion

The patch size optimization and the class adaptive Dice loss are both easy to integrate into current DL-based segmentation approaches and can increase performance on class-imbalanced segmentation tasks.
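The general idea of compensating for class imbalance inside a Dice-type loss can be sketched as below. This is not the class adaptive Dice loss from the paper, whose exact form is not given in the abstract. It is a generic soft Dice loss with inverse-frequency class weights, and the function name, the weighting rule, and the toy data are all assumptions.

```python
def weighted_soft_dice_loss(probs, onehot, eps=1e-6):
    """Class-weighted soft Dice loss.

    probs[c][i]  : predicted probability of class c at voxel i
    onehot[c][i] : 1 if voxel i belongs to class c in the ground truth, else 0
    """
    counts = [sum(t) for t in onehot]
    # Assumed weighting: inverse voxel count, ignoring classes absent from the
    # ground truth, normalised so the weights sum to 1.
    raw = [1.0 / n if n > 0 else 0.0 for n in counts]
    norm = sum(raw)
    if norm == 0:
        raise ValueError("ground truth contains no labelled voxels")
    weights = [r / norm for r in raw]

    loss = 1.0
    for w, p, t in zip(weights, probs, onehot):
        intersection = sum(pi * ti for pi, ti in zip(p, t))
        dice = (2.0 * intersection + eps) / (sum(p) + sum(t) + eps)
        loss -= w * dice
    return loss


# Toy case: 4 voxels, class 0 (background) covers 3 of them, class 1 (organ) only 1.
onehot = [[1, 1, 1, 0], [0, 0, 0, 1]]
all_background = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
print(weighted_soft_dice_loss(all_background, onehot))
```

Because the small class receives the larger weight, a prediction that ignores it is penalized more heavily than it would be under an unweighted mean over classes.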

Naga D, Muster W, Musvasva E, Ecker GF. Off-targetP ML: an open source machine learning framework for off-target panel safety assessment of small molecules. J Cheminform 2022;14:27. [PMID: 35525988 PMCID: PMC9077900 DOI: 10.1186/s13321-022-00603-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2021] [Accepted: 03/26/2022] [Indexed: 11/10/2022] Open

Huynh T, Nibali A, He Z. Semi-supervised learning for medical image classification using imbalanced training data. Comput Methods Programs Biomed 2022;216:106628. [PMID: 35101700 DOI: 10.1016/j.cmpb.2022.106628] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/01/2021] [Revised: 12/20/2021] [Accepted: 01/07/2022] [Indexed: 06/14/2023]

Abstract

BACKGROUND AND OBJECTIVE

Medical image classification is often challenging for two reasons: a lack of labelled examples due to expensive and time-consuming annotation protocols, and imbalanced class labels due to the relative scarcity of disease-positive individuals in the wider population. Semi-supervised learning methods exist for dealing with a lack of labels, but they generally do not address the problem of class imbalance. Hence, the purpose of this study is to explore a new approach to perturbation-based semi-supervised learning which tackles the problem of applying semi-supervised learning to medical image classification with imbalanced training data.

METHODS

In this study we propose Adaptive Blended Consistency Loss (ABCL), a simple yet effective drop-in replacement for consistency loss in perturbation-based semi-supervised learning methods. ABCL counteracts data skew by adaptively mixing the target class distribution of the consistency loss in accordance with class frequency. Our proposed method is evaluated and compared with existing methods on two different imbalanced medical image classification datasets. An ablation study is also provided to analyse the properties and effectiveness of our proposed method.

RESULTS

Our experiments with ABCL reveal improvements to unweighted average recall (UAR) when compared with existing consistency losses that are not designed to counteract class imbalance and other existing methods. Our proposed ABCL method is able to improve the performance of the baseline consistency loss approach from 0.59 to 0.67 UAR and outperforms methods that address the class imbalance problem for labelled data (between 0.51 and 0.59 UAR) and for unlabelled data (0.61 UAR) on the imbalanced skin cancer dataset. On the imbalanced retinal fundus glaucoma dataset, ABCL (combined with Weighted Cross Entropy loss) achieves 0.67 UAR, which is an improvement over the best existing approach (0.57 UAR).

CONCLUSIONS

Overall the results show the effectiveness of ABCL to alleviate the class imbalance problem for semi-supervised classification for medical images.
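Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so every class counts equally regardless of its size. A minimal sketch follows; the function name and the toy labels are assumptions made for illustration.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Toy imbalanced case: 8 negatives, 2 positives, one positive missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, UAR={unweighted_average_recall(y_true, y_pred):.2f}")
```

Here accuracy is 0.90 but UAR is 0.75, since recall on the minority class is only 0.5. The metric is equivalent to what some libraries call macro-averaged recall or balanced accuracy.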

Jiao J, Du Y, Li X, Guo Y, Ren Y, Wang Y. Prenatal prediction of neonatal respiratory morbidity: a radiomics method based on imbalanced few-shot fetal lung ultrasound images. BMC Med Imaging 2022;22:2. [PMID: 34983431 PMCID: PMC8725479 DOI: 10.1186/s12880-021-00731-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Accepted: 12/30/2021] [Indexed: 11/10/2022] Open

Pes B, Lai G. Cost-sensitive learning strategies for high-dimensional and imbalanced data: a comparative study. PeerJ Comput Sci 2021;7:e832. [PMID: 35036539 PMCID: PMC8725666 DOI: 10.7717/peerj-cs.832] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Accepted: 12/06/2021] [Indexed: 05/28/2023]

Yeung M, Sala E, Schönlieb CB, Rundo L. Unified Focal loss: Generalising Dice and cross entropy-based losses to handle class imbalanced medical image segmentation. Comput Med Imaging Graph 2021;95:102026. [PMID: 34953431 PMCID: PMC8785124 DOI: 10.1016/j.compmedimag.2021.102026] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2021] [Revised: 11/18/2021] [Accepted: 12/04/2021] [Indexed: 12/18/2022]

Peng L, Yuan R, Shen L, Gao P, Zhou L. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min 2021;14:50. [PMID: 34861891 PMCID: PMC8642957 DOI: 10.1186/s13040-021-00277-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2021] [Accepted: 08/22/2021] [Indexed: 12/14/2022] Open

Han S, Williamson BD, Fong Y. Improving random forest predictions in small datasets from two-phase sampling designs. BMC Med Inform Decis Mak 2021;21:322. [PMID: 34809631 PMCID: PMC8607560 DOI: 10.1186/s12911-021-01688-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2021] [Accepted: 11/10/2021] [Indexed: 11/10/2022] Open

Ding S, Wu Z, Zheng Y, Liu Z, Yang X, Yang X, Yuan G, Xie J. Deep attention branch networks for skin lesion classification. Comput Methods Programs Biomed 2021;212:106447. [PMID: 34678529 DOI: 10.1016/j.cmpb.2021.106447] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/07/2021] [Accepted: 09/28/2021] [Indexed: 06/13/2023]

Abstract

BACKGROUND AND OBJECTIVE

The skin lesion usually covers a small region of the dermoscopy image, and lesions of different categories may be highly similar. Therefore, it is essential to design an elaborate network for accurate skin lesion classification, which can focus on semantically meaningful lesion parts. Although Class Activation Mapping (CAM) shows good localization capability in highlighting the discriminative parts, it cannot be obtained in the forward propagation process.

METHODS

We propose a Deep Attention Branch Network (DABN) model, which introduces the attention branches to expand the conventional Deep Convolutional Neural Networks (DCNN). The attention branch is designed to obtain the CAM in the training stage, which is then utilized as an attention map to make the network focus on discriminative parts of skin lesions. DABN is applicable to multiple DCNN structures and can be trained in an end-to-end manner. Moreover, a novel Entropy-guided Loss Weighting (ELW) strategy is designed to counter class imbalance influence in the skin lesion datasets.

RESULTS

The proposed method achieves an Average Precision (AP) of 0.719 on the ISIC-2016 dataset and an average area under the ROC curve (AUC) of 0.922 on the ISIC-2017 dataset. Compared with other state-of-the-art methods, our method obtains better performance without external data and ensemble learning. Moreover, extensive experiments demonstrate that it can be applied to multi-class classification tasks and improves mean sensitivity by more than 2.6% in different DCNN structures.

CONCLUSIONS

The proposed method can adaptively focus on the discriminative regions of dermoscopy images and allows for effective training when facing class imbalance, leading to the performance improvement of skin lesion classification, which could also be applied to other clinical applications.

Huang D, Wang M, Zhang L, Li H, Ye M, Li A. Learning rich features with hybrid loss for brain tumor segmentation. BMC Med Inform Decis Mak 2021;21:63. [PMID: 34330265 PMCID: PMC8323198 DOI: 10.1186/s12911-021-01431-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Accepted: 02/09/2021] [Indexed: 11/10/2022] Open

Wang YC, Cheng CH. A multiple combined method for rebalancing medical data with class imbalances. Comput Biol Med 2021;134:104527. [PMID: 34091384 DOI: 10.1016/j.compbiomed.2021.104527] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2021] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 11/24/2022]

Deng L, Yang B, Kang Z, Yang S, Wu S. A noisy label and negative sample robust loss function for DNN-based distant supervised relation extraction. Neural Netw 2021;139:358-70. [PMID: 33901772 DOI: 10.1016/j.neunet.2021.03.030] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2020] [Revised: 03/08/2021] [Accepted: 03/19/2021] [Indexed: 11/20/2022]

Harerimana G, Kim JW, Jang B. A deep attention model to forecast the Length Of Stay and the in-hospital mortality right on admission from ICD codes and demographic data. J Biomed Inform 2021;118:103778. [PMID: 33872817 DOI: 10.1016/j.jbi.2021.103778] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2020] [Revised: 03/15/2021] [Accepted: 04/06/2021] [Indexed: 11/28/2022]

Abstract

Leveraging longitudinal Electronic Health Record (EHR) data to produce actionable clinical insights has been a critical issue in recent studies. Unforecasted extended hospitalizations account for a disproportionate share of resource use, mediocre quality of inpatient care, and avoidable fatalities. The capability to predict the Length of Stay (LoS) and mortality in the early stages of an admission provides opportunities to improve care and avoid many preventable losses. Forecasting in-hospital mortality gives clinicians enough insight to make decisions and hospitals enough information to allocate resources; predicting LoS and mortality within the first day of admission is therefore a difficult but paramount endeavor. The biggest challenge is that few data are available at this point, so the prediction has to draw on the previous admission history and the free-text diagnosis recorded immediately on admission. We propose a model that uses the multi-modal EHR structured medical codes and key demographic information to classify LoS into three classes: Short LoS (LoS ⩽ 10 days), Medium LoS (10 < LoS ⩽ 30 days), and Long LoS (LoS > 30 days), and to predict mortality as a binary classification of a patient's death during the current admission. The prediction may use only data available within 24 h of admission. The key predictors include previous ICD9 diagnosis codes, ICD9 procedures, key demographic data, and the free-text diagnosis of the current admission recorded right on admission. We propose a Hierarchical Attention Network (HAN-LoS and HAN-Mor) model and train it on a dataset of over 45321 admissions recorded in the de-identified MIMIC-III dataset. For improved prediction, our attention mechanisms can focus on the most influential past admissions and the most influential codes in these admissions. For fair performance evaluation, we implemented and compared the HAN model with previous approaches. With dataset balancing techniques, HAN-LoS achieved an AUROC of over 0.82 and a Micro-F1 score of 0.24, and HAN-Mor achieved an AUROC of 0.87, outperforming the existing baselines that use structured medical codes as well as clinical time series for LoS and mortality forecasting. By predicting mortality and LoS using the same model, we show that with little tuning the proposed model can be used for other clinical predictive tasks like phenotyping, decompensation, re-admission prediction, and survival analysis.
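The three LoS classes defined in the abstract can be written down directly. The sketch below only encodes the stated cut-offs (10 and 30 days). The function name and the string labels are hypothetical, and nothing here reflects how the authors preprocessed MIMIC-III.

```python
def los_class(days):
    """Map a length of stay in days to the three classes named in the abstract."""
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    if days <= 10:
        return "short"   # LoS <= 10 days
    if days <= 30:
        return "medium"  # 10 < LoS <= 30 days
    return "long"        # LoS > 30 days


stays = [2.5, 10, 10.5, 30, 45]
print([los_class(d) for d in stays])
```

The boundaries are inclusive on the upper side, so a stay of exactly 10 days is short and exactly 30 days is medium, matching the inequalities in the abstract.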

Yahaya M, Guo R, Jiang X, Bashir K, Matara C, Xu S. Ensemble-based model selection for imbalanced data to investigate the contributing factors to multiple fatality road crashes in Ghana. Accid Anal Prev 2021;151:105851. [PMID: 33383521 DOI: 10.1016/j.aap.2020.105851] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/27/2020] [Revised: 09/25/2020] [Accepted: 10/16/2020] [Indexed: 06/12/2023]

Abstract

The study aims to identify relevant variables to improve the prediction performance of the crash injury severity (CIS) classification model. Unfortunately, CIS databases are invariably characterized by class imbalance. For instance, samples of the multiple fatal injury (MFI) severity class are typically rare compared with other classes. This imbalance may introduce a prediction bias in favour of the majority class and degrade the quality of the learning algorithm. The paper proposes an ensemble-based variable ranking scheme that incorporates data resampling. At the data pre-processing level, the majority weighted minority oversampling technique (MWMOTE) is employed to treat the imbalanced training data. An ensemble of classifiers induced from the balanced data is used to evaluate and rank the individual variables according to their importance to injury severity prediction. The relevant variables selected are then applied to the balanced data to form a training set for CIS classification modelling. An empirical comparison is conducted by considering variable ranking by: 1) the learning of a single inductive algorithm with imbalanced data, where the relevant variables are applied to the imbalanced data to form the training data; 2) the learning of a single inductive algorithm with MWMOTE data, where the relevant variables identified are applied to the balanced data to form the training data; and 3) the learning of ensembles with imbalanced data, where the relevant variables identified are applied to the imbalanced data to form the training data. Bayesian Network (BN) classifiers are then developed for each ranking method, where nested subsets of the top-ranked variables are adopted. The model predictions are captured in four performance indicators in the comparative study. Based on three-year (2014-2016) crash data in Ghana, the empirical results show that the proposed method is effective in identifying the most prolific predictors of the CIS level. Finally, based on the inference results of BNs developed on the best subset, the study offers the most probable explanations for the occurrence of MFI crashes in Ghana.
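
To make the resampling step concrete, the sketch below balances a training set by plain random oversampling. This is deliberately not MWMOTE: MWMOTE synthesizes new minority samples, whereas this stand-in only duplicates existing ones. The function name `random_oversample` and the toy class labels are assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match
    the size of the largest class. A simple stand-in for MWMOTE, which
    generates synthetic samples instead of copies."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = list(rows), list(labels)
    for label, members in by_class.items():
        # Draw with replacement from the original members of this class.
        for _ in range(target - len(members)):
            out_rows.append(rng.choice(members))
            out_labels.append(label)
    return out_rows, out_labels


# Toy training set: 8 non-MFI crashes and 2 MFI crashes (made-up data).
train_rows = [[i] for i in range(10)]
train_labels = ["other"] * 8 + ["MFI"] * 2
bal_rows, bal_labels = random_oversample(train_rows, train_labels)
class_counts = Counter(bal_labels)
```

As in the abstract, resampling of this kind is applied to the training data only; evaluating on resampled data would misrepresent the true class distribution.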

Wang D, Zhang X, Chen H, Zhou Y, Cheng F. Sintering conditions recognition of rotary kiln based on kernel modification considering class imbalance. ISA Trans 2020;106:271-282. [PMID: 32674852 DOI: 10.1016/j.isatra.2020.07.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/02/2019] [Revised: 07/06/2020] [Accepted: 07/07/2020] [Indexed: 06/11/2023]

Qu W, Balki I, Mendez M, Valen J, Levman J, Tyrrell PN. Assessing and mitigating the effects of class imbalance in machine learning with application to X-ray imaging. Int J Comput Assist Radiol Surg 2020;15:2041-8. [PMID: 32965624 DOI: 10.1007/s11548-020-02260-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 05/08/2020] [Accepted: 09/04/2020] [Indexed: 10/23/2022]

Ashraf S, Saleem S, Ahmed T, Aslam Z, Muhammad D. Conversion of adverse data corpus to shrewd output using sampling metrics. Vis Comput Ind Biomed Art 2020;3:19. [PMID: 32779031 PMCID: PMC7417470 DOI: 10.1186/s42492-020-00055-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 06/02/2020] [Accepted: 07/24/2020] [Indexed: 11/11/2022]

Sleeman Iv WC, Nalluri J, Syed K, Ghosh P, Krawczyk B, Hagan M, Palta J, Kapoor R. A Machine Learning method for relabeling arbitrary DICOM structure sets to TG-263 defined labels. J Biomed Inform 2020;109:103527. [PMID: 32777484 DOI: 10.1016/j.jbi.2020.103527] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/06/2020] [Revised: 07/11/2020] [Accepted: 08/02/2020] [Indexed: 10/23/2022]

Abstract

PURPOSE

To present a Machine Learning pipeline for automatically relabeling anatomical structure sets in the Digital Imaging and Communications in Medicine (DICOM) format to a standard nomenclature that will enable data abstraction for research and quality improvement.

METHODS

DICOM structure sets from approximately 1200 lung and prostate cancer patients across 40 treatment centers were used to build predictive models to automate the relabeling of clinically specified structure labels to standardized labels as defined by the American Association of Physics in Medicine's (AAPM) Task Group 263 (TG-263). Volumetric bitmaps were created based on the delineated volumes and were combined with associated bony anatomy data to build feature vectors. Feature reduction was performed with singular value decomposition and the resulting vectors were used for predicting the label of each structure using five different classifier algorithms on the Apache Spark platform with 5-fold cross-validation. Undersampling methods were used to deal with underlying class imbalance that hindered the performance of classifiers. Experiments were performed on both a curated version of the data, which included only annotated structures, and the non-curated data that included all structures from the original treatment plans.

RESULTS

Random Forest provided the highest accuracies, with F₁ scores of 98.77 for lung and 95.06 for prostate on the curated data sets. Scores were lower on the non-curated data sets, at 95.67 for lung and 90.22 for prostate, highlighting some of the challenges of classifying real clinical data. Including bony anatomy data and pooling information from all structures for the same patient both increased accuracies. Undersampling with k-Means clustering for class balancing improved classifier accuracy only in some cases, but in all experiments it significantly reduced run time compared with random undersampling.
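
For reference, random undersampling, the baseline against which the k-Means variant is compared, can be sketched in a few lines. This is a generic illustration, not the authors' Apache Spark implementation; the function name `random_undersample` and the toy structure labels are assumptions.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has as
    many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement so no row is kept twice.
        kept = rng.sample(members, target)
        out_rows.extend(kept)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy feature vectors: 6 "Lung" structures and 2 "Heart" structures (made up).
rows = [[i] for i in range(8)]
labels = ["Lung"] * 6 + ["Heart"] * 2
small_rows, small_labels = random_undersample(rows, labels)
kept_counts = Counter(small_labels)
```

A clustering-based variant would replace the random draw with a selection guided by cluster structure in the majority class; the details of how the paper does this are not given in the abstract.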

CONCLUSION

This work shows that structure sets can be relabeled using our approach with accuracies over 95% for many structure types when presented with curated data. Although accuracies dropped when using the full non-curated data sets, some structure types were still correctly labeled over 90% of the time. With similar results obtained on an external test data set, we can infer that the proposed models are likely to work on other clinical data sets.
