1
Li S, Zhang B, Song J, Ruan G, Wang C, Xie J. Graph Neural Networks with Coarse- and Fine-Grained Division for mitigating label noise and sparsity. Neural Netw 2025; 187:107338. [PMID: 40086132 DOI: 10.1016/j.neunet.2025.107338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2024] [Revised: 01/02/2025] [Accepted: 02/28/2025] [Indexed: 03/16/2025]
Abstract
Graph Neural Networks (GNNs) have gained considerable prominence in semi-supervised learning on graph-structured data, primarily owing to their message-passing mechanism, which largely relies on the availability of clean labels. However, in real-world scenarios, node labels are inevitably noisy and sparse, significantly degrading the performance of GNNs. Exploring robust GNNs for semi-supervised node classification in the presence of noisy and sparse labels remains a critical challenge. Therefore, we propose a novel Graph Neural Network with Coarse- and Fine-Grained Division for mitigating label sparsity and noise, namely GNN-CFGD. The key idea of GNN-CFGD is to reduce the negative impact of noisy labels via coarse- and fine-grained division, along with graph reconstruction. Specifically, we first investigate the effectiveness of linking unlabeled nodes to cleanly labeled nodes, demonstrating that this approach is more effective in combating label noise than linking to potentially noisy labeled nodes. Based on this observation, we introduce a Gaussian Mixture Model (GMM) based on the memory effect to perform a coarse-grained division of the given labels into clean and noisy labels. Next, we propose a clean-label-oriented link that connects unlabeled nodes to cleanly labeled nodes, aimed at mitigating label sparsity and promoting supervision propagation. Furthermore, to provide refined supervision for noisy labeled nodes and additional supervision for unlabeled nodes, we perform a fine-grained division of the noisy labeled nodes and the unlabeled nodes, splitting each into two candidate sets based on confidence. Extensive experiments on various datasets demonstrate the superior effectiveness and robustness of GNN-CFGD.
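The coarse-grained division described in this abstract can be illustrated with a minimal sketch. It assumes that a per-node training loss is already available and that, in line with the memory effect, cleanly labeled nodes tend to show smaller losses early in training. The function names, the EM initialization, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
import math

def _posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component for one loss value."""
    dens = [
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(dens)
    if total == 0.0:  # both densities underflowed; fall back to an even split
        return [0.5, 0.5]
    return [d / total for d in dens]

def fit_two_gaussians(losses, n_iter=50):
    """Fit a 1-D two-component Gaussian mixture to the losses with plain EM."""
    lo, hi = min(losses), max(losses)
    means = [lo, hi]  # assumption: start the components at the extremes
    spread = ((hi - lo) / 4) ** 2 + 1e-6
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibilities of both components for every loss value
        resp = [_posteriors(x, weights, means, variances) for x in losses]
        # M-step: re-estimate mean, variance, and weight of each component
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            variances[k] = (
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
                + 1e-6  # variance floor keeps the density well defined
            )
            weights[k] = nk / len(losses)
    return weights, means, variances

def coarse_division(losses, threshold=0.5):
    """Split node indices into (clean, noisy) by posterior of the low-loss component."""
    weights, means, variances = fit_two_gaussians(losses)
    clean = 0 if means[0] <= means[1] else 1
    clean_idx, noisy_idx = [], []
    for i, x in enumerate(losses):
        p_clean = _posteriors(x, weights, means, variances)[clean]
        (clean_idx if p_clean >= threshold else noisy_idx).append(i)
    return clean_idx, noisy_idx

# Hypothetical per-node losses: four small values, four large values.
example_losses = [0.05, 0.08, 0.10, 0.07, 1.9, 2.1, 2.3, 2.0]
clean_nodes, noisy_nodes = coarse_division(example_losses)
```

In practice a library mixture model would likely replace the hand-written EM loop; the point here is only that the low-mean component is read as the "clean" group.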
Affiliation(s)
- Shuangjie Li
  - State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
- Baoming Zhang
  - State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
- Jianqing Song
  - State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
- Gaoli Ruan
  - State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
- Chongjun Wang
  - State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
- Junyuan Xie
  - State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing, 210023, China
2
Pan X, Gu Y, Zhou W, Zhang Y. Enhancing Transthyretin Binding Affinity Prediction with a Consensus Model: Insights from the Tox24 Challenge. Chem Res Toxicol 2025. [PMID: 40285676 DOI: 10.1021/acs.chemrestox.4c00560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2025]
Abstract
Transthyretin (TTR) plays a vital role in thyroid hormone transport and homeostasis in both the blood and target tissues. Interactions between exogenous compounds and TTR can disrupt the function of the endocrine system, potentially causing toxicity. In the Tox24 challenge, we leveraged the data set provided by the organizers to develop a deep learning-based consensus model, integrating sPhysNet, KANO, and GGAP-CPI for predicting TTR binding affinity. Each model utilized distinct levels of molecular information, including 2D topology, 3D geometry, and protein-ligand interactions. Our consensus model achieved favorable performance on the blind test set, yielding an RMSE of 20.8 and ranking fifth among all submissions. Following the release of the blind test set, we incorporated the leaderboard test set into our training data, further reducing the RMSE to 20.6 in an offline retrospective study. These results demonstrate that combining three regression models across different modalities significantly enhances the predictive accuracy. Furthermore, we employ the standard deviation of the consensus model's ensemble outputs as an uncertainty estimate. Our analysis reveals that both the RMSE and interval error of predictions increase with rising uncertainty, indicating that the uncertainty can serve as a useful measure of prediction confidence. We believe that this consensus model can be a valuable resource for identifying potential TTR binders and predicting their binding affinity in silico. The source code for data preparation, model training, and prediction can be accessed at https://github.com/xiaolinpan/tox24_challenge_submission_yingkai_lab.
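The consensus-plus-uncertainty idea in this abstract can be sketched as follows. This is a minimal illustration that assumes an unweighted mean over per-model predictions and the population standard deviation as the spread; the abstract does not state how the three models are combined, and the function names and numbers are hypothetical.

```python
import math
import statistics

def consensus_predict(per_model_preds):
    """Combine per-model predictions into a (means, stds) pair per compound.

    per_model_preds: one list of predictions per model, all of equal length.
    """
    means, stds = [], []
    for preds in zip(*per_model_preds):
        means.append(statistics.fmean(preds))   # consensus value
        stds.append(statistics.pstdev(preds))   # spread used as uncertainty
    return means, stds

def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical outputs of three models for three compounds.
model_a = [10.0, 50.0, 90.0]
model_b = [20.0, 50.0, 70.0]
model_c = [30.0, 50.0, 80.0]
consensus, uncertainty = consensus_predict([model_a, model_b, model_c])
error = rmse([25.0, 50.0, 70.0], consensus)
```

Sorting compounds by `uncertainty` and computing `rmse` within bins would be one way to check whether error rises with the spread, as the abstract reports.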
Affiliation(s)
- Xiaolin Pan
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yaowen Gu
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Weijun Zhou
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - Simons Center for Computational Physical Chemistry at New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
3
Takasan K, Iiyama M. Improving fishing ground estimation with weak supervision and meta-learning. PLoS One 2025; 20:e0321116. [PMID: 40215460 PMCID: PMC11991730 DOI: 10.1371/journal.pone.0321116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Accepted: 02/28/2025] [Indexed: 04/14/2025] Open
Abstract
Estimating fishing grounds is an important task in the fishing industry. This study modeled the fisher's decision-making process based on sea surface temperature patterns as a pattern recognition task. We used a deep learning-based keypoint detector to estimate fishing ground locations from these patterns. However, training the model required catch data for annotation, the amount of which was limited. To address this, we proposed a training strategy that combines weak supervision and meta-learning to estimate fishing grounds. Weak supervision involves using partially annotated or noisy data, where the labels are incomplete or imprecise. In our case, catch data cover only a subset of fishing grounds, and trajectory data, which are readily available and larger in volume than catch data, provide imprecise representations of fishing grounds. Meta-learning helps the model adapt to the noise by refining its learning rate during training. Our approach involved pre-training with trajectory data and fine-tuning with catch data, with a meta-learner further mitigating label noise during pre-training. Experimental results showed that our method improved the F1-score by 64% compared to the baseline using only catch data, demonstrating the effectiveness of pre-training and meta-learning.
Affiliation(s)
- Kazuki Takasan
  - Graduate School of Data Science, Shiga University, Hikone, Shiga, Japan
- Masaaki Iiyama
  - Graduate School of Data Science, Shiga University, Hikone, Shiga, Japan
4
Chen ZH, Zha HL, Yao Q, Zhang WB, Zhou GQ, Li CY. Predicting Pathological Characteristics of HER2-Positive Breast Cancer from Ultrasound Images: a Deep Ensemble Approach. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2025; 38:850-857. [PMID: 39187701 PMCID: PMC11950582 DOI: 10.1007/s10278-024-01229-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 08/04/2024] [Accepted: 08/05/2024] [Indexed: 08/28/2024]
Abstract
The objective is to evaluate the feasibility of utilizing ultrasound images in identifying critical prognostic biomarkers for HER2-positive breast cancer (HER2+ BC). This study enrolled 512 female patients diagnosed with HER2-positive breast cancer through pathological validation at our institution from January 2016 to December 2021. Five distinct deep convolutional neural networks (DCNNs) and a deep ensemble (DE) approach were trained to classify axillary lymph node involvement (ALNM), lymphovascular invasion (LVI), and histological grade (HG). The efficacy of the models was evaluated based on accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), receiver operating characteristic (ROC) curves, areas under the ROC curve (AUCs), and heat maps. The DeLong test was applied to compare differences in AUC among different models. The deep ensemble approach, as the most effective model, demonstrated AUCs and accuracies of 0.869 (95% CI: 0.802-0.936) and 69.7% for LVI and 0.973 (95% CI: 0.949-0.998) and 73.8% for HG, thus providing superior classification performance in the context of imbalanced data (p < 0.05 by the DeLong test). For ALNM, the AUC and accuracy were 0.780 (95% CI: 0.688-0.873) and 77.5%, which were comparable to those of the single models. The pretreatment US-based DE model could hold promise as clinical guidance for predicting pathological characteristics of patients with HER2-positive breast cancer, thereby facilitating timely adjustments in treatment strategies.
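The threshold-based metrics listed in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the counts are made up for illustration and are not taken from the study, and the function assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts for one classification task.
metrics = binary_metrics(tp=30, fp=10, tn=50, fn=10)
```

With imbalanced classes, accuracy alone can look acceptable while sensitivity or PPV is poor, which is one reason such studies report the full set alongside the AUC.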
Affiliation(s)
- Zhi-Hui Chen
  - Department of Ultrasound, Affiliated Hangzhou First People's Hospital, Westlake University School of Medicine, No. 261, Huansha Road, Shangcheng district, Hangzhou, 310006, China
- Hai-Ling Zha
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, No. 300 Guangzhou Road, Nanjing, 210029, China
- Qing Yao
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, No. 300 Guangzhou Road, Nanjing, 210029, China
- Wen-Bo Zhang
  - Jiangsu Key Laboratory of Biomaterials and Devices, State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, No. 2 Sipailou Road, Nanjing, 210096, China
- Guang-Quan Zhou
  - Jiangsu Key Laboratory of Biomaterials and Devices, State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, No. 2 Sipailou Road, Nanjing, 210096, China
- Cui-Ying Li
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, No. 300 Guangzhou Road, Nanjing, 210029, China
5
Jiao L, Wang M, Liu X, Li L, Liu F, Feng Z, Yang S, Hou B. Multiscale Deep Learning for Detection and Recognition: A Comprehensive Survey. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:5900-5920. [PMID: 38652624 DOI: 10.1109/tnnls.2024.3389454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]
Abstract
Recently, the multiscale problem in computer vision has gradually attracted attention. This article focuses on multiscale representation for object detection and recognition, comprehensively introduces the development of multiscale deep learning, and constructs an easy-to-understand but powerful knowledge structure. First, we give the definition of scale, explain the multiscale mechanism of human vision, and then introduce the multiscale problem discussed in computer vision. Second, advanced multiscale representation methods are introduced, including pyramid representation, scale-space representation, and multiscale geometric representation. Third, the theory of multiscale deep learning is presented, which mainly discusses multiscale modeling in convolutional neural networks (CNNs) and Vision Transformers (ViTs). Fourth, we compare the performance of multiple multiscale methods on different tasks, illustrating the effectiveness of different multiscale structural designs. Finally, based on an in-depth understanding of the existing methods, we point out several open issues and future directions for multiscale deep learning.
6
Werthen-Brabants L, Castillo-Escario Y, Groenendaal W, Jane R, Dhaene T, Deschrijver D. Deep Learning-Based Event Counting for Apnea-Hypopnea Index Estimation Using Recursive Spiking Neural Networks. IEEE Trans Biomed Eng 2025; 72:1306-1315. [PMID: 40030371 DOI: 10.1109/tbme.2024.3498097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/22/2025]
Abstract
OBJECTIVE: To develop a novel method for improved screening of sleep apnea in home environments, focusing on reliable estimation of the Apnea-Hypopnea Index (AHI) without the need for highly precise event localization. METHODS: RSN-Count is introduced, a technique leveraging Spiking Neural Networks to directly count apneic events in recorded signals. This approach aims to reduce dependence on the exact time-based pinpointing of events, a potential source of variability in conventional analysis. RESULTS: RSN-Count demonstrates a superior ability to quantify apneic events (AHI MAE ) compared to established methods (AHI MAE ) on a dataset of whole-night audio and SpO2 recordings (N = 33). This is particularly valuable for accurate AHI estimation, even in the absence of highly precise event localization. CONCLUSION: RSN-Count offers a promising improvement in sleep apnea screening within home settings. Its focus on event quantification enhances AHI estimation accuracy. SIGNIFICANCE: This method addresses limitations in current sleep apnea diagnostics, potentially increasing screening accuracy and accessibility while reducing dependence on costly and complex polysomnography.
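Why counting events is enough for AHI estimation follows from the index's definition: the number of apnea and hypopnea events per hour of sleep. The sketch below shows that arithmetic together with the mean absolute error (MAE) used to compare estimates with reference values. The numbers are hypothetical, and using recording length as a stand-in for sleep time is an assumption of this example, not a statement about the study.

```python
def ahi(event_count, hours):
    """Apnea-Hypopnea Index: events per hour (hours assumed > 0)."""
    return event_count / hours

def mean_absolute_error(reference, estimate):
    """Average absolute difference between paired values."""
    return sum(abs(r - e) for r, e in zip(reference, estimate)) / len(reference)

# Hypothetical nights: (counted events, hours recorded).
estimated = [ahi(40, 8.0), ahi(84, 7.0)]   # 5.0 and 12.0 events per hour
reference = [6.0, 10.0]                    # hypothetical reference AHI values
ahi_mae = mean_absolute_error(reference, estimated)
```

Because only the total count enters the formula, an error in where an event is placed in time does not change the index, whereas a missed or spurious event does.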
7
Nie H, Fan S, Liu Y, Yao Q, Wang Z. Using samples with label noise for robust continual learning. Neural Netw 2025; 188:107422. [PMID: 40184866 DOI: 10.1016/j.neunet.2025.107422] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 02/21/2025] [Accepted: 03/18/2025] [Indexed: 04/07/2025]
Abstract
Recent studies have shown that effectively leveraging samples with label noise can enhance model robustness by uncovering more reliable feature patterns. While existing methods, such as label correction methods and loss correction techniques, have demonstrated success in utilizing noisy labels, they assume that noisy and clean samples (samples with correct annotations) share the same label space. However, this assumption does not hold in continual machine learning, where new categories and tasks emerge over time, leading to label shift problems that are specific to this setting. As a result, existing methods may struggle to accurately estimate the ground truth labels for noisy samples in such dynamic environments, potentially exacerbating label noise and further degrading performance. To address this critical gap, we propose a Shift-Adaptive Noise Utilization (SANU) method, designed to transform samples with label noise into usable samples for continual learning. SANU introduces a novel source detection mechanism that identifies the appropriate label space for noisy samples, leveraging a meta-knowledge representation module to improve the generalization of the detection process. By re-annotating noisy samples through label guessing and label generation strategies, SANU adapts to label shifts, turning noisy data into useful inputs for training. Experimental results across three continual learning datasets demonstrate that SANU effectively mitigates the label shift problem, significantly enhancing model performance by utilizing re-annotated samples with label noise.
Affiliation(s)
- Hongyi Nie
  - School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China; Shenzhen Research Institute, Northwestern Polytechnical University, Shenzhen, 518057, Guangdong, China
- Shiqi Fan
  - School of Cybersecurity, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China
- Yang Liu
  - School of Artificial Intelligence, Optics and Electronics, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China; Shenzhen Research Institute, Northwestern Polytechnical University, Shenzhen, 518057, Guangdong, China
- Quanming Yao
  - Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
- Zhen Wang
  - School of Mechanical Engineering, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China
8
Zhang S, Ren X, Qiang Y, Zhao J, Qiao Y, Yue H. Dual-threshold sample selection with latent tendency difference for label-noise-robust pneumoconiosis staging. JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY 2025:8953996251319652. [PMID: 40130489 DOI: 10.1177/08953996251319652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2025]
Abstract
Background: Precise pneumoconiosis staging suffers from progressive pair label noise (PPLN) in chest X-ray datasets, because adjacent stages are confused due to unidentifiable and diffuse opacities in the lung fields. When deep neural networks are employed to aid disease staging, their performance is degraded under such label noise. Objective: This study improves the effectiveness of pneumoconiosis staging by mitigating the impact of PPLN through network architecture refinement and sample selection mechanism adjustment. Methods: We propose a novel multi-branch architecture that incorporates dual-threshold sample selection. Several auxiliary branches are integrated in a two-phase module to learn and predict the progressive feature tendency. A novel difference-based metric is introduced to iteratively obtain the instance-specific thresholds as a complementary criterion for dynamic sample selection. All the samples are finally partitioned into clean and hard sets according to dual-threshold criteria and treated differently by loss functions with penalty terms. Results: Compared with the state of the art, the proposed method obtains the best metrics (accuracy: 90.92%, precision: 84.25%, sensitivity: 81.11%, F1-score: 82.06%, and AUC: 94.64%) under real-world PPLN, and is less sensitive to a rising synthetic PPLN rate. An ablation study validates the respective contributions of critical modules and demonstrates how variations of essential hyperparameters affect model performance. Conclusions: The proposed method achieves substantial effectiveness and robustness against PPLN in the pneumoconiosis dataset, and can further assist physicians in diagnosing the disease with higher accuracy and confidence.
Affiliation(s)
- Shuming Zhang
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
- Xueting Ren
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
- Yan Qiang
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
  - School of Software, North University of China, Taiyuan, China
- Juanjuan Zhao
  - College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan, China
  - School of Software, Taiyuan University of Technology, Taiyuan, China
  - College of Information, Jinzhong College of Information, Jinzhong, China
- Ying Qiao
  - First Hospital of Shanxi Medical University, Taiyuan, China
- Huajie Yue
  - First Hospital of Shanxi Medical University, Taiyuan, China
9
Patt E, Classen S, Hammel M, Schneidman-Duhovny D. Predicting RNA structure and dynamics with deep learning and solution scattering. Biophys J 2025; 124:549-564. [PMID: 39722452 PMCID: PMC11866959 DOI: 10.1016/j.bpj.2024.12.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2024] [Revised: 09/15/2024] [Accepted: 12/23/2024] [Indexed: 12/28/2024] Open
Abstract
Advanced deep learning and statistical methods can predict structural models for RNA molecules. However, RNAs are flexible, and it remains difficult to describe their macromolecular conformations in solutions where varying conditions can induce conformational changes. Small-angle x-ray scattering (SAXS) in solution is an efficient technique to validate structural predictions by comparing the experimental SAXS profile with those calculated from predicted structures. There are two main challenges in comparing SAXS profiles to RNA structures: the absence of cations essential for stability and charge neutralization in predicted structures and the inadequacy of a single structure to represent RNA's conformational plasticity. We introduce a solution conformation predictor for RNA (SCOPER) to address these challenges. This pipeline integrates kinematics-based conformational sampling with the innovative deep learning model, IonNet, designed for predicting Mg2+ ion binding sites. Validated through benchmarking against 14 experimental data sets, SCOPER significantly improved the quality of SAXS profile fits by including Mg2+ ions and sampling of conformational plasticity. We observe that an increased content of monovalent and bivalent ions leads to decreased RNA plasticity. Therefore, carefully adjusting the plasticity and ion density is crucial to avoid overfitting experimental SAXS data. SCOPER is an efficient tool for accurately validating the solution state of RNAs given an initial, sufficiently accurate structure and provides the corrected atomistic model, including ions.
Affiliation(s)
- Edan Patt
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Scott Classen
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California
- Michal Hammel
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, California
- Dina Schneidman-Duhovny
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
10
Fan J, Huang L, Gong C, You Y, Gan M, Wang Z. KMT-PLL: K-Means Cross-Attention Transformer for Partial Label Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:2789-2800. [PMID: 38194387 DOI: 10.1109/tnnls.2023.3347792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2024]
Abstract
Partial label learning (PLL) studies the problem of learning instance classification from a set of candidate labels, of which only one is correct. While recent works have demonstrated that the Vision Transformer (ViT) achieves good results when trained on clean data, its applications to PLL remain limited and challenging. To address this issue, we rethink the relationship between instances and object queries to propose a K-means cross-attention transformer for PLL (KMT-PLL), which can continuously learn cluster centers and be used for downstream disambiguation tasks. More specifically, K-means cross-attention as a clustering process can effectively learn the cluster centers to represent label classes. The purpose of this operation is to make the similarity between instances and labels measurable, which can effectively detect noisy labels. Furthermore, we propose a new corrected cross-entropy formulation, which can assign weights to candidate labels according to the instance-to-label relevance to guide the training of the instance classifier. As training goes on, the ground-truth label is progressively identified, and the refined labels and cluster centers in turn help to improve the classifier. Simulation results demonstrate the advantage of KMT-PLL and its suitability for PLL.
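The idea of weighting candidate labels by relevance inside a cross-entropy loss can be sketched generically. The abstract does not give the paper's corrected formulation, so the version below is an assumption: relevance scores are normalized over the candidate set, non-candidates get zero weight, and the loss is the weighted negative log-probability. All names and numbers are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def candidate_weights(relevance, candidates):
    """Normalize relevance over the candidate set; non-candidates get weight 0."""
    total = sum(relevance[j] for j in candidates)
    return [
        relevance[j] / total if j in candidates else 0.0
        for j in range(len(relevance))
    ]

def weighted_cross_entropy(logits, weights):
    """Cross entropy against a soft target given by the candidate weights."""
    probs = softmax(logits)
    return -sum(w * math.log(p) for w, p in zip(weights, probs) if w > 0)

# Hypothetical instance with three classes; classes 0 and 1 are candidates.
relevance = [3.0, 1.0, 4.0]          # e.g. similarity to each class center
weights = candidate_weights(relevance, candidates={0, 1})
loss = weighted_cross_entropy([0.0, 0.0, 0.0], weights)
```

As the relevance estimates sharpen during training, the weight mass would concentrate on one candidate, and the loss would approach ordinary cross entropy against that single label.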
11
Gu S, Xu C, Hu D, Hou C. Adaptive Learning for Dynamic Features and Noisy Labels. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:1219-1237. [PMID: 39480720 DOI: 10.1109/tpami.2024.3489217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2024]
Abstract
Applying current machine learning algorithms in complex and open environments remains challenging, especially when different changing elements are coupled and the training data is scarce. For example, in the activity recognition task, the motion sensors may change position or fall off due to the intensity of the activity, leading to changes in feature space and finally resulting in label noise. Learning from such a problem where the dynamic features are coupled with noisy labels is crucial but rarely studied, particularly when the noisy samples in new feature space are limited. In this paper, we tackle the above problem by proposing a novel two-stage algorithm, called Adaptive Learning for Dynamic features and Noisy labels (ALDN). Specifically, optimal transport is first modified to map the previously learned heterogeneous model to the prior model of the current stage. Then, to fully reuse the mapped prior model, we add a simple yet efficient regularizer as the consistency constraint to assist both the estimation of the noise transition matrix and the model training in the current stage. Finally, two implementations with direct (ALDN-D) and indirect (ALDN-ID) constraints are illustrated for better investigation. More importantly, we provide theoretical guarantees for risk minimization of ALDN-D and ALDN-ID. Extensive experiments validate the effectiveness of the proposed algorithms.
12
He Y, Chen W, Wang S, Liu T, Wang M. Recalling Unknowns Without Losing Precision: An Effective Solution to Large Model-Guided Open World Object Detection. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2025; 34:729-742. [PMID: 39292592 DOI: 10.1109/tip.2024.3459589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/20/2024]
Abstract
Open World Object Detection (OWOD) aims to adapt object detection to an open-world environment, so as to detect unknown objects and learn knowledge incrementally. Existing OWOD methods typically leverage training sets with a relatively small number of known objects. Due to the absence of generic object knowledge, they fail to comprehensively perceive objects beyond the scope of training sets. Recent advancements in large vision models (LVMs), trained on extensive large-scale data, offer a promising opportunity to harness rich generic knowledge for the fundamental advancement of OWOD. Motivated by Segment Anything Model (SAM), a prominent LVM lauded for its exceptional ability to segment generic objects, we first demonstrate the possibility to employ SAM for OWOD and establish the very first SAM-Guided OWOD baseline solution. Subsequently, we identify and address two fundamental challenges in SAM-Guided OWOD and propose a pioneering SAM-Guided Robust Open-world Detector (SGROD) method, which can significantly improve the recall of unknown objects without losing the precision on known objects. Specifically, the two challenges in SAM-Guided OWOD include: 1) Noisy labels caused by the class-agnostic nature of SAM; 2) Precision degradation on known objects when more unknown objects are recalled. For the first problem, we propose a dynamic label assignment (DLA) method that adaptively selects confident labels from SAM during training, evidently reducing the noise impact. For the second problem, we introduce cross-layer learning (CLL) and SAM-based negative sampling (SNS), which enable SGROD to avoid precision loss by learning robust decision boundaries of objectness and classification. Experiments on public datasets show that SGROD not only improves the recall of unknown objects by a large margin (~20%), but also preserves highly-competitive precision on known objects. The program codes are available at https://github.com/harrylin-hyl/SGROD.
13
Yang Y, Chen Y, Dong X, Zhang J, Long C, Jin Z, Dai Y. An annotated heterogeneous ultrasound database. Sci Data 2025; 12:148. [PMID: 39863639 PMCID: PMC11762285 DOI: 10.1038/s41597-025-04464-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Accepted: 01/13/2025] [Indexed: 01/27/2025] Open
Abstract
Ultrasound is a primary diagnostic tool commonly used to evaluate internal body structures, including organs, blood vessels, the musculoskeletal system, and fetal development. Due to challenges such as operator dependence, noise, limited field of view, difficulty in imaging through bone and air, and variability across different systems, diagnosing abnormalities in ultrasound images is particularly challenging for less experienced clinicians. The development of artificial intelligence (AI) technology could assist in the diagnosis of ultrasound images. However, many databases are created using a single device type and collection site, limiting the generalizability of machine learning models. Therefore, we have collected a large, publicly accessible ultrasound challenge database that is intended to significantly enhance the performance of AI-assisted ultrasound diagnosis. This database is derived from publicly available data on the Internet and comprises a total of 1,833 distinct ultrasound data. It includes 13 different ultrasound image anomalies, and all data have been anonymized. Our data-sharing program aims to support benchmark testing of ultrasound disease diagnosis in multi-center environments.
Collapse
Affiliation(s)
- Yuezhe Yang
  - Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Yonglin Chen
  - School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, 230601, China
- Xingbo Dong
  - Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Junning Zhang
  - School of Public Health, Anhui University of Science and Technology, Huainan, 232001, China
- Chihui Long
  - Department of Radiology, Wuhan Third Hospital/Tongren Hospital of Wuhan University, Wuhan, 430060, China
- Zhe Jin
  - Anhui Provincial International Joint Research Center for Advanced Technology in Medical Imaging, School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Yong Dai
  - School of Medicine, Anhui University of Science and Technology, Huainan, 232001, China
  - The First Hospital, Anhui University of Science and Technology, Huainan, 232001, China
14
Haggenmüller S, Wies C, Abels J, Winterstein JT, Heinlein L, Nogueira Garcia C, Utikal JS, Wohlfeil SA, Meier F, Hobelsberger S, Gellrich FF, Sergon M, Hauschild A, French LE, Heinzerling L, Schlager JG, Ghoreschi K, Schlaak M, Hilke FJ, Poch G, Korsing S, Sarfert C, Berking C, Heppt MV, Erdmann M, Haferkamp S, Drexler K, Schadendorf D, Sondermann W, Goebeler M, Schilling B, Kather JN, Fröhling S, Llamas-Velasco M, Requena LC, Ferrara G, Fernandez-Figueras M, Fraitag S, Müller CSL, Starz H, Kutzner H, Barnhill R, Carr R, Resnik KS, Braun SA, Holland-Letz T, Brinker TJ. Discordance, accuracy and reproducibility study of pathologists' diagnosis of melanoma and melanocytic tumors. Nat Commun 2025; 16:789. [PMID: 39824857 PMCID: PMC11742048 DOI: 10.1038/s41467-025-56160-x] [Received: 07/16/2024] [Accepted: 01/10/2025] [Indexed: 01/20/2025] Open
Abstract
Accurate melanoma diagnosis is crucial for patient outcomes and reliability of AI diagnostic tools. We assess interrater variability among eight expert pathologists reviewing histopathological images and clinical metadata of 792 melanoma-suspicious lesions prospectively collected at eight German hospitals. Moreover, we provide access to the largest panel-validated dataset featuring dermoscopic and histopathological images with metadata. Complete agreement is achieved in 53.5% of cases (424/792), and a majority vote (≥ five pathologists) in 90.9% (720/792). Considerable discordance is observed for non-invasive melanomas (complete agreement in only 10/73 cases). The expert panel disagrees with the local pathologists' and dermatologists' diagnoses in 14.9% and 33.5% of cases, respectively. This variability highlights the diagnostic challenges of early-stage melanomas and the need to reconsider how ground truth is established in routine care and AI research. Including at least two pathologists or virtual panels may contribute to more consistent diagnostic results.
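The agreement figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: `percent` and `agreement_summary` are hypothetical helper names, the toy ratings are invented, and the study's own analysis code is not described in the abstract.

```python
from collections import Counter


def percent(k, n):
    """Share of k out of n cases, as a percentage rounded to one decimal."""
    return round(100 * k / n, 1)


def agreement_summary(ratings, majority=5):
    """Return (complete-agreement rate, majority-vote rate).

    ratings: one list of diagnoses per case, one entry per pathologist.
    A case counts toward the majority vote when its most frequent
    diagnosis was given by at least `majority` raters, so fully
    concordant cases are included in that count as well.
    """
    complete = 0
    with_majority = 0
    for case in ratings:
        top_count = Counter(case).most_common(1)[0][1]
        if top_count == len(case):
            complete += 1
        if top_count >= majority:
            with_majority += 1
    n = len(ratings)
    return complete / n, with_majority / n


print(percent(424, 792))  # 53.5, complete agreement
print(percent(720, 792))  # 90.9, majority vote of at least five

# Invented toy panel of eight raters: unanimous, 5-to-3, and 4-to-4 cases.
toy = [["m"] * 8, ["m"] * 5 + ["n"] * 3, ["m"] * 4 + ["n"] * 4]
print(agreement_summary(toy))
```

The same `percent` helper applied to the non-invasive melanoma subgroup (10 of 73 cases) gives the complete-agreement share for that group.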
Affiliation(s)
- Sarah Haggenmüller
  - Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Christoph Wies
  - Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Medical Faculty, University Heidelberg, Heidelberg, Germany
- Julia Abels
  - Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jana T Winterstein
  - Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Medical Faculty, University Heidelberg, Heidelberg, Germany
- Lukas Heinlein
  - Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Medical Faculty, University Heidelberg, Heidelberg, Germany
- Carina Nogueira Garcia
  - Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jochen S Utikal
  - Department of Dermatology, Venereology and Allergology, University Medical Center Mannheim, Ruprecht-Karl University of Heidelberg, Mannheim, Germany
  - Skin Cancer Unit, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - DKFZ Hector Cancer Institute at the University Medical Center Mannheim, Mannheim, Germany
- Sebastian A Wohlfeil
  - Department of Dermatology, Venereology and Allergology, University Medical Center Mannheim, Ruprecht-Karl University of Heidelberg, Mannheim, Germany
  - Skin Cancer Unit, German Cancer Research Center (DKFZ), Heidelberg, Germany
  - DKFZ Hector Cancer Institute at the University Medical Center Mannheim, Mannheim, Germany
- Friedegund Meier
  - Skin Cancer Center at the University Cancer Center and National Center for Tumor Diseases Dresden, Department of Dermatology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Sarah Hobelsberger
  - Skin Cancer Center at the University Cancer Center and National Center for Tumor Diseases Dresden, Department of Dermatology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Frank F Gellrich
  - Skin Cancer Center at the University Cancer Center and National Center for Tumor Diseases Dresden, Department of Dermatology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Mildred Sergon
  - Institute of Pathology, Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
  - Skin Cancer Center at the National Center for Tumor Diseases (NCT/UCC), Faculty of Medicine and University Hospital Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Axel Hauschild
  - Department of Dermatology, University Hospital (UKSH), Kiel, Germany
- Lars E French
  - Department of Dermatology and Allergy, University Hospital, LMU Munich, Munich, Germany
  - Dr. Phillip Frost Department of Dermatology and Cutaneous Surgery, University of Miami, Miller School of Medicine, Miami, FL, USA
- Lucie Heinzerling
  - Department of Dermatology and Allergy, University Hospital, LMU Munich, Munich, Germany
  - Department of Dermatology, University Hospital Erlangen, Comprehensive Cancer Center Erlangen - European Metropolitan Region Nürnberg, CCC Alliance WERA, Erlangen, Germany
- Justin G Schlager
  - Department of Dermatology and Allergy, University Hospital, LMU Munich, Munich, Germany
- Kamran Ghoreschi
  - Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Max Schlaak
  - Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Franz J Hilke
  - Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Gabriela Poch
  - Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Sören Korsing
  - Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Cosimo Sarfert
  - Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin Berlin, Corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Carola Berking
  - Department of Dermatology, University Hospital Erlangen, Comprehensive Cancer Center Erlangen - European Metropolitan Region Nürnberg, CCC Alliance WERA, Erlangen, Germany
- Markus V Heppt
  - Department of Dermatology, University Hospital Erlangen, Comprehensive Cancer Center Erlangen - European Metropolitan Region Nürnberg, CCC Alliance WERA, Erlangen, Germany
- Michael Erdmann
  - Department of Dermatology, University Hospital Erlangen, Comprehensive Cancer Center Erlangen - European Metropolitan Region Nürnberg, CCC Alliance WERA, Erlangen, Germany
- Sebastian Haferkamp
  - Department of Dermatology, University Hospital Regensburg, Regensburg, Germany
- Konstantin Drexler
  - Department of Dermatology, University Hospital Regensburg, Regensburg, Germany
- Dirk Schadendorf
  - Department of Dermatology, Venereology and Allergology, University Hospital Essen, Essen, Germany
- Wiebke Sondermann
  - Department of Dermatology, Venereology and Allergology, University Hospital Essen, Essen, Germany
- Matthias Goebeler
  - Department of Dermatology, Venereology and Allergology, University Hospital Würzburg and National Center for Tumor Diseases (NCT) WERA Würzburg, Würzburg, Germany
- Bastian Schilling
  - Goethe University Frankfurt, University Hospital, Department of Dermatology, Frankfurt, Germany
- Jakob Nikolas Kather
  - Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
- Stefan Fröhling
  - Department of Translational Medical Oncology, National Center for Tumor Diseases (NCT) Heidelberg and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Mar Llamas-Velasco
  - Department of Dermatology, University Hospital La Princesa, Madrid, Spain
- Luis C Requena
  - Dermatology Department, Fundación Jiménez Díaz, Autonomous University of Madrid, Madrid, Spain; Anatomic Pathology Service, Fundación Jiménez Díaz, Madrid, Spain
- Gerardo Ferrara
  - Anatomic Pathology and Cytopathology Unit, Istituto Nazionale Tumori I.R.C.C.S. Fondazione 'G. Pascale', Naples, Italy
- Maite Fernandez-Figueras
  - University General Hospital of Catalonia, Grupo Quironsalud, International University of Catalonia, Sant Cugat del Vallés, Barcelona, Spain
- Sylvie Fraitag
  - Pathology department, Necker-Enfants Malades Hospital, Université Paris-Cité, Assistance Publique des Hopitaux de Paris, Paris, France
- Cornelia S L Müller
  - Center for Histology, Cytology and Molecular Diagnostics, Trier, Germany
  - Saarland University, Homburg/Saar, Germany
- Heinz Kutzner
  - Dermatopathology Friedrichshafen, Friedrichshafen, Germany
- Raymond Barnhill
  - Departments of Pathology and Translational Research, Institut Curie, Paris, France
- Richard Carr
  - Department of Pathology, Warwick Hospital, Warwick, UK
- Stephan Alexander Braun
  - Department of Dermatology, University of Münster, Münster, Germany; Department of Dermatology, Medical Faculty, Heinrich-Heine-University, Düsseldorf, Germany
- Tim Holland-Letz
  - Department of Biostatistics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Titus J Brinker
  - Digital Biomarkers for Oncology Group, German Cancer Research Center (DKFZ), Heidelberg, Germany
15
Munafò R, Saitta S, Tondi D, Ingallina G, Denti P, Maisano F, Agricola E, Votta E. Automatic 4D mitral valve segmentation from transesophageal echocardiography: a semi-supervised learning approach. Med Biol Eng Comput 2025:10.1007/s11517-024-03275-w. [PMID: 39797996 DOI: 10.1007/s11517-024-03275-w] [Received: 09/20/2024] [Accepted: 12/18/2024] [Indexed: 01/13/2025]
Abstract
Performing automatic and standardized 4D transesophageal echocardiography (TEE) segmentation and mitral valve (MV) analysis is challenging due to the limitations of echocardiography and the scarcity of manually annotated 4D images. This work proposes a semi-supervised training strategy using pseudo labelling for MV segmentation in 4D TEE; it employs a Teacher-Student framework to ensure reliable pseudo-label generation. A total of 120 4D TEE recordings from 60 candidates for MV repair are used. The Teacher model, an ensemble of three convolutional neural networks, is trained on end-systole and end-diastole frames and is used to generate MV pseudo-segmentations on intermediate frames of the cardiac cycle. The pseudo-annotated frames augment the Student model's training set, improving segmentation accuracy and temporal consistency. The Student outperforms individual Teachers, achieving a Dice score of 0.82, an average surface distance of 0.37 mm, and a 95% Hausdorff distance of 1.72 mm for MV leaflets. The Student model demonstrates reliable frame-by-frame MV segmentation, accurately capturing leaflet morphology and dynamics throughout the cardiac cycle, with a significant reduction in inference time compared to the ensemble. This approach greatly reduces manual annotation workload and ensures reliable, repeatable, and time-efficient MV analysis. Our method holds strong potential to enhance the precision and efficiency of MV diagnostics and treatment planning in clinical settings.
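To make the pseudo-labelling idea and the Dice metric concrete, here is a minimal sketch on flattened binary masks. The majority-vote rule for combining the three teacher outputs is an assumption chosen for illustration; the abstract does not state how the ensemble is fused, and `ensemble_pseudo_label` and `dice` are hypothetical names.

```python
def dice(a, b):
    """Dice overlap of two binary masks given as flat lists of 0/1."""
    intersection = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2 * intersection / total


def ensemble_pseudo_label(masks):
    """Voxel-wise majority vote over binary masks from several teachers.

    Assumption: a voxel is foreground when more than half of the
    teachers mark it as foreground.
    """
    n = len(masks)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*masks)]


# Three hypothetical teacher predictions for a four-voxel frame.
teachers = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
pseudo = ensemble_pseudo_label(teachers)
print(pseudo)                      # [1, 1, 0, 0]
print(dice(pseudo, teachers[2]))   # 0.8
```

In a semi-supervised loop of this kind, masks such as `pseudo` would be added to the student's training set for frames that have no manual annotation.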
Affiliation(s)
- Riccardo Munafò
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
- Simone Saitta
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
  - Department of Biomedical Engineering and Physics, Amsterdam UMC, Amsterdam, The Netherlands
  - Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
- Davide Tondi
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
- Giacomo Ingallina
  - Unit of Cardiovascular Imaging, IRCCS San Raffaele Hospital, Milan, Italy
- Paolo Denti
  - Cardiac Surgery Department, IRCCS San Raffaele Hospital, Milan, Italy
- Francesco Maisano
  - Cardiac Surgery Department, IRCCS San Raffaele Hospital, Milan, Italy
- Eustachio Agricola
  - Unit of Cardiovascular Imaging, IRCCS San Raffaele Hospital, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
- Emiliano Votta
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
16
Cui C, Liu S, Kwon J, Incorvia JAC. Spintronic Artificial Neurons Showing Integrate-and-Fire Behavior with Reliable Cycling Operation. NANO LETTERS 2025; 25:361-367. [PMID: 39686822 DOI: 10.1021/acs.nanolett.4c05063] [Indexed: 12/18/2024]
Abstract
The rich dynamics of magnetic materials make them promising candidates for neural networks that, like the brain, take advantage of dynamical behaviors to compute efficiently. Here, we experimentally show that integrate-and-fire neurons can be achieved using a magnetic nanodevice consisting of a domain wall racetrack and magnetic tunnel junctions in a way that has reliable, continuous operation over many cycles. We demonstrate the domain propagation in the domain wall racetrack (integration), reading using a magnetic tunnel junction (fire), and reset as the domain is ejected from the racetrack with over 100 continuous cycles. Both the pulse amplitude and pulse number encoding are shown. By simulating a spiking neural network task, we benchmark the performance of the devices against an ideal leaky integrate-and-fire neuron, showing that the spintronic neuron can match the performance of the ideal. These results demonstrate reliable integrate-fire-reset operation in domain wall-magnetic tunnel junction-based neuron devices for neuromorphic computing.
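The integrate, fire, and reset cycle described above corresponds to the textbook integrate-and-fire neuron. The discrete-time sketch below is a generic software model, not a model of the spintronic device; the function name, the unit threshold, and the per-step `leak` factor are assumptions made for illustration.

```python
def lif_spike_times(inputs, threshold=1.0, leak=0.0, reset=0.0):
    """Discrete-time (leaky) integrate-and-fire neuron.

    inputs: input pulse amplitude at each time step.
    leak:   fraction of the state lost per step (0.0 means no leak).
    Returns the time steps at which the neuron fires.
    """
    v = reset
    spikes = []
    for t, x in enumerate(inputs):
        v = (1.0 - leak) * v + x   # integrate the incoming pulse
        if v >= threshold:         # fire once the threshold is reached
            spikes.append(t)
            v = reset              # reset the state after firing
    return spikes


# Pulse-number encoding: four pulses of amplitude 0.25 are needed per spike.
print(lif_spike_times([0.25] * 8))            # [3, 7]
# Pulse-amplitude encoding: larger pulses reach the threshold sooner.
print(lif_spike_times([0.5] * 4))             # [1, 3]
# With a strong leak the same small pulses never reach the threshold.
print(lif_spike_times([0.25] * 8, leak=0.5))  # []
```

The first two calls mirror the two encodings mentioned in the abstract: the firing time depends either on how many pulses arrive or on how large each pulse is.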
Affiliation(s)
- Can Cui
  - Dept. of Electrical and Computer Engineering, University of Texas at Austin, Austin, Texas 78712, United States
  - Microelectronics Research Center, University of Texas at Austin, Austin, Texas 78758, United States
- Samuel Liu
  - Dept. of Electrical and Computer Engineering, University of Texas at Austin, Austin, Texas 78712, United States
  - Microelectronics Research Center, University of Texas at Austin, Austin, Texas 78758, United States
- Jaesuk Kwon
  - Dept. of Electrical and Computer Engineering, University of Texas at Austin, Austin, Texas 78712, United States
  - Microelectronics Research Center, University of Texas at Austin, Austin, Texas 78758, United States
- Jean Anne C Incorvia
  - Dept. of Electrical and Computer Engineering, University of Texas at Austin, Austin, Texas 78712, United States
  - Microelectronics Research Center, University of Texas at Austin, Austin, Texas 78758, United States
17
Jiang R, Yan Y, Xue JH, Chen S, Wang N, Wang H. Knowledge Distillation Meets Label Noise Learning: Ambiguity-Guided Mutual Label Refinery. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:939-952. [PMID: 38019631 DOI: 10.1109/tnnls.2023.3335829] [Indexed: 12/01/2023]
Abstract
Knowledge distillation (KD), which aims at transferring the knowledge from a complex network (a teacher) to a simpler and smaller network (a student), has received considerable attention in recent years. Typically, most existing KD methods work on well-labeled data. Unfortunately, real-world data often involve noisy labels, which degrades the performance of these methods. In this article, we study a little-explored but important issue, i.e., KD with noisy labels. To this end, we propose a novel KD method, called ambiguity-guided mutual label refinery KD (AML-KD), to train the student model in the presence of noisy labels. Specifically, based on the pretrained teacher model, a two-stage label refinery framework is introduced to refine labels gradually. In the first stage, we perform label propagation (LP) with small-loss selection guided by the teacher model, improving the learning capability of the student model. In the second stage, we perform mutual LP between the teacher and student models in a mutual-benefit way. During the label refinery, an ambiguity-aware weight estimation (AWE) module is developed to address the problem of ambiguous samples, avoiding overfitting these samples. One distinct advantage of AML-KD is that it is capable of learning a high-accuracy and low-cost student model with label noise. The experimental results on synthetic and real-world noisy datasets show the effectiveness of our AML-KD against state-of-the-art KD methods and label noise learning (LNL) methods. Code is available at https://github.com/Runqing-forMost/AML-KD.
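For readers unfamiliar with the teacher-student setup, the sketch below shows the classic soft-target distillation term: a KL divergence between temperature-softened teacher and student distributions. It is the generic KD loss, not the AML-KD objective, whose exact form the abstract does not give; the temperature value and function names are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# The loss vanishes when the student reproduces the teacher's logits
# and is positive when the two disagree.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

With noisy labels, a soft-target term of this kind is typically combined with a label-based term computed only on samples judged reliable, which is where selection and refinement steps such as those in the abstract come in.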
18
Luo W, Chen S, Liu T, Han B, Niu G, Sugiyama M, Tao D, Gong C. Estimating Per-Class Statistics for Label Noise Learning. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2025; 47:305-322. [PMID: 39312440 DOI: 10.1109/tpami.2024.3466182] [Indexed: 09/25/2024]
Abstract
Real-world data may contain a considerable amount of noisily labeled examples, which usually mislead the training algorithm and result in degraded classification performance on test data. Therefore, Label Noise Learning (LNL) was proposed, of which one popular research trend focused on estimating the critical statistics (e.g., sample mean and sample covariance), to recover the clean data distribution. However, existing methods may suffer from the unreliable sample selection process or can hardly be applied to multi-class cases. Inspired by the centroid estimation theory, we propose Per-Class Statistic Estimation (PCSE), which establishes the quantitative relationship between the clean (first-order and second-order) statistics and the corresponding noisy statistics for every class. This relationship is further utilized to induce a generative classifier for model inference. Unlike existing methods, our approach does not require sample selection from the instance level. Moreover, our PCSE can serve as a general post-processing strategy applicable to various popular networks pre-trained on the noisy dataset for boosting their classification performance. Theoretically, we prove that the estimated statistics converge to their ground-truth values as the sample size increases, even if the estimated label transition matrix is biased. Empirically, we conducted intensive experiments on various binary and multi-class datasets, and the results demonstrate that PCSE achieves more precise statistic estimation as well as higher classification accuracy when compared with state-of-the-art methods in LNL.
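One way to see how clean and noisy first-order statistics can be related is the following sketch. It assumes class-conditional label noise (the noisy label depends only on the clean label), known clean class priors, and a known transition matrix `T[i][j] = P(noisy = j | clean = i)`. Under those assumptions each noisy class mean is a mixture of clean class means, and the mixture can be inverted. This is a simplified illustration and not the PCSE estimator itself; all function names are hypothetical.

```python
def posterior_matrix(priors, T):
    """B[j][i] = P(clean = i | noisy = j), obtained with Bayes' rule."""
    k = len(priors)
    B = []
    for j in range(k):
        column = [priors[i] * T[i][j] for i in range(k)]
        z = sum(column)
        B.append([c / z for c in column])
    return B


def solve(A, Y):
    """Solve A X = Y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + list(Y[r]) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def clean_means_from_noisy(noisy_means, priors, T):
    """Invert noisy_mean[j] = sum_i B[j][i] * clean_mean[i]."""
    return solve(posterior_matrix(priors, T), noisy_means)


# Two classes with 2-D means; 20% and 30% of labels flip, respectively.
clean = [[0.0, 0.0], [4.0, 2.0]]
priors = [0.5, 0.5]
T = [[0.8, 0.2], [0.3, 0.7]]
B = posterior_matrix(priors, T)
noisy = [[sum(B[j][i] * clean[i][d] for i in range(2)) for d in range(2)]
         for j in range(2)]
print(clean_means_from_noisy(noisy, priors, T))  # close to the clean means
```

In practice the priors and the transition matrix have to be estimated, and the abstract's convergence claim concerns exactly that situation, including a biased transition matrix estimate.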
19
Tam K, Li L, Han B, Xu C, Fu H. Federated Noisy Client Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:1799-1812. [PMID: 38039172 DOI: 10.1109/tnnls.2023.3336050] [Indexed: 12/03/2023]
Abstract
Federated learning (FL) collaboratively trains a shared global model across multiple local clients, while keeping the training data decentralized to preserve data privacy. However, standard FL methods ignore the noisy client issue, which may harm the overall performance of the shared model. We first investigate the critical issue caused by noisy clients in FL and quantify the negative impact of the noisy clients in terms of the representations learned by different layers. We have the following two key observations: 1) the noisy clients can severely impact the convergence and performance of the global model in FL and 2) the noisy clients can induce greater bias in the deeper layers than in the earlier layers of the global model. Based on the above observations, we propose federated noisy client learning (Fed-NCL), a framework that conducts robust FL with noisy clients. Specifically, Fed-NCL first identifies the noisy clients by estimating the data quality and model divergence. Then robust layerwise aggregation is proposed to adaptively aggregate the local models of each client to deal with the data heterogeneity caused by the noisy clients. We further perform label correction on the noisy clients to improve the generalization of the global model. Experimental results on various datasets demonstrate that our algorithm boosts the performance of different state-of-the-art systems with noisy clients. Our code is available at https://github.com/TKH666/Fed-NCL.
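Layerwise aggregation can be pictured as a weighted average in which each layer has its own set of client weights. The sketch below takes those weights as given; how Fed-NCL derives them from data quality and model divergence is not specified in the abstract, so the weights, layer names, and `aggregate_layerwise` are hypothetical.

```python
def aggregate_layerwise(client_models, layer_weights):
    """Weighted average of client parameters, with per-layer client weights.

    client_models: list of {layer_name: [parameters]} dicts, one per client.
    layer_weights: {layer_name: [weight for each client]}.
    """
    merged = {}
    for layer in client_models[0]:
        weights = layer_weights[layer]
        total = sum(weights)
        size = len(client_models[0][layer])
        merged[layer] = [
            sum(w * model[layer][i] for w, model in zip(weights, client_models)) / total
            for i in range(size)
        ]
    return merged


# Two clients; the second is assumed to be noisy, so it is down-weighted
# only in the deeper "head" layer, where the abstract reports larger bias.
clients = [
    {"conv": [1.0, 1.0], "head": [0.0]},
    {"conv": [3.0, 5.0], "head": [10.0]},
]
weights = {"conv": [1, 1], "head": [3, 1]}
print(aggregate_layerwise(clients, weights))
# {'conv': [2.0, 3.0], 'head': [2.5]}
```

Setting every layer's weights to the client sample counts recovers plain federated averaging, which makes the per-layer weights the only departure from the standard scheme in this sketch.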
20
Patt E, Classen S, Hammel M, Schneidman-Duhovny D. Predicting RNA Structure and Dynamics with Deep Learning and Solution Scattering. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.08.598075. [PMID: 39764023 PMCID: PMC11702515 DOI: 10.1101/2024.06.08.598075] [Indexed: 01/15/2025]
Abstract
Advanced deep learning and statistical methods can predict structural models for RNA molecules. However, RNAs are flexible, and it remains difficult to describe their macromolecular conformations in solutions where varying conditions can induce conformational changes. Small-angle X-ray scattering (SAXS) in solution is an efficient technique to validate structural predictions by comparing the experimental SAXS profile with those calculated from predicted structures. There are two main challenges in comparing SAXS profiles to RNA structures: the absence of cations essential for stability and charge neutralization in predicted structures and the inadequacy of a single structure to represent RNA's conformational plasticity. We introduce Solution Conformation Predictor for RNA (SCOPER) to address these challenges. This pipeline integrates kinematics-based conformational sampling with the innovative deep-learning model, IonNet, designed for predicting Mg²⁺ ion binding sites. Validated through benchmarking against fourteen experimental datasets, SCOPER significantly improved the quality of SAXS profile fits by including Mg²⁺ ions and sampling of conformational plasticity. We observe that an increased content of monovalent and bivalent ions leads to decreased RNA plasticity. Therefore, carefully adjusting the plasticity and ion density is crucial to avoid overfitting experimental SAXS data. SCOPER is an efficient tool for accurately validating the solution state of RNAs given an initial, sufficiently accurate structure and provides the corrected atomistic model, including ions.
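Comparing an experimental SAXS profile with one computed from a model is commonly done with a chi-square score after fitting a scale factor. The sketch below shows that generic calculation; it is an assumption that a score of this form is what "quality of SAXS profile fits" refers to here, and the function names and toy intensities are invented.

```python
def fit_scale(i_exp, i_calc, sigma):
    """Least-squares scale factor c minimizing sum(((I_exp - c*I_calc)/sigma)^2)."""
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    return num / den


def chi_square(i_exp, i_calc, sigma):
    """Mean squared, error-weighted residual after scaling the model profile."""
    c = fit_scale(i_exp, i_calc, sigma)
    n = len(i_exp)
    return sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma)) / n


# Toy intensities at three scattering angles, with unit experimental errors.
model = [10.0, 5.0, 2.0]
print(fit_scale([20.0, 10.0, 4.0], model, [1.0, 1.0, 1.0]))   # 2.0
print(chi_square([20.0, 10.0, 4.0], model, [1.0, 1.0, 1.0]))  # 0.0
print(chi_square([20.0, 10.0, 5.0], model, [1.0, 1.0, 1.0]))  # positive
```

A score near the noise level indicates agreement within experimental error, while pushing it much lower by adding flexibility or ions is the overfitting risk the abstract warns about.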
Affiliation(s)
- Edan Patt
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem
- Scott Classen
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Michal Hammel
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
21
Tian Y, Li Z, Jin Y, Wang M, Wei X, Zhao L, Liu Y, Liu J, Liu C. Foundation model of ECG diagnosis: Diagnostics and explanations of any form and rhythm on ECG. Cell Rep Med 2024; 5:101875. [PMID: 39694017 PMCID: PMC11722092 DOI: 10.1016/j.xcrm.2024.101875] [Received: 05/10/2024] [Revised: 09/21/2024] [Accepted: 11/21/2024] [Indexed: 12/20/2024]
Abstract
We propose a knowledge-enhanced electrocardiogram (ECG) diagnosis foundation model (KED) that utilizes large language models to incorporate domain-specific knowledge of ECG signals. This model is trained on 800,000 ECGs from nearly 160,000 unique patients. Despite being trained on single-center data, KED demonstrates exceptional zero-shot diagnosis performance across multiple regions, including different locales in China, the United States, and elsewhere. This performance spans all age groups and various conditions such as morphological abnormalities, rhythm abnormalities, conduction blocks, hypertrophy, myocardial ischemia, and infarction. Moreover, KED exhibits robust performance on diseases it has not encountered during its training. When compared to three experienced cardiologists on real clinical datasets, the model achieves comparable performance in zero-shot diagnosis of seven common clinical ECG types. We concentrate on the zero-shot diagnostic capability and the generalization performance of the proposed ECG foundation model, particularly in the context of external multi-center data and previously unseen diseases.
Affiliation(s)
- Yuanyuan Tian
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhiyuan Li
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanrui Jin
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Mengxiao Wang
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Xiaoyang Wei
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Liqun Zhao
  - Department of Cardiology, Shanghai First People's Hospital Affiliated to Shanghai Jiao Tong University, Shanghai 200080, China
- Yunqing Liu
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Jinlei Liu
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
- Chengliang Liu
  - State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai 200240, China
22
Zhang Q, Zhu Y, Yang M, Jin G, Zhu Y, Lu Y, Zou Y, Chen Q. An improved sample selection framework for learning with noisy labels. PLoS One 2024; 19:e0309841. [PMID: 39636882 PMCID: PMC11620405 DOI: 10.1371/journal.pone.0309841] [Received: 05/15/2024] [Accepted: 08/20/2024] [Indexed: 12/07/2024] Open
Abstract
Deep neural networks have powerful memory capabilities, yet they frequently suffer from overfitting to noisy labels, leading to a decline in classification and generalization performance. To address this issue, sample selection methods that select potentially clean labels have been proposed. However, there is a significant gap in size between the selected, possibly clean subset and the unlabeled subset, which becomes particularly pronounced at high noise rates. Consequently, sample selection methods underutilize label-free samples, leaving room for performance improvement. This study introduces an enhanced sample selection framework with an oversampling strategy (SOS) to overcome this limitation. This framework leverages the valuable information contained in label-free instances to enhance model performance by combining SOS with state-of-the-art sample selection methods. We validate the effectiveness of SOS through extensive experiments conducted on both synthetic noisy datasets and real-world datasets such as CIFAR, WebVision, and Clothing1M. The source code for SOS will be made available at https://github.com/LanXiaoPang613/SOS.
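The size gap described above can be illustrated with a small-loss split followed by oversampling. The sketch is a generic illustration: the fixed loss threshold, the choice to oversample the smaller clean subset with replacement, and the function names are all assumptions, and the actual SOS procedure is not detailed in the abstract.

```python
import random


def split_by_loss(losses, threshold):
    """Small-loss selection: low-loss samples are treated as likely clean."""
    clean = [i for i, loss in enumerate(losses) if loss < threshold]
    noisy = [i for i, loss in enumerate(losses) if loss >= threshold]
    return clean, noisy


def oversample(indices, target_size, seed=0):
    """Repeat randomly drawn indices until the subset reaches target_size."""
    if not indices or len(indices) >= target_size:
        return list(indices)
    rng = random.Random(seed)
    extra = [rng.choice(indices) for _ in range(target_size - len(indices))]
    return list(indices) + extra


# Hypothetical per-sample losses under a high noise rate: only two of six
# samples fall below the threshold.
losses = [0.1, 2.0, 0.2, 3.0, 2.5, 1.8]
clean, noisy = split_by_loss(losses, threshold=1.0)
balanced_clean = oversample(clean, len(noisy))
print(clean, noisy)          # [0, 2] [1, 3, 4, 5]
print(len(balanced_clean))   # 4
```

After balancing, a semi-supervised learner sees labeled and label-free batches of comparable size, which is the imbalance the abstract identifies as the limitation of plain sample selection.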
Affiliation(s)
- Qian Zhang
  - School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China
- Yi Zhu
  - School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China
- Ming Yang
  - School of Computer and Electronic Information, Nanjing Normal University, Nanjing, Jiangsu, China
- Ge Jin
  - School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China
- Yingwen Zhu
  - School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China
- Yanjun Lu
  - School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China
- Yu Zou
  - School of Information Technology, Jiangsu Open University, Nanjing, Jiangsu, China
  - School of Artificial Intelligence (School of Future Technology), Nanjing University of Information Science & Technology, Nanjing, Jiangsu, China
- Qiu Chen
  - Department of Electrical Engineering and Electronics, Graduate School of Engineering, Kogakuin University, Tokyo, Japan
23
Dorsey PJ, Lau CL, Chang TC, Doerschuk PC, D'Addio SM. Review of machine learning for lipid nanoparticle formulation and process development. J Pharm Sci 2024; 113:3413-3433. [PMID: 39341497 DOI: 10.1016/j.xphs.2024.09.015] [Received: 06/08/2024] [Revised: 09/10/2024] [Accepted: 09/11/2024] [Indexed: 10/01/2024]
Abstract
Lipid nanoparticles (LNPs) are a subset of pharmaceutical nanoparticulate formulations designed to encapsulate, stabilize, and deliver nucleic acid cargoes in vivo. Applications for LNPs include new interventions for genetic disorders, novel classes of vaccines, and alternate modes of intracellular delivery for therapeutic proteins. In the pharmaceutical industry, establishing a robust formulation and process to achieve target product performance is a critical component of drug development. Fundamental understanding of the processes for making LNPs and their interactions with biological systems have advanced considerably in the wake of the COVID-19 pandemic. Nevertheless, LNP formulation research remains largely empirical and resource intensive due to the multitude of input parameters and the complex physical phenomena that govern the processes of nanoparticle precipitation, self-assembly, structure evolution, and stability. Increasingly, artificial intelligence and machine learning (AI/ML) are being applied to improve the efficiency of research activities through in silico models and predictions, and to drive deeper fundamental understanding of experimental inputs to functional outputs. This review will identify current challenges and opportunities in the development of robust LNP formulations of nucleic acids, review studies that apply machine learning methods to experimental datasets, and provide discussion on associated data science challenges to facilitate collaboration between formulation and data scientists, aiming to accelerate the advancement of AI/ML applied to LNP formulation and process optimization.
Affiliation(s)
- Phillip J Dorsey
- Pharmaceutical Sciences & Clinical Supply, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA; University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Christina L Lau
- Cornell University, School of Electrical and Computer Engineering, Ithaca, NY 14853, USA
| | - Ti-Chiun Chang
- Pharmaceutical Sciences & Clinical Supply, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Peter C Doerschuk
- Cornell University, School of Electrical and Computer Engineering, Ithaca, NY 14853, USA
| | - Suzanne M D'Addio
- Pharmaceutical Sciences & Clinical Supply, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.
| |
|
24
|
Zeng B, Yang X, Chen Y, Yu H, Hu C, Zhang Y. Federated Data Quality Assessment Approach: Robust Learning With Mixed Label Noise. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:17620-17634. [PMID: 37651486 DOI: 10.1109/tnnls.2023.3306874] [Indexed: 09/02/2023]
Abstract
Federated learning (FL) has been an effective way to train a machine learning model distributedly, holding local data without exchanging them. However, due to the inaccessibility of local data, FL with label noise would be more challenging. Most existing methods assume only open-set or closed-set noise and correspondingly propose filtering or correction solutions, ignoring that label noise can be mixed in real-world scenarios. In this article, we propose a novel FL method to discriminate the type of noise and make the FL mixed noise-robust, named FedMIN. FedMIN employs a composite framework that captures local-global differences in multiparticipant distributions to model generalized noise patterns. By determining adaptive thresholds for identifying mixed label noise in each client and assigning appropriate weights during model aggregation, FedMIN enhances the performance of the global model. Furthermore, FedMIN incorporates a loss alignment mechanism using local and global Gaussian mixture models (GMMs) to mitigate the risk of revealing samplewise loss. Extensive experiments are conducted on several public datasets, which include the simulated FL testbeds, i.e., CIFAR-10, CIFAR-100, and SVHN, and the real-world ones, i.e., Camelyon17 and multiorgan nuclei challenge (MoNuSAC). Compared to FL benchmarks, FedMIN improves model accuracy by up to 9.9% due to its superior noise estimation capabilities.
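The abstract does not say how FedMIN fits its Gaussian mixture models, so the following is only a generic sketch of the underlying idea: fit a two-component, one-dimensional mixture to per-sample losses and treat the low-mean component as the likely-clean group. The function name `clean_probabilities`, the min/max initialization, and the 0.5 cut-off are illustrative assumptions, not details from the paper.

```python
import math

def clean_probabilities(losses, n_iter=50):
    """EM for a 1-D two-component Gaussian mixture.

    Returns, for each loss, the posterior probability of belonging to the
    low-mean component (hypothetically the "clean" group).
    """
    lo, hi = min(losses), max(losses)
    means = [lo, hi]                        # component 0 starts at the smallest loss
    spread = max((hi - lo) / 4.0, 1e-3)
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            dens = [
                weights[k]
                * math.exp(-((x - means[k]) ** 2) / (2 * variances[k]))
                / math.sqrt(2 * math.pi * variances[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300     # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(losses)
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-6)   # floor keeps the density finite
    return [r[0] for r in resp]

losses = [0.05, 0.10, 0.08, 0.12, 2.1, 2.4, 1.9]
is_clean = [p > 0.5 for p in clean_probabilities(losses)]
```

In a federated setting each client would presumably run such a fit on its own losses; how local and global mixtures are aligned is specific to the paper and is not reproduced here.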
|
25
|
Cai Z, Lin L, He H, Cheng P, Tang X. Uni4Eye++: A General Masked Image Modeling Multi-Modal Pre-Training Framework for Ophthalmic Image Classification and Segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:4419-4429. [PMID: 38954581 DOI: 10.1109/tmi.2024.3422102] [Indexed: 07/04/2024]
Abstract
A large-scale labeled dataset is a key factor for the success of supervised deep learning in most ophthalmic image analysis scenarios. However, limited annotated data is very common in ophthalmic image analysis, since manual annotation is time-consuming and labor-intensive. Self-supervised learning (SSL) methods bring huge opportunities for better utilizing unlabeled data, as they do not require massive annotations. To utilize as many unlabeled ophthalmic images as possible, it is necessary to break the dimension barrier, simultaneously making use of both 2D and 3D images as well as alleviating the issue of catastrophic forgetting. In this paper, we propose a universal self-supervised Transformer framework named Uni4Eye++ to discover the intrinsic image characteristic and capture domain-specific feature embedding in ophthalmic images. Uni4Eye++ can serve as a global feature extractor, which builds its basis on a Masked Image Modeling task with a Vision Transformer architecture. On the basis of our previous work Uni4Eye, we further employ an image entropy guided masking strategy to reconstruct more-informative patches and a dynamic head generator module to alleviate modality confusion. We evaluate the performance of our pre-trained Uni4Eye++ encoder by fine-tuning it on multiple downstream ophthalmic image classification and segmentation tasks. The superiority of Uni4Eye++ is successfully established through comparisons to other state-of-the-art SSL pre-training methods. Our code is available at https://github.com/Davidczy/Uni4Eye++.
|
26
|
Fan F, Shi Y, Guggemos T, Zhu XX. Hybrid Quantum-Classical Convolutional Neural Network Model for Image Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18145-18159. [PMID: 37721886 DOI: 10.1109/tnnls.2023.3312170] [Indexed: 09/20/2023]
Abstract
Image classification plays an important role in remote sensing. Earth observation (EO) has inevitably arrived in the big data era, but the high requirement on computation power has already become a bottleneck for analyzing large amounts of remote sensing data with sophisticated machine learning models. Exploiting quantum computing might contribute to a solution to tackle this challenge by leveraging quantum properties. This article introduces a hybrid quantum-classical convolutional neural network (QC-CNN) that applies quantum computing to effectively extract high-level critical features from EO data for classification purposes. Besides that, the adoption of the amplitude encoding technique reduces the required quantum bit resources. The complexity analysis indicates that the proposed model can accelerate the convolutional operation in comparison with its classical counterpart. The model's performance is evaluated with different EO benchmarks, including Overhead-MNIST, So2Sat LCZ42, PatternNet, RSI-CB256, and NaSC-TG2, through the TensorFlow Quantum platform, and it can achieve better performance than its classical counterpart and have higher generalizability, which verifies the validity of the QC-CNN model on EO data classification tasks.
|
27
|
Li S, Xia X, Deng J, Ge S, Liu T. Transferring Annotator- and Instance-Dependent Transition Matrix for Learning From Crowds. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:7377-7391. [PMID: 38607713 DOI: 10.1109/tpami.2024.3388209] [Indexed: 04/14/2024]
Abstract
Learning from crowds describes that the annotations of training data are obtained with crowd-sourcing services. Multiple annotators each complete their own small part of the annotations, where labeling mistakes that depend on annotators occur frequently. Modeling the label-noise generation process by the noise transition matrix is a powerful tool to tackle the label noise. In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent. However, due to the high complexity of annotator- and instance-dependent transition matrices (AIDTM), annotation sparsity, which means each annotator only labels a tiny part of instances, makes modeling AIDTM very challenging. Without prior knowledge, existing works simplify the problem by assuming the transition matrix is instance-independent or using simple parametric ways, which lose modeling generality. Motivated by this, we target a more realistic problem, estimating general AIDTM in practice. Without losing modeling generality, we parameterize AIDTM with deep neural networks. To alleviate the modeling challenge, we suppose every annotator shares its noise pattern with similar annotators, and estimate AIDTM via knowledge transfer. We hence first model the mixture of noise patterns by all annotators, and then transfer this modeling to individual annotators. Furthermore, considering that the transfer from the mixture of noise patterns to individuals may cause two annotators with highly different noise generations to perturb each other, we employ the knowledge transfer between identified neighboring annotators to calibrate the modeling. Theoretical analyses are derived to demonstrate that both the knowledge transfer from global to individuals and the knowledge transfer between neighboring individuals can effectively help mitigate the challenge of modeling general AIDTM. Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
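As a minimal illustration of what a noise transition matrix does, the sketch below maps a clean class posterior to the distribution over noisy labels. It uses one fixed matrix, whereas the abstract concerns matrices that vary per annotator and per instance; the function name `noisy_posterior` and the example numbers are assumptions made for illustration only.

```python
def noisy_posterior(clean_probs, transition):
    """Distribution over noisy labels given a clean class posterior.

    transition[i][j] is read as P(noisy label = j | clean label = i),
    so every row of the matrix sums to 1.
    """
    n_classes = len(clean_probs)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]

# Hypothetical annotator who flips class 0 to class 1 in 20% of cases
# and class 1 to class 0 in 30% of cases.
T = [[0.8, 0.2],
     [0.3, 0.7]]
noisy = noisy_posterior([0.9, 0.1], T)   # approximately [0.75, 0.25]
```

An annotator- and instance-dependent model would replace the constant `T` with the output of a network that takes the annotator identity and the instance as input.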
|
28
|
Strijbis VI, Gurney-Champion O, Slotman BJ, Verbakel WF. Impact of annotation imperfections and auto-curation for deep learning-based organ-at-risk segmentation. Phys Imaging Radiat Oncol 2024; 32:100684. [PMID: 39720784 PMCID: PMC11667007 DOI: 10.1016/j.phro.2024.100684] [Received: 04/23/2024] [Revised: 11/24/2024] [Accepted: 11/26/2024] [Indexed: 12/26/2024] Open
Abstract
Background and purpose: Segmentation imperfections (noise) in radiotherapy organ-at-risk segmentation naturally arise from specialist experience and image quality. Using clinical contours can result in sub-optimal convolutional neural network (CNN) training and performance, but manual curation is costly. We address the impact of simulated and clinical segmentation noise on CNN parotid gland (PG) segmentation performance and provide proof-of-concept for an easily implemented auto-curation countermeasure. Methods and materials: The impact of segmentation imperfections was investigated by simulating noise in clean, high-quality segmentations. Curation efficacy was tested by removing lowest-scoring Dice similarity coefficient (DSC) cases early during CNN training, both in simulated (5-fold) and clinical (10-fold) settings, using our full radiotherapy clinical cohort (RTCC; N = 1750 individual PGs). Statistical significance was assessed using Bonferroni-corrected Wilcoxon signed-rank tests. Curation efficacies were evaluated using DSC and mean surface distance (MSD) on in-distribution and out-of-distribution data and visual inspection. Results: The curation step correctly removed median (range) 98 (90-100)% of corrupted segmentations and restored the majority (1.2%/1.3%) of DSC lost from training with 30% corrupted segmentations. This effect was masked when using typical (non-curated) validation data. In RTCC, 20% curation showed improved model generalizability, which significantly improved out-of-distribution DSC and MSD (p < 1.0e-12, p < 1.0e-6). Improved consistency was observed particularly in the medial and anterior lobes. Conclusions: Up to 30% case removal, the curation benefit outweighed the training variance lost through curation. Considering the notable ease of implementation, high sensitivity in simulations and performance gains already at lower curation fractions, as a conservative middle ground, we recommend 15% curation of training cases when training CNNs using clinical PG contours.
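The curation step described above can be sketched roughly as follows: score each training case with the Dice similarity coefficient and drop the lowest-scoring fraction. The helper names `dice` and `curate`, the flat 0/1 mask representation, and the example scores are assumptions for illustration; the paper's actual scoring schedule during training is not given in the abstract.

```python
def dice(pred, ref):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    intersection = sum(p * r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

def curate(case_ids, dsc_scores, fraction=0.15):
    """Drop the lowest-scoring `fraction` of cases; keep the rest in original order."""
    n_drop = int(len(case_ids) * fraction)
    ranked = sorted(range(len(case_ids)), key=lambda i: dsc_scores[i])
    dropped = set(ranked[:n_drop])
    return [cid for i, cid in enumerate(case_ids) if i not in dropped]

cases = ["pg01", "pg02", "pg03", "pg04", "pg05",
         "pg06", "pg07", "pg08", "pg09", "pg10"]
scores = [0.88, 0.91, 0.42, 0.86, 0.90, 0.35, 0.87, 0.89, 0.92, 0.85]
kept = curate(cases, scores, fraction=0.2)   # removes the two lowest-DSC cases
```

The default of 0.15 mirrors the 15% curation fraction recommended in the abstract; the 0.2 in the usage line matches the 20% setting reported for the clinical cohort.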
Affiliation(s)
- Victor I.J. Strijbis
- Amsterdam UMC location Vrije Universiteit Amsterdam, Department of Radiation Oncology, De Boelelaan 1117, Amsterdam, the Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, the Netherlands
| | - O.J. Gurney-Champion
- Amsterdam UMC location University of Amsterdam, Department of Radiology and Nuclear Medicine, Meibergdreef 9, Amsterdam, Netherlands
- Cancer Center Amsterdam, Imaging and Biomarkers, Amsterdam, the Netherlands
| | - Berend J. Slotman
- Amsterdam UMC location Vrije Universiteit Amsterdam, Department of Radiation Oncology, De Boelelaan 1117, Amsterdam, the Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, the Netherlands
| | - Wilko F.A.R. Verbakel
- Amsterdam UMC location Vrije Universiteit Amsterdam, Department of Radiation Oncology, De Boelelaan 1117, Amsterdam, the Netherlands
- Cancer Center Amsterdam, Cancer Treatment and Quality of Life, Amsterdam, the Netherlands
- Varian Medical Systems, a Siemens Healthineers Company, Palo Alto, USA
| |
|
29
|
Zhang Y, Chung ACS. Retinal Vessel Segmentation by a Transformer-U-Net Hybrid Model With Dual-Path Decoder. IEEE J Biomed Health Inform 2024; 28:5347-5359. [PMID: 38669172 DOI: 10.1109/jbhi.2024.3394151] [Indexed: 04/28/2024]
Abstract
This paper introduces an effective and efficient framework for retinal vessel segmentation. First, we design a Transformer-CNN hybrid model in which a Transformer module is inserted inside the U-Net to capture long-range interactions. Second, we design a dual-path decoder in the U-Net framework, which contains two decoding paths for multi-task outputs. Specifically, we train the extra decoder to predict vessel skeletons as an auxiliary task which helps the model learn balanced features. The proposed framework, named as TSNet, not only achieves good performances in a fully supervised learning manner but also enables a rough skeleton annotation process. The annotators only need to roughly delineate vessel skeletons instead of giving precise pixel-wise vessel annotations. To learn with rough skeleton annotations plus a few precise vessel annotations, we propose a skeleton semi-supervised learning scheme. We adopt a mean teacher model to produce pseudo vessel annotations and conduct annotation correction for roughly labeled skeletons annotations. This learning scheme can achieve promising performance with fewer annotation efforts. We have evaluated TSNet through extensive experiments on five benchmarking datasets. Experimental results show that TSNet yields state-of-the-art performances on retinal vessel segmentation and provides an efficient training scheme in practice.
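For readers unfamiliar with the mean teacher scheme mentioned above, its core is that the teacher's weights track an exponential moving average (EMA) of the student's weights. The sketch below shows only that update on plain lists of floats; the function name `ema_update` and the decay value 0.9 are illustrative assumptions, and nothing here reflects TSNet's actual architecture or hyperparameters.

```python
def ema_update(teacher, student, decay=0.9):
    """Return teacher weights moved toward the student by an EMA step.

    new_teacher = decay * teacher + (1 - decay) * student, element-wise.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

teacher = [1.0, 0.0]
student = [0.0, 1.0]
teacher = ema_update(teacher, student)   # approximately [0.9, 0.1]
```

Because the teacher changes slowly, its predictions on unlabeled or roughly labeled inputs tend to be more stable than the student's, which is why they are commonly used as pseudo annotations.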
|
30
|
Liu L, Liu T, Chen CLP, Wang Y. Modal-Regression-Based Broad Learning System for Robust Regression and Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:12344-12357. [PMID: 37030755 DOI: 10.1109/tnnls.2023.3256999] [Indexed: 06/19/2023]
Abstract
A novel neural network, namely, broad learning system (BLS), has shown impressive performance on various regression and classification tasks. Nevertheless, most BLS models may suffer serious performance degradation for contaminated data, since they are derived under the least-squares criterion, which is sensitive to noise and outliers. To enhance the model robustness, in this article we propose a modal-regression-based BLS (MRBLS) to tackle the regression and classification tasks of data corrupted by noise and outliers. Specifically, modal regression is adopted to train the output weights instead of the minimum mean square error (MMSE) criterion. Moreover, the l2,1-norm-induced constraint is used to encourage row sparsity of the connection weight matrix and achieve feature selection. To effectively and efficiently train the network, the half-quadratic theory is used to optimize MRBLS. The validity and robustness of the proposed method are verified on various regression and classification datasets. The experimental results demonstrate that the proposed MRBLS achieves better performance than the existing state-of-the-art BLS methods in terms of both accuracy and robustness.
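To make the row-sparsity remark concrete: under the usual convention, the l2,1 norm of a matrix is the sum of the Euclidean norms of its rows, so penalizing it pushes entire rows toward zero. The sketch below computes that quantity for a small matrix; the function name `l21_norm` and the example matrix are illustrative, and the abstract does not state which row/column convention MRBLS uses.

```python
import math

def l21_norm(matrix):
    """Sum over rows of each row's Euclidean (l2) norm."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)

W = [[3.0, 4.0],     # row norm 5
     [0.0, 0.0],     # an all-zero row contributes nothing (a "pruned" feature)
     [5.0, 12.0]]    # row norm 13
value = l21_norm(W)  # 18.0
```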
|
31
|
Koga R, Koide S, Tanaka H, Taguchi K, Kugler M, Yokota T, Ohshima K, Miyoshi H, Nagaishi M, Hashimoto N, Takeuchi I, Hontani H. A study of criteria for grading follicular lymphoma using a cell type classifier from pathology images based on complementary-label learning. Micron 2024; 184:103663. [PMID: 38843576 DOI: 10.1016/j.micron.2024.103663] [Received: 04/04/2024] [Revised: 05/21/2024] [Accepted: 05/21/2024] [Indexed: 06/30/2024]
Abstract
We propose a criterion for grading follicular lymphoma that is consistent with the intuitive evaluation, which is conducted by experienced pathologists. A criterion for grading follicular lymphoma is defined by the World Health Organization (WHO) based on the number of centroblasts and centrocytes within the field of view. However, the WHO criterion is not often used in clinical practice because it is impractical for pathologists to visually identify the cell type of each cell and count the number of centroblasts and centrocytes. Hence, based on the widespread use of digital pathology, we make it practical to identify and count the cell type by using image processing and then construct a criterion for grading based on the number of cells. Here, the problem is that labeling the cell type is not easy even for experienced pathologists. To alleviate this problem, we build a new dataset for cell type classification, which contains the pathologists' confusion records during labeling, and we construct the cell type classifier using complementary-label learning from this dataset. Then we propose a criterion based on the composition ratio of cell types that is consistent with the pathologists' grading. Our experiments demonstrate that the classifier can accurately identify cell types and the proposed criterion is more consistent with the pathologists' grading than the current WHO criterion.
Affiliation(s)
- Ryoichi Koga
- Department of Computer Science, Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, Japan
| | - Shingo Koide
- Department of Computer Science, Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, Japan
| | - Hiromu Tanaka
- Department of Computer Science, Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, Japan
| | - Kei Taguchi
- Department of Computer Science, Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, Japan
| | - Mauricio Kugler
- Department of Computer Science, Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, Japan
| | - Tatsuya Yokota
- Department of Computer Science, Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, Japan
| | - Koichi Ohshima
- Department of Pathology, 67 Asahi-cho, Kurume-shi, Fukuoka 830-0011, Japan
| | - Hiroaki Miyoshi
- Department of Pathology, 67 Asahi-cho, Kurume-shi, Fukuoka 830-0011, Japan
| | - Miharu Nagaishi
- Department of Pathology, 67 Asahi-cho, Kurume-shi, Fukuoka 830-0011, Japan
| | - Noriaki Hashimoto
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
| | - Ichiro Takeuchi
- RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan; Department of Mechanical Systems Engineering, Furo-sho, Chikusa-ku, Nagoya-shi Aichi 464-8601, Japan
| | - Hidekata Hontani
- Department of Computer Science, Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, Japan.
| |
|
32
|
Ding K, Nouri E, Zheng G, Liu H, White R. Toward Robust Graph Semi-Supervised Learning Against Extreme Data Scarcity. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11661-11670. [PMID: 38421848 DOI: 10.1109/tnnls.2024.3351938] [Indexed: 03/02/2024]
Abstract
The success of graph neural networks (GNNs) in graph-based web mining highly relies on abundant human-annotated data, which is laborious to obtain in practice. When only a few labeled nodes are available, how to improve their robustness is key to achieving replicable and sustainable graph semi-supervised learning. Though self-training is powerful for semi-supervised learning, its application on graph-structured data may fail because 1) larger receptive fields are not leveraged to capture long-range node interactions, which exacerbates the difficulty of propagating feature-label patterns from labeled nodes to unlabeled nodes and 2) limited labeled data makes it challenging to learn well-separated decision boundaries for different node classes without explicitly capturing the underlying semantic structure. To address the challenges of capturing informative structural and semantic knowledge, we propose a new graph data augmentation framework, augmented graph self-training (AGST), which is built with two new (i.e., structural and semantic) augmentation modules on top of a decoupled GST backbone. In this work, we investigate whether this novel framework can learn a robust graph predictive model under the low-data context. We conduct comprehensive evaluations on semi-supervised node classification under different scenarios of limited labeled-node data. The experimental results demonstrate the unique contributions of the novel data augmentation framework for node classification with few labeled data.
|
33
|
Nomura Y, Hanaoka S, Hayashi N, Yoshikawa T, Koshino S, Sato C, Tatsuta M, Tanaka Y, Kano S, Nakaya M, Inui S, Kusakabe M, Nakao T, Miki S, Watadani T, Nakaoka R, Shimizu A, Abe O. Performance changes due to differences among annotating radiologists for training data in computerized lesion detection. Int J Comput Assist Radiol Surg 2024; 19:1527-1536. [PMID: 38625446 PMCID: PMC11329536 DOI: 10.1007/s11548-024-03136-9] [Received: 01/04/2024] [Accepted: 03/28/2024] [Indexed: 04/17/2024]
Abstract
PURPOSE: The quality and bias of annotations by annotators (e.g., radiologists) affect the performance changes in computer-aided detection (CAD) software using machine learning. We hypothesized that the difference in the years of experience in image interpretation among radiologists contributes to annotation variability. In this study, we focused on how the performance of CAD software changes with retraining by incorporating cases annotated by radiologists with varying experience. METHODS: We used two types of CAD software for lung nodule detection in chest computed tomography images and cerebral aneurysm detection in magnetic resonance angiography images. Twelve radiologists with different years of experience independently annotated the lesions, and the performance changes were investigated by repeating the retraining of the CAD software twice, with the addition of cases annotated by each radiologist. Additionally, we investigated the effects of retraining using integrated annotations from multiple radiologists. RESULTS: The performance of the CAD software after retraining differed among annotating radiologists. In some cases, the performance was degraded compared to that of the initial software. Retraining using integrated annotations showed different performance trends depending on the target CAD software, notably in cerebral aneurysm detection, where the performance decreased compared to using annotations from a single radiologist. CONCLUSIONS: Although the performance of the CAD software after retraining varied among the annotating radiologists, no direct correlation with their experience was found. The performance trends differed according to the type of CAD software used when integrated annotations from multiple radiologists were used.
Affiliation(s)
- Yukihiro Nomura
- Center for Frontier Medical Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, 263-8522, Japan.
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan.
| | - Shouhei Hanaoka
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Naoto Hayashi
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan
| | - Takeharu Yoshikawa
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan
| | - Saori Koshino
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan
| | - Chiaki Sato
- Department of Radiology, Tokyo Metropolitan Bokutoh Hospital, Tokyo, Japan
| | - Momoko Tatsuta
- Department of Diagnostic Radiology, Kitasato University Hospital, Sagamihara, Kanagawa, Japan
| | - Yuya Tanaka
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Shintaro Kano
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan
| | - Moto Nakaya
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Shohei Inui
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan
| | | | - Takahiro Nakao
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan
| | - Soichiro Miki
- Department of Computational Diagnostic Radiology and Preventive Medicine, The University of Tokyo Hospital, Tokyo, Japan
| | - Takeyuki Watadani
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Ryusuke Nakaoka
- Division of Medical Devices, National Institute of Health Sciences, Kawasaki, Kanagawa, Japan
| | - Akinobu Shimizu
- Institute of Engineering, Tokyo University of Agriculture and Technology, Tokyo, Japan
| | - Osamu Abe
- Department of Radiology, The University of Tokyo Hospital, Tokyo, Japan
- Division of Radiology and Biomedical Engineering, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| |
|
34
|
Guan H, Yap PT, Bozoki A, Liu M. Federated learning for medical image analysis: A survey. PATTERN RECOGNITION 2024; 151:110424. [PMID: 38559674 PMCID: PMC10976951 DOI: 10.1016/j.patcog.2024.110424] [Indexed: 04/04/2024]
Abstract
Machine learning in medical imaging often faces a fundamental dilemma, namely, the small sample size problem. Many recent studies suggest using multi-domain data pooled from different acquisition sites/centers to improve statistical power. However, medical images from different sites cannot be easily shared to build large datasets for model training due to privacy protection reasons. As a promising solution, federated learning, which enables collaborative training of machine learning models based on data from different sites without cross-site data sharing, has attracted considerable attention recently. In this paper, we conduct a comprehensive survey of the recent development of federated learning methods in medical image analysis. We have systematically gathered research papers on federated learning and its applications in medical image analysis published between 2017 and 2023. Our search and compilation were conducted using databases from IEEE Xplore, ACM Digital Library, Science Direct, Springer Link, Web of Science, Google Scholar, and PubMed. In this survey, we first introduce the background of federated learning for dealing with privacy protection and collaborative learning issues. We then present a comprehensive review of recent advances in federated learning methods for medical image analysis. Specifically, existing methods are categorized based on three critical aspects of a federated learning system, including client end, server end, and communication techniques. In each category, we summarize the existing federated learning methods according to specific research problems in medical image analysis and also provide insights into the motivations of different approaches. In addition, we provide a review of existing benchmark medical imaging datasets and software platforms for current federated learning research. We also conduct an experimental study to empirically evaluate typical federated learning methods for medical image analysis. This survey can help to better understand the current research status, challenges, and potential research opportunities in this promising research field.
Affiliation(s)
- Hao Guan
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Pew-Thian Yap
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Andrea Bozoki
- Department of Neurology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Mingxia Liu
- Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| |
|
35
|
Xie Z, Liu Y, He HY, Li M, Zhou ZH. Weakly Supervised AUC Optimization: A Unified Partial AUC Approach. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:4780-4795. [PMID: 38265903 DOI: 10.1109/tpami.2024.3357814] [Indexed: 01/26/2024]
Abstract
Since acquiring perfect supervision is usually difficult, real-world machine learning tasks often confront inaccurate, incomplete, or inexact supervision, collectively referred to as weak supervision. In this work, we present WSAUC, a unified framework for weakly supervised AUC optimization problems, which covers noisy label learning, positive-unlabeled learning, multi-instance learning, and semi-supervised learning scenarios. Within the WSAUC framework, we first frame the AUC optimization problems in various weakly supervised scenarios as a common formulation of minimizing the AUC risk on contaminated sets, and demonstrate that the empirical risk minimization problems are consistent with the true AUC. Then, we introduce a new type of partial AUC, specifically, the reversed partial AUC (rpAUC), which serves as a robust training objective for AUC maximization in the presence of contaminated labels. WSAUC offers a universal solution for AUC optimization in various weakly supervised scenarios by maximizing the empirical rpAUC. Theoretical and experimental results under multiple settings support the effectiveness of WSAUC on a range of weakly supervised AUC optimization tasks.
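As background for the AUC objective discussed above, the empirical AUC is the fraction of (positive, negative) pairs that a scorer ranks correctly, with ties counted as half. The sketch below computes that plain pairwise quantity; it is not the reversed partial AUC (rpAUC) proposed in the paper, whose definition the abstract does not give. The function name `empirical_auc` and the example data are assumptions for illustration.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

auc = empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])   # 3 of 4 pairs correct
```

With weak supervision, the `labels` passed in would themselves be contaminated, which is the setting where a more robust objective than this plain pairwise count becomes relevant.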
36
Uehara K, Uegami W, Nosato H, Murakawa M, Fukuoka J, Sakanashi H. Ensemble Distillation of Divergent Opinions for Robust Pathological Image Classification. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2024; 2024:1-4. [PMID: 40038937 DOI: 10.1109/embc53108.2024.10782712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2025]
Abstract
The construction of highly accurate deep neural networks (DNNs) requires consistent labeled data. However, there are numerous cases wherein the ground truth is not uniquely determined, even for the same data, owing to different interpretations depending on observers' decision criteria. Studies on the definition of labels and the building of a DNN model under such circumstances are scarce. Thus, this study addresses this issue in the field of pathological image diagnosis, where opinions occasionally vary, even among medical experts. We propose a method for constructing DNN models that are more robust to inter-observer variability by exploiting the knowledge about the relationships among the data learned by multiple DNN models. Comparison experiments were conducted using multiple pathology datasets of independently labeled images by different pathologists for the same set of images. The proposed method exhibited good generalization capability and outperformed the classification accuracy of baseline models specific to certain decision criteria.
37
Yang J, Triendl H, Soltan AAS, Prakash M, Clifton DA. Addressing label noise for electronic health records: insights from computer vision for tabular data. BMC Med Inform Decis Mak 2024; 24:183. [PMID: 38937744 PMCID: PMC11212446 DOI: 10.1186/s12911-024-02581-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 06/20/2024] [Indexed: 06/29/2024] Open
Abstract
The analysis of extensive electronic health records (EHR) datasets often calls for automated solutions, with machine learning (ML) techniques, including deep learning (DL), taking a lead role. One common task involves categorizing EHR data into predefined groups. However, the vulnerability of EHRs to noise and errors stemming from data collection processes, as well as potential human labeling errors, poses a significant risk. This risk is particularly prominent during the training of DL models, where the possibility of overfitting to noisy labels can have serious repercussions in healthcare. Despite the well-documented existence of label noise in EHR data, few studies have tackled this challenge within the EHR domain. Our work addresses this gap by adapting computer vision (CV) algorithms to mitigate the impact of label noise in DL models trained on EHR data. Notably, it remains uncertain whether CV methods, when applied to the EHR domain, will prove effective, given the substantial divergence between the two domains. We present empirical evidence demonstrating that these methods, whether used individually or in combination, can substantially enhance model performance when applied to EHR data, especially in the presence of noisy/incorrect labels. We validate our methods and underscore their practical utility in real-world EHR data, specifically in the context of COVID-19 diagnosis. Our study highlights the effectiveness of CV methods in the EHR domain, making a valuable contribution to the advancement of healthcare analytics and research.
Affiliation(s)
- Jenny Yang
- Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Andrew A S Soltan
- Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Oxford Cancer & Haematology Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, England
- Department of Oncology, University of Oxford, Oxford, England
- Mangal Prakash
- Work done at Exscientia, Currently Independent Researcher, Reading, United Kingdom
- David A Clifton
- Institute of Biomedical Engineering, Dept. Engineering Science, University of Oxford, Oxford, England
- Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, China
38
Wei Y, Deng Y, Sun C, Lin M, Jiang H, Peng Y. Deep learning with noisy labels in medical prediction problems: a scoping review. J Am Med Inform Assoc 2024; 31:1596-1607. [PMID: 38814164 PMCID: PMC11187424 DOI: 10.1093/jamia/ocae108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 04/27/2024] [Accepted: 05/03/2024] [Indexed: 05/31/2024] Open
Abstract
OBJECTIVES: Medical research faces substantial challenges from noisy labels attributed to factors like inter-expert variability and machine-extracted labels. Despite this, the adoption of label noise management remains limited, and label noise is largely ignored. To this end, there is a critical need to conduct a scoping review focusing on the problem space. This scoping review aims to comprehensively review label noise management in deep learning-based medical prediction problems, which includes label noise detection, label noise handling, and evaluation. Research involving label uncertainty is also included. METHODS: Our scoping review follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We searched 4 databases, including PubMed, IEEE Xplore, Google Scholar, and Semantic Scholar. Our search terms include "noisy label AND medical/healthcare/clinical," "uncertainty AND medical/healthcare/clinical," and "noise AND medical/healthcare/clinical." RESULTS: A total of 60 papers published between 2016 and 2023 met the inclusion criteria. A series of practical questions in medical research are investigated. These include the sources of label noise, the impact of label noise, the detection of label noise, label noise handling techniques, and their evaluation. Categorizations of both label noise detection methods and handling techniques are provided. DISCUSSION: From a methodological perspective, we observe that the medical community has been up to date with the broader deep-learning community, given that most techniques have been evaluated on medical data. We recommend considering label noise as a standard element in medical research, even if it is not dedicated to handling noisy labels. Initial experiments can start with easy-to-implement methods, such as noise-robust loss functions, weighting, and curriculum learning.
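The review recommends starting with easy-to-implement methods such as noise-robust loss functions but does not single one out. As an illustration only, the sketch below implements one commonly used example, the generalized cross-entropy loss, which averages (1 - p_y^q) / q over samples. The choice of this particular loss, the function name, and the default `q=0.7` are assumptions made here, not recommendations from the review.

```python
import numpy as np


def generalized_cross_entropy(probs, labels, q=0.7):
    """Mean of (1 - p_y**q) / q, where p_y is the probability assigned to the given label."""
    probs = np.asarray(probs, dtype=float)    # shape (n_samples, n_classes)
    labels = np.asarray(labels, dtype=int)    # shape (n_samples,)
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    p_y = probs[np.arange(labels.size), labels]
    # Each per-sample loss is bounded above by 1/q, unlike cross-entropy,
    # so a confidently contradicted (possibly mislabeled) sample cannot dominate.
    return float(np.mean((1.0 - p_y ** q) / q))
```

With `q=1` the loss reduces to 1 - p_y, and as `q` approaches 0 it approaches the ordinary cross-entropy, so `q` trades robustness against fitting speed.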
Affiliation(s)
- Yishu Wei
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
- Reddit Inc., San Francisco, CA 16093, United States
- Yu Deng
- Center for Health Information Partnerships, Northwestern University, Chicago, IL 10611, United States
- Cong Sun
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
- Mingquan Lin
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Hongmei Jiang
- Department of Statistics and Data Science, Northwestern University, Evanston, IL 60208, United States
- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
39
Penso C, Frenkel L, Goldberger J. Confidence Calibration of a Medical Imaging Classification System That is Robust to Label Noise. IEEE TRANSACTIONS ON MEDICAL IMAGING 2024; 43:2050-2060. [PMID: 38224509 DOI: 10.1109/tmi.2024.3353762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/17/2024]
Abstract
A classification model is calibrated if its predicted probabilities of outcomes reflect their accuracy. Calibrating neural networks is critical in medical analysis applications where clinical decisions rely upon the predicted probabilities. Most calibration procedures, such as temperature scaling, operate as a post-processing step using holdout validation data. In practice, it is difficult to collect medical image data with correct labels due to the complexity of the medical data and the considerable variability across experts. This study presents a network calibration procedure that is robust to label noise. We draw on the fact that the confusion matrix of the noisy labels can be expressed as the matrix product of the confusion matrix of the clean labels and the label noise. The method is based on estimating the noise level as part of a noise-robust training method. The noise level is then used to estimate the network accuracy required by the calibration procedure. We show that despite the unreliable labels, we can still achieve calibration results that are on a par with the results of a calibration procedure using data with reliable labels.
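The matrix-product relation mentioned in this abstract can be checked numerically. The sketch below is a toy illustration under two assumptions that are made here, not taken from the paper: the noisy label depends only on the true label (not on the prediction), and the noise transition matrix `T` is known. All numbers are invented for the example.

```python
import numpy as np

# Joint confusion matrix against clean labels: M_clean[j, i] = P(pred = j, true = i).
M_clean = np.array([[0.45, 0.05],
                    [0.05, 0.45]])

# Hypothetical noise transition matrix: T[i, k] = P(noisy = k | true = i).
T = np.array([[0.8, 0.2],
              [0.2, 0.8]])

# Under the independence assumption,
# M_noisy[j, k] = sum_i M_clean[j, i] * T[i, k], i.e. a plain matrix product.
M_noisy = M_clean @ T

clean_accuracy = np.trace(M_clean)   # accuracy measured on clean labels
noisy_accuracy = np.trace(M_noisy)   # accuracy measured on noisy labels

# If T is known (or estimated) and invertible, the clean confusion matrix,
# and hence the clean accuracy, can be recovered from the noisy one.
M_recovered = M_noisy @ np.linalg.inv(T)
recovered_accuracy = np.trace(M_recovered)
```

In this toy case the accuracy measured on noisy labels (0.74) understates the accuracy on clean labels (0.90), which is why a calibration step that needs the network accuracy benefits from an estimate of the noise level.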
40
Goetz L, Seedat N, Vandersluis R, van der Schaar M. Generalization-a key challenge for responsible AI in patient-facing clinical applications. NPJ Digit Med 2024; 7:126. [PMID: 38773304 PMCID: PMC11109198 DOI: 10.1038/s41746-024-01127-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 04/25/2024] [Indexed: 05/23/2024] Open
Affiliation(s)
- Lea Goetz
- Artificial Intelligence and Machine Learning, GSK, London, UK
- Nabeel Seedat
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Cambridge Centre for AI in Medicine, University of Cambridge, Cambridge, UK
41
Gao M, Jiang H, Hu Y, Ren Q, Xie Z, Liu J. Suppressing label noise in medical image classification using mixup attention and self-supervised learning. Phys Med Biol 2024; 69:105026. [PMID: 38636495 DOI: 10.1088/1361-6560/ad4083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 04/18/2024] [Indexed: 04/20/2024]
Abstract
Deep neural networks (DNNs) have been widely applied in medical image classification and achieve remarkable classification performance. These achievements heavily depend on large-scale, accurately annotated training data. However, label noise is inevitably introduced during medical image annotation, as the labeling process relies heavily on the expertise and experience of annotators. Meanwhile, DNNs tend to overfit noisy labels, which degrades model performance. Therefore, in this work, we devise a noise-robust training approach to mitigate the adverse effects of noisy labels in medical image classification. Specifically, we incorporate contrastive learning and intra-group mixup attention strategies into vanilla supervised learning. Contrastive learning for the feature extractor helps to enhance the visual representations of DNNs. The intra-group mixup attention module constructs groups, assigns self-attention weights to group-wise samples, and subsequently interpolates a large number of noise-suppressed samples through a weighted mixup operation. We conduct comparative experiments on both synthetic and real-world noisy medical datasets under various noise levels. Rigorous experiments validate that our noise-robust method with contrastive learning and mixup attention can effectively handle label noise and is superior to state-of-the-art methods. An ablation study also shows that both components contribute to boosting model performance. The proposed method demonstrates its capability to curb label noise and has potential for real-world clinical applications.
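The weighted mixup operation mentioned in this abstract can be pictured as a convex combination of the samples in a group and of their labels. The sketch below shows only that interpolation step; in the paper the weights come from a self-attention module, whereas here they are simply passed in by the caller. The function name `weighted_mixup` and the argument layout are hypothetical.

```python
import numpy as np


def weighted_mixup(features, one_hot_labels, weights):
    """Interpolate a group of samples into one sample using normalized weights."""
    features = np.asarray(features, dtype=float)        # shape (n, d)
    labels = np.asarray(one_hot_labels, dtype=float)    # shape (n, c)
    w = np.asarray(weights, dtype=float)                # shape (n,)
    if w.ndim != 1 or w.shape[0] != features.shape[0]:
        raise ValueError("need exactly one weight per sample")
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()                                     # make the combination convex
    return w @ features, w @ labels
```

For example, mixing two samples with weights 3 and 1 yields a feature vector and a soft label that both sit three quarters of the way toward the first sample, so a single mislabeled group member contributes only in proportion to its weight.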
Affiliation(s)
- Mengdi Gao
- College of Chemistry and Life Science, Beijing University of Technology, Beijing, People's Republic of China
- Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing, People's Republic of China
- Hongyang Jiang
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong SAR, People's Republic of China
- Yan Hu
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- Qiushi Ren
- Department of Biomedical Engineering, College of Future Technology, Peking University, Beijing 100871, People's Republic of China
- Zhaoheng Xie
- Institute of Medical Technology, Peking University Health Science Center, Peking University, Beijing 100191, People's Republic of China
- Jiang Liu
- Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
- Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology, Shenzhen 518055, People's Republic of China
42
Xia X, Lu P, Gong C, Han B, Yu J, Yu J, Liu T. Regularly Truncated M-Estimators for Learning With Noisy Labels. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2024; 46:3522-3536. [PMID: 38153827 DOI: 10.1109/tpami.2023.3347850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/30/2023]
Abstract
The sample selection approach is very popular in learning with noisy labels. As deep networks "learn pattern first", prior methods built on sample selection share a similar training procedure: the small-loss examples are regarded as clean examples and used to help generalization, while the large-loss examples are treated as mislabeled ones and excluded from network parameter updates. However, such a procedure is debatable in two respects: (a) it does not consider the bad influence of noisy labels in selected small-loss examples; (b) it does not make good use of the discarded large-loss examples, which may be clean or have meaningful information for generalization. In this paper, we propose regularly truncated M-estimators (RTME) to address the above two issues simultaneously. Specifically, RTME alternately switches modes between truncated M-estimators and original M-estimators. The former can adaptively select small-loss examples without knowing the noise rate and reduce the side effects of noisy labels in them. The latter lets possibly clean examples with large losses take part in training to help generalization. Theoretically, we demonstrate that our strategies are label-noise-tolerant. Empirically, comprehensive experimental results show that our method can outperform multiple baselines and is robust to a broad range of noise types and levels.
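The small-loss selection procedure that this abstract attributes to prior methods can be sketched in a few lines. This is the generic baseline trick, not RTME itself; the function name and the `keep_ratio` parameter are hypothetical, and choosing `keep_ratio` normally requires a guess at the noise rate, which is one of the limitations RTME is said to avoid.

```python
import numpy as np


def select_small_loss(losses, keep_ratio):
    """Return a boolean mask marking the keep_ratio fraction of samples with the smallest loss."""
    losses = np.asarray(losses, dtype=float)
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must lie in (0, 1]")
    n_keep = max(1, int(round(keep_ratio * losses.size)))
    # Stable sort so that ties are broken by original position.
    keep_idx = np.argsort(losses, kind="stable")[:n_keep]
    mask = np.zeros(losses.size, dtype=bool)
    mask[keep_idx] = True
    return mask
```

In a training loop, only the samples where the mask is true would contribute to the parameter update for that batch; the samples left out are exactly the discarded large-loss examples whose possible usefulness point (b) above raises.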
43
Ding C, Guo Z, Rudin C, Xiao R, Shah A, Do DH, Lee RJ, Clifford G, Nahab FB, Hu X. Learning From Alarms: A Robust Learning Approach for Accurate Photoplethysmography-Based Atrial Fibrillation Detection Using Eight Million Samples Labeled With Imprecise Arrhythmia Alarms. IEEE J Biomed Health Inform 2024; 28:2650-2661. [PMID: 38300786 PMCID: PMC11270897 DOI: 10.1109/jbhi.2024.3360952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
Atrial fibrillation (AF) is a common cardiac arrhythmia with serious health consequences if not detected and treated early. Detecting AF using wearable devices with photoplethysmography (PPG) sensors and deep neural networks has demonstrated some success using proprietary algorithms in commercial solutions. However, to improve continuous AF detection in ambulatory settings towards a population-wide screening use case, we face several challenges, one of which is the lack of large-scale labeled training data. To address this challenge, we propose to leverage AF alarms from bedside patient monitors to label concurrent PPG signals, resulting in the largest PPG-AF dataset so far (8.5 M 30-second records from 24,100 patients) and demonstrating a practical approach to build large labeled PPG datasets. Furthermore, we recognize that the AF labels thus obtained contain errors because of false AF alarms generated from imperfect built-in algorithms from bedside monitors. Dealing with label noise with unknown distribution characteristics in this case requires advanced algorithms. We, therefore, introduce and open-source a novel loss design, the cluster membership consistency (CMC) loss, to mitigate label errors. By comparing CMC with state-of-the-art methods selected from a noisy label competition, we demonstrate its superiority in handling label noise in PPG data, resilience to poor-quality signals, and computational efficiency.
44
Nandakumar N, Hsu D, Ahmed R, Venkataraman A. A Deep Learning Framework to Characterize Noisy Labels in Epileptogenic Zone Localization Using Functional Connectivity. PROCEEDINGS. IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING 2024; 2024:10.1109/isbi56570.2024.10635583. [PMID: 39464200 PMCID: PMC11500830 DOI: 10.1109/isbi56570.2024.10635583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Resting-state fMRI (rs-fMRI) has emerged as a viable tool to localize the epileptogenic zone (EZ) in medication-refractory focal epilepsy patients. However, due to clinical protocol, datasets with reliable labels for the EZ are scarce. Some studies have used the entire resection area from post-operative structural T1 scans to act as the ground truth EZ labels during training and testing. These labels are subject to noise, as the resection area will usually be larger than the actual EZ tissue. We develop a mathematical framework for characterizing noisy labels in EZ localization. We use a multi-task deep learning framework to identify both the probability of a noisy label and the localization prediction for each ROI. We train our framework on a simulated dataset derived from the Human Connectome Project and evaluate it on both the simulated and a clinical epilepsy dataset. We show superior localization performance for our method against published localization networks on both the real and simulated datasets.
Affiliation(s)
- Naresh Nandakumar
- Department of Electrical and Computer Engineering, Johns Hopkins University, USA
- David Hsu
- Department of Neurology, University of Wisconsin School of Medicine, USA
- Raheel Ahmed
- Department of Neurosurgery, University of Wisconsin School of Medicine, USA
- Archana Venkataraman
- Department of Electrical and Computer Engineering, Johns Hopkins University, USA
- Department of Electrical and Computer Engineering, Boston University, USA
45
Aksoy AK, Ravanbakhsh M, Demir B. Multi-Label Noise Robust Collaborative Learning for Remote Sensing Image Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:6438-6451. [PMID: 36264722 DOI: 10.1109/tnnls.2022.3209992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The development of accurate methods for multi-label classification (MLC) of remote sensing (RS) images is one of the most important research topics in RS. The MLC methods based on convolutional neural networks (CNNs) have shown strong performance gains in RS. However, they usually require a large number of reliable training images annotated with multiple land-cover class labels. Collecting such data is time-consuming and costly. To address this problem, the publicly available thematic products, which can include noisy labels, can be used to annotate RS images with zero-labeling cost. However, multi-label noise (which can be associated with wrong and missing label annotations) can distort the learning process of the MLC methods. To address this problem, we propose a novel multi-label noise robust collaborative learning (RCML) method to alleviate the negative effects of multi-label noise during the training phase of a CNN model. RCML identifies, ranks, and excludes noisy multi-labels in RS images based on three main modules: 1) the discrepancy module; 2) the group lasso module; and 3) the swap module. The discrepancy module ensures that the two networks learn diverse features while producing the same predictions. The task of the group lasso module is to detect the potentially noisy labels assigned to multi-labeled training images, while the swap module is devoted to exchanging the ranking information between the two networks. Unlike the existing methods that make assumptions about noise distribution, our proposed RCML does not make any prior assumption about the type of noise in the training set. The experiments conducted on two multi-label RS image archives confirm the robustness of the proposed RCML under extreme multi-label noise rates. Our code is publicly available at: https://www.noisy-labels-in-rs.org.
46
Shakya KS, Alavi A, Porteous J, K P, Laddi A, Jaiswal M. A Critical Analysis of Deep Semi-Supervised Learning Approaches for Enhanced Medical Image Classification. INFORMATION 2024; 15:246. [DOI: 10.3390/info15050246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2025] Open
Abstract
Deep semi-supervised learning (DSSL) is a machine learning paradigm that blends supervised and unsupervised learning techniques to improve the performance of various models in computer vision tasks. Medical image classification plays a crucial role in disease diagnosis, treatment planning, and patient care. However, obtaining labeled medical image data is often expensive and time-consuming for medical practitioners, leading to limited labeled datasets. DSSL techniques aim to address this challenge, particularly in various medical image tasks, to improve model generalization and performance. DSSL models leverage both the labeled information, which provides explicit supervision, and the unlabeled data, which can provide additional information about the underlying data distribution. That offers a practical solution to resource-intensive demands of data annotation, and enhances the model’s ability to generalize across diverse and previously unseen data landscapes. The present study provides a critical review of various DSSL approaches and their effectiveness and challenges in enhancing medical image classification tasks. The study categorized DSSL techniques into six classes: consistency regularization method, deep adversarial method, pseudo-learning method, graph-based method, multi-label method, and hybrid method. Further, a comparative analysis of performance for six considered methods is conducted using existing studies. The referenced studies have employed metrics such as accuracy, sensitivity, specificity, AUC-ROC, and F1 score to evaluate the performance of DSSL methods on different medical image datasets. Additionally, challenges of the datasets, such as heterogeneity, limited labeled data, and model interpretability, were discussed and highlighted in the context of DSSL for medical image classification. 
The current review provides future directions and considerations to researchers to further address the challenges and take full advantage of these methods in clinical practices.
Affiliation(s)
- Kaushlesh Singh Shakya
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India
- CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
- School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia
- Azadeh Alavi
- School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia
- Julie Porteous
- School of Computing Technologies, RMIT University, Melbourne, VIC 3000, Australia
- Priti K
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India
- CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
- Amit Laddi
- Academy of Scientific & Innovative Research (AcSIR), Ghaziabad 201002, India
- CSIR-Central Scientific Instruments Organisation, Chandigarh 160030, India
- Manojkumar Jaiswal
- Oral Health Sciences Centre, Post Graduate Institute of Medical Education & Research (PGIMER), Chandigarh 160012, India
47
Samadi ME, Mirzaieazar H, Mitsos A, Schuppert A. Noisecut: a python package for noise-tolerant classification of binary data using prior knowledge integration and max-cut solutions. BMC Bioinformatics 2024; 25:155. [PMID: 38641616 PMCID: PMC11031902 DOI: 10.1186/s12859-024-05769-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 04/09/2024] [Indexed: 04/21/2024] Open
Abstract
BACKGROUND Classification of binary data arises naturally in many clinical applications, such as patient risk stratification through ICD codes. One of the key practical challenges in data classification using machine learning is to avoid overfitting. Overfitting in supervised learning primarily occurs when a model learns random variations from noisy labels in training data rather than the underlying patterns. While traditional methods such as regularization and early stopping have demonstrated effectiveness in interpolation tasks, addressing overfitting in the classification of binary data, in which predictions always amount to extrapolation, demands extrapolation-enhanced strategies. One such approach is hybrid mechanistic/data-driven modeling, which integrates prior knowledge on input features into the learning process, enhancing the model's ability to extrapolate. RESULTS We present NoiseCut, a Python package for noise-tolerant classification of binary data by employing a hybrid modeling approach that leverages solutions of defined max-cut problems. In a comparative analysis conducted on synthetically generated binary datasets, NoiseCut exhibits better overfitting prevention compared to the early stopping technique employed by different supervised machine learning algorithms. The noise tolerance of NoiseCut stems from a dropout strategy that leverages prior knowledge of input features and is further enhanced by the integration of max-cut problems into the learning process. CONCLUSIONS NoiseCut is a Python package for the implementation of hybrid modeling for the classification of binary data. It facilitates the integration of mechanistic knowledge on the input features into learning from data in a structured manner and proves to be a valuable classification tool when the available training data is noisy and/or limited in size. This advantage is especially prominent in medical and biomedical applications where data scarcity and noise are common challenges. 
The codebase, illustrations, and documentation for NoiseCut are accessible for download at https://pypi.org/project/noisecut/ . The implementation detailed in this paper corresponds to the version 0.2.1 release of the software.
Affiliation(s)
- Moein E Samadi
- Institute for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
- Hedieh Mirzaieazar
- Institute for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
- Alexander Mitsos
- Process Systems Engineering (AVT.SVT), RWTH Aachen University, Aachen, Germany
- Andreas Schuppert
- Institute for Computational Biomedicine, RWTH Aachen University, Aachen, Germany
48
Ding C, Xiao R, Wang W, Holdsworth E, Hu X. Photoplethysmography based atrial fibrillation detection: a continually growing field. Physiol Meas 2024; 45:04TR01. [PMID: 38530307 PMCID: PMC11744514 DOI: 10.1088/1361-6579/ad37ee] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 02/24/2024] [Accepted: 03/26/2024] [Indexed: 03/27/2024]
Abstract
Objective. Atrial fibrillation (AF) is a prevalent cardiac arrhythmia associated with significant health ramifications, including an elevated susceptibility to ischemic stroke, heart disease, and heightened mortality. Photoplethysmography (PPG) has emerged as a promising technology for continuous AF monitoring because of its cost-effectiveness and widespread integration into wearable devices. Our team previously conducted an exhaustive review of PPG-based AF detection before June 2019. However, since then, more advanced technologies have emerged in this field. Approach. This paper offers a comprehensive review of the latest advancements in PPG-based AF detection, utilizing digital health and artificial intelligence (AI) solutions, within the timeframe spanning from July 2019 to December 2022. Through extensive exploration of scientific databases, we have identified 57 pertinent studies. Significance. Our comprehensive review encompasses an in-depth assessment of the statistical methodologies, traditional machine learning techniques, and deep learning approaches employed in these studies. In addition, we address the challenges encountered in the domain of PPG-based AF detection. Furthermore, we maintain a dedicated, regularly updated website to curate the latest research in this area.
Affiliation(s)
- Cheng Ding
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, United States of America
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States of America
- Ran Xiao
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, United States of America
- Weijia Wang
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, United States of America
- Elizabeth Holdsworth
- Georgia Tech Library, Georgia Institute of Technology, Atlanta, GA, United States of America
- Xiao Hu
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, United States of America
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA, United States of America
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, United States of America
49
Liang F, Li Q, Li X, Liu Y, Wang W. CAC: Confidence-Aware Co-Training for Weakly Supervised Crack Segmentation. ENTROPY (BASEL, SWITZERLAND) 2024; 26:328. [PMID: 38667882 PMCID: PMC11049554 DOI: 10.3390/e26040328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Revised: 03/29/2024] [Accepted: 04/07/2024] [Indexed: 04/28/2024]
Abstract
Automatic crack segmentation plays an essential role in maintaining the structural health of buildings and infrastructure. Despite the success in fully supervised crack segmentation, the costly pixel-level annotation restricts its application, leading to increased exploration in weakly supervised crack segmentation (WSCS). However, WSCS methods inevitably bring in noisy pseudo-labels, which results in large fluctuations. To address this problem, we propose a novel confidence-aware co-training (CAC) framework for WSCS. This framework aims to iteratively refine pseudo-labels, facilitating the learning of a more robust segmentation model. Specifically, a co-training mechanism is designed and constructs two collaborative networks to learn uncertain crack pixels, from easy to hard. Moreover, the dynamic division strategy is designed to divide the pseudo-labels based on the crack confidence score. Among them, the high-confidence pseudo-labels are utilized to optimize the initialization parameters for the collaborative network, while low-confidence pseudo-labels enrich the diversity of crack samples. Extensive experiments conducted on the Crack500, DeepCrack, and CFD datasets demonstrate that the proposed CAC significantly outperforms other WSCS methods.
Affiliation(s)
- Fengjiao Liang
  - Key Laboratory of Big Data Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education, Beijing 100044, China
- Qingyong Li
  - Key Laboratory of Big Data Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education, Beijing 100044, China
- Xiaobao Li
  - School of Computer Science and Technology, Jiangsu Normal University, Xuzhou 221116, China
- Yang Liu
  - Key Laboratory of Big Data Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education, Beijing 100044, China
- Wen Wang
  - Key Laboratory of Big Data Artificial Intelligence in Transportation (Beijing Jiaotong University), Ministry of Education, Beijing 100044, China
|
50
|
Jiang Z, Seyedi S, Griner E, Abbasi A, Rad AB, Kwon H, Cotes RO, Clifford GD. Multimodal Mental Health Digital Biomarker Analysis From Remote Interviews Using Facial, Vocal, Linguistic, and Cardiovascular Patterns. IEEE J Biomed Health Inform 2024; 28:1680-1691. [PMID: 38198249 PMCID: PMC10986761 DOI: 10.1109/jbhi.2024.3352075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2024]
Abstract
OBJECTIVE: Psychiatric evaluation suffers from subjectivity and bias, and is hard to scale due to intensive professional training requirements. In this work, we investigated whether behavioral and physiological signals, extracted from tele-video interviews, differ in individuals with psychiatric disorders. METHODS: Temporal variations in facial expression, vocal expression, linguistic expression, and cardiovascular modulation were extracted from simultaneously recorded audio and video of remote interviews. Averages, standard deviations, and Markovian process-derived statistics of these features were computed from 73 subjects. Four binary classification tasks were defined: detecting 1) any clinically-diagnosed psychiatric disorder, 2) major depressive disorder, 3) self-rated depression, and 4) self-rated anxiety. Each modality was evaluated individually and in combination. RESULTS: Statistically significant feature differences were found between psychiatric and control subjects. Correlations were found between features and self-rated depression and anxiety scores. Heart rate dynamics provided the best unimodal performance with areas under the receiver-operator curve (AUROCs) of 0.68-0.75 (depending on the classification task). Combining multiple modalities provided AUROCs of 0.72-0.82. CONCLUSION: Multimodal features extracted from remote interviews revealed informative characteristics of clinically diagnosed and self-rated mental health status. SIGNIFICANCE: The proposed multimodal approach has the potential to facilitate scalable, remote, and low-cost assessment for low-burden automated mental health services.
|