1. Li G, Yu Z, Yang K, Chen CLP, Li X. Ensemble-Enhanced Semi-Supervised Learning With Optimized Graph Construction for High-Dimensional Data. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025; 47:1103-1119. [PMID: 39446542] [DOI: 10.1109/tpami.2024.3486319]
Abstract
Graph-based methods have demonstrated exceptional performance in semi-supervised classification. However, existing graph-based methods typically construct either a predefined graph in the original space or an adaptive graph within the output space, which often limits their ability to fully utilize prior information and capture the optimal intrinsic data distribution, particularly in high-dimensional data with abundant redundant and noisy features. This paper introduces a novel approach: Semi-Supervised Classification with Optimized Graph Construction (SSC-OGC). SSC-OGC leverages both predefined and adaptive graphs to explore intrinsic data distribution and effectively employ prior information. Additionally, a graph constraint regularization term (GCR) and a collaborative constraint regularization term (CCR) are incorporated to further enhance the quality of the adaptive graph structure and the learned subspace, respectively. To eliminate the negative effect of constructing a predefined graph in the original data space, we further propose a Hybrid Subspace Ensemble-enhanced framework based on the proposed Optimized Graph Construction method (HSE-OGC). Specifically, we construct multiple hybrid subspaces, which consist of meticulously chosen features from the original data to achieve high-quality and diverse space representations. Then, HSE-OGC constructs multiple predefined graphs within hybrid subspaces and trains multiple SSC-OGC classifiers to complement each other, significantly improving the overall performance. Experimental results conducted on various high-dimensional datasets demonstrate that HSE-OGC exhibits outstanding performance.
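The abstract above describes combining a predefined graph with an adaptively learned one for semi-supervised classification. The sketch below only illustrates that general idea with a blended graph and plain label propagation; it is not the authors' SSC-OGC/HSE-OGC algorithm, and the blending weight, neighborhood size, and iteration count are assumed for illustration.

```python
# Illustrative only: blend a predefined kNN graph with a second, data-driven
# similarity graph and run plain label propagation over the mixture. This is
# not SSC-OGC/HSE-OGC; alpha, k, and the iteration count are assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph

X, y = make_classification(n_samples=300, n_features=50, n_informative=10, random_state=0)
n, n_classes = len(X), len(np.unique(y))
labeled = np.zeros(n, dtype=bool)
labeled[np.random.RandomState(0).choice(n, 30, replace=False)] = True

# Predefined graph: symmetric kNN connectivity in the original feature space.
W_pre = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
W_pre = np.maximum(W_pre, W_pre.T)

# Stand-in for the adaptive graph: Gaussian similarities re-estimated from the data.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W_ada = np.exp(-d2 / d2.mean())
np.fill_diagonal(W_ada, 0.0)

alpha = 0.5                                   # blending weight (assumed)
P = alpha * W_pre + (1 - alpha) * W_ada
P = P / P.sum(axis=1, keepdims=True)          # row-normalized propagation matrix

F = np.zeros((n, n_classes))
F[labeled, y[labeled]] = 1.0                  # clamp known labels
Y0 = F.copy()
for _ in range(50):                           # simple iterative label propagation
    F = 0.9 * P @ F + 0.1 * Y0
acc = (F[~labeled].argmax(1) == y[~labeled]).mean()
print(f"transductive accuracy on unlabeled points: {acc:.3f}")
```

The point of the blend is that the predefined graph injects prior structure while the data-driven part can correct it, which is the intuition the abstract builds on.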
2. Sheng H, Chen L, Zhao Y, Long X, Chen Q, Wu C, Li B, Fei Y, Mi L, Ma J. Closed, one-stop intelligent and accurate particle characterization based on micro-Raman spectroscopy and digital microfluidics. Talanta 2024; 266:124895. [PMID: 37454511] [DOI: 10.1016/j.talanta.2023.124895]
Abstract
Monoclonal antibodies are prone to form protein particles through aggregation, fragmentation, and oxidation under varying stress conditions during the manufacturing, shipping, and storage of parenteral drug products. According to pharmacopeia requirements, sub-visible particle levels need to be controlled throughout the shelf life of the product. Therefore, in addition to determining particle counts, it is crucial to accurately characterize particles in a drug product to understand the stress condition of exposure and to implement appropriate mitigation actions for a specific formulation. In this study, we developed a new method for intelligent characterization of protein particles using micro-Raman spectroscopy on a digital microfluidic (DMF) chip. Several microliters of protein particle solutions induced by stress degradation were loaded onto a DMF chip to generate multiple droplets for Raman spectroscopy testing. By training multiple machine learning classification models on the obtained Raman spectra of protein particles, eight types of protein particles were successfully characterized and predicted with high classification accuracy (93%-100%). The advantages of the novel particle characterization method proposed in this study include a closed system to prevent particle contamination, one-stop testing of morphological and chemical structure information, low sample volume consumption, reusable particle droplets, and simplified data analysis with high classification accuracy. The method offers great potential for determining the probable root cause of the particles or the stress conditions of exposure from a single test, so that an accurate particle control strategy can be developed and the product shelf life ultimately extended.
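As a rough illustration of the classification step described above (multiple machine learning models trained on Raman spectra), the following sketch cross-validates two standard classifiers on placeholder spectra; the array shapes, the eight-class setup, and the preprocessing are assumptions, not the study's data or pipeline.

```python
# Minimal sketch of the spectral-classification step: train and compare a few
# classifiers on labeled Raman spectra. The shapes, class count, and
# preprocessing are placeholders, not the study's actual data or pipeline.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_per_class, n_wavenumbers, n_classes = 40, 600, 8    # e.g., 8 particle types
X = rng.normal(size=(n_per_class * n_classes, n_wavenumbers))   # stand-in spectra
y = np.repeat(np.arange(n_classes), n_per_class)

for name, clf in [("SVM (RBF)", SVC(kernel="rbf", C=10.0)),
                  ("Random forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    pipe = make_pipeline(StandardScaler(), clf)       # per-wavenumber scaling
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```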
Affiliation(s)
- Han Sheng: Institute of Biomedical Engineering and Technology, Academy for Engineer and Technology, Fudan University, 220 Handan Road, Shanghai, 200433, China
- Liwen Chen: Shanghai Engineering Research Center of Ultra-precision Optical Manufacturing, Key Laboratory of Micro and Nano Photonic Structures (Ministry of Education), Green Photoelectron Platform, Department of Optical Science and Engineering, Fudan University, 220 Handan Road, Shanghai, 200433, China; Ruidge Biotech Co. Ltd., No. 888, Huanhu West 2nd Road, Lin-Gang Special Area, China (Shanghai) Pilot Free Trade Zone, Shanghai, 200131, China
- Yinping Zhao: Institute of Biomedical Engineering and Technology, Academy for Engineer and Technology, Fudan University, 220 Handan Road, Shanghai, 200433, China
- Xiangan Long: Institute of Biomedical Engineering and Technology, Academy for Engineer and Technology, Fudan University, 220 Handan Road, Shanghai, 200433, China
- Qiushu Chen: Shanghai Engineering Research Center of Ultra-precision Optical Manufacturing, Key Laboratory of Micro and Nano Photonic Structures (Ministry of Education), Green Photoelectron Platform, Department of Optical Science and Engineering, Fudan University, 220 Handan Road, Shanghai, 200433, China
- Chuanyong Wu: Shanghai Hengxin BioTechnology, Ltd., 1688 North Guo Quan Rd, Bldg A8, Rm 801, Shanghai, 200438, China
- Bei Li: State Key Laboratory of Applied Optics, Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, No. 3888 Dong Nanhu Road, Changchun, Jilin, 130033, China
- Yiyan Fei: Shanghai Engineering Research Center of Ultra-precision Optical Manufacturing, Key Laboratory of Micro and Nano Photonic Structures (Ministry of Education), Green Photoelectron Platform, Department of Optical Science and Engineering, Fudan University, 220 Handan Road, Shanghai, 200433, China
- Lan Mi: Shanghai Engineering Research Center of Ultra-precision Optical Manufacturing, Key Laboratory of Micro and Nano Photonic Structures (Ministry of Education), Green Photoelectron Platform, Department of Optical Science and Engineering, Fudan University, 220 Handan Road, Shanghai, 200433, China
- Jiong Ma: Institute of Biomedical Engineering and Technology, Academy for Engineer and Technology, Fudan University, 220 Handan Road, Shanghai, 200433, China; Shanghai Engineering Research Center of Ultra-precision Optical Manufacturing, Key Laboratory of Micro and Nano Photonic Structures (Ministry of Education), Green Photoelectron Platform, Department of Optical Science and Engineering, Fudan University, 220 Handan Road, Shanghai, 200433, China; Shanghai Engineering Research Center of Industrial Microorganisms, The Multiscale Research Institute of Complex Systems (MRICS), School of Life Sciences, Fudan University, 220 Handan Road, Shanghai, 200433, China
3. A multiple criteria ensemble pruning method for binary classification based on D-S theory of evidence. Int J Mach Learn Cyb 2023. [DOI: 10.1007/s13042-022-01690-9]
4. Xia S, Zheng Y, Wang G, He P, Li H, Chen Z. Random Space Division Sampling for Label-Noisy Classification or Imbalanced Classification. IEEE Transactions on Cybernetics 2022; 52:10444-10457. [PMID: 33909577] [DOI: 10.1109/tcyb.2021.3070005]
Abstract
This article presents a simple, easily implemented sampling method for classification based on the idea of random space division, called "random space division sampling" (RSDS). It can extract the boundary points as the sampled result by efficiently distinguishing label-noise points, inner points, and boundary points. This makes it the first general sampling method for classification that not only reduces the data size but also enhances the classification accuracy of a classifier, especially in label-noisy classification. "General" means that it is not restricted to any specific classifier or dataset (regardless of whether the dataset is linearly separable or not). Furthermore, RSDS can accelerate most classifiers online because its time complexity is lower than that of most classifiers. Moreover, RSDS can be used as an undersampling method for imbalanced classification. The experimental results on benchmark datasets demonstrate its effectiveness and efficiency. The code of RSDS and the comparison algorithms is available at: https://github.com/syxiaa/RSDS.
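A minimal sketch of the random-space-division idea is given below: cut the space with random axis-aligned splits and score each point by the label purity of the cell it falls in across many random partitions, so that inner, boundary, and label-noise points separate. The cell size, purity thresholds, and toy dataset are assumptions; the authors' reference implementation is at the GitHub link above.

```python
# Sketch of the random-space-division idea: repeated random axis-aligned cuts,
# then per-point label purity within its cell. Thresholds and cell size are
# assumed, not the authors' settings (see github.com/syxiaa/RSDS for theirs).
import numpy as np
from sklearn.datasets import make_moons

def random_partition(X, idx, rng, max_cell=10):
    """Recursively split `idx` with random axis-aligned cuts; yield leaf cells."""
    if len(idx) <= max_cell:
        yield idx
        return
    f = rng.integers(X.shape[1])
    lo, hi = X[idx, f].min(), X[idx, f].max()
    if lo == hi:
        yield idx
        return
    t = rng.uniform(lo, hi)
    left, right = idx[X[idx, f] <= t], idx[X[idx, f] > t]
    if len(left) == 0 or len(right) == 0:     # degenerate cut; stop splitting
        yield idx
        return
    yield from random_partition(X, left, rng, max_cell)
    yield from random_partition(X, right, rng, max_cell)

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
rng = np.random.default_rng(0)
purity = np.zeros(len(X))
T = 30                                        # number of random partitions
for _ in range(T):
    for cell in random_partition(X, np.arange(len(X)), rng):
        labels, counts = np.unique(y[cell], return_counts=True)
        frac_same = counts[np.searchsorted(labels, y[cell])] / len(cell)
        purity[cell] += frac_same
purity /= T

noise = purity < 0.3                          # label likely disagrees with its cell
boundary = (purity >= 0.3) & (purity < 0.9)
inner = purity >= 0.9
print(f"noise: {noise.sum()}, boundary: {boundary.sum()}, inner: {inner.sum()}")
```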
5. Vahmiyan M, Kheirabadi M, Akbari E. Feature selection methods in microarray gene expression data: a systematic mapping study. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07661-z]
6. Yu Z, Lan K, Liu Z, Han G. Progressive Ensemble Kernel-Based Broad Learning System for Noisy Data Classification. IEEE Transactions on Cybernetics 2022; 52:9656-9669. [PMID: 33784632] [DOI: 10.1109/tcyb.2021.3064821]
Abstract
The broad learning system (BLS) is an algorithm that facilitates feature representation learning and data classification. Although the weights of BLS are obtained by analytical computation, which brings better generalization and higher efficiency, BLS suffers from two drawbacks: 1) performance depends on the number of hidden nodes, which requires manual tuning; and 2) the double random mappings introduce uncertainty, which leads to poor resistance to noisy data as well as unpredictable effects on performance. To address these issues, a kernel-based BLS (KBLS) method is proposed by projecting the feature nodes obtained from the first random mapping into a kernel space. This manipulation reduces the uncertainty, which contributes to performance improvements with a fixed number of hidden nodes, and means that manual tuning is no longer needed. Moreover, to further improve the stability and noise resistance of KBLS, a progressive ensemble framework is proposed, in which the residual of the previous base classifiers is used to train the following base classifier. We conduct comparative experiments against existing state-of-the-art hierarchical learning methods on multiple noisy real-world datasets. The experimental results indicate that our approaches achieve the best or at least comparable performance in terms of accuracy.
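The sketch below illustrates, under stated assumptions, the two ingredients named in the abstract: replacing a second random mapping by a kernel mapping of the first-stage feature nodes, and training each subsequent base learner on the residual left by the previous ones. It is an illustrative reading rather than the authors' KBLS implementation; the kernel width, ridge penalty, and ensemble size are assumed.

```python
# Illustrative reading of the abstract, not the authors' KBLS code: kernelize
# the first-stage random feature nodes (instead of a second random mapping) and
# train each base learner on the residual left by the previous ones.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import rbf_kernel

X, y = load_digits(return_X_y=True)
X = X / 16.0                                        # keep tanh inputs in a sane range
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
Ytr = np.eye(len(np.unique(y)))[ytr]                # one-hot targets

rng = np.random.default_rng(0)
residual = Ytr.copy()
Fte = np.zeros((len(Xte), Ytr.shape[1]))
for stage in range(3):                              # progressive ensemble of 3 learners
    Ws = rng.normal(size=(X.shape[1], 100))         # this stage's random feature nodes
    Ztr, Zte = np.tanh(Xtr @ Ws), np.tanh(Xte @ Ws)
    Ktr = rbf_kernel(Ztr, Ztr, gamma=0.01)          # kernel space of the feature nodes
    Kte = rbf_kernel(Zte, Ztr, gamma=0.01)
    beta = np.linalg.solve(Ktr + 1e-2 * np.eye(len(Ztr)), residual)  # ridge solution
    residual = residual - Ktr @ beta                # next learner fits what is left
    Fte += Kte @ beta
print("test accuracy:", (Fte.argmax(1) == yte).mean())
```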
7. Zhu S, Xu L, Goodman ED. Hierarchical Topology-Based Cluster Representation for Scalable Evolutionary Multiobjective Clustering. IEEE Transactions on Cybernetics 2022; 52:9846-9860. [PMID: 34106873] [DOI: 10.1109/tcyb.2021.3081988]
Abstract
Evolutionary multiobjective clustering (MOC) algorithms have shown promising potential to outperform conventional single-objective clustering algorithms, especially when the number of clusters k is not set before clustering. However, the computational burden becomes a tricky problem due to the extensive search space and the fitness computation time of the evolving population, especially when the data size is large. This article proposes a new hierarchical, topology-based cluster representation for scalable MOC, which can simplify the search procedure and decrease computational overhead. A coarse-to-fine-trained topological structure that fits the spatial distribution of the data is utilized to identify a set of seed points/nodes, and then a tree-based graph is built to represent clusters. During optimization, a bipartite graph partitioning strategy incorporated with the graph nodes helps in performing a cluster ensemble operation to generate offspring solutions more effectively. For determining the final result, which is underexplored in existing methods, a cluster ensemble strategy is also presented for both cases, whether k is provided or not. Comparison experiments conducted on a series of different data distributions reveal the superiority of the proposed algorithm in terms of both clustering performance and computing efficiency.
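The following sketch only conveys the representation idea of clustering a small set of seed nodes instead of the raw points and then propagating the result back. MiniBatchKMeans stands in for the coarse-to-fine topological structure, and no evolutionary multiobjective search is performed; the sizes and k are illustrative assumptions.

```python
# Two-level representation sketch: summarize the data with seed nodes, cluster
# the seeds rather than the raw points, and propagate seed labels back. This is
# only the representation idea, not the paper's topological structure or its
# evolutionary multiobjective search; sizes and k are assumed.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import MiniBatchKMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=20000, centers=5, random_state=0)

# Level 1: a few hundred seed nodes summarize the spatial distribution.
seeds = MiniBatchKMeans(n_clusters=200, random_state=0).fit(X)

# Level 2: cluster the seed nodes (cheap: 200 points instead of 20000).
seed_labels = AgglomerativeClustering(n_clusters=5).fit_predict(seeds.cluster_centers_)

# Propagate: every point inherits the label of its seed node.
point_labels = seed_labels[seeds.labels_]
print("ARI vs. ground truth:", adjusted_rand_score(y, point_labels))
```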
8. Ke J, Gong C, Liu T, Zhao L, Yang J, Tao D. Laplacian Welsch Regularization for Robust Semisupervised Learning. IEEE Transactions on Cybernetics 2022; 52:164-177. [PMID: 32149703] [DOI: 10.1109/tcyb.2019.2953337]
Abstract
Semisupervised learning (SSL) has been widely used in numerous practical applications where labeled training examples are scarce while unlabeled examples are abundant. Due to the scarcity of labeled examples, the performance of existing SSL methods is often affected by outliers in the labeled data, leading to an imperfectly trained classifier. To enhance the robustness of SSL methods to outliers, this article proposes a novel SSL algorithm called Laplacian Welsch regularization (LapWR). Specifically, apart from the conventional Laplacian regularizer, we also introduce a bounded, smooth, and nonconvex Welsch loss which can suppress the adverse effect brought by labeled outliers. To handle the model nonconvexity caused by the Welsch loss, an iterative half-quadratic (HQ) optimization algorithm is adopted in which each subproblem has an ideal closed-form solution. To handle large datasets, we further propose an accelerated model that utilizes the Nyström method to reduce the computational complexity of LapWR. Theoretically, the generalization bound of LapWR is derived by analyzing its Rademacher complexity, which suggests that the proposed algorithm is guaranteed to obtain satisfactory performance. By comparing LapWR with existing representative SSL algorithms on various benchmark and real-world datasets, we experimentally found that LapWR is robust to outliers and consistently achieves top-level results.
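Below is a sketch of the Welsch loss and the half-quadratic reweighting it induces, applied to a plain linear model with Laplacian regularization. The paper's method operates in a kernel space and adds a Nyström acceleration, neither of which is shown; the bandwidth sigma and the regularization weights are assumed.

```python
# Sketch of the Welsch loss and half-quadratic (HQ) reweighting on a linear
# model with Laplacian regularization. Not the paper's kernel formulation or
# its Nystrom acceleration; sigma and the gammas are assumed. It only shows
# how HQ downweights labeled outliers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import kneighbors_graph

def welsch(r, sigma):
    """Welsch loss: bounded, smooth, nonconvex; saturates for large residuals."""
    return (sigma**2 / 2.0) * (1.0 - np.exp(-(r**2) / sigma**2))

X, y01 = make_classification(n_samples=300, n_features=20, random_state=0)
y = 2.0 * y01 - 1.0                              # labels in {-1, +1}
labeled = np.zeros(len(X), dtype=bool)
labeled[:40] = True
y_noisy = y.copy()
y_noisy[:5] *= -1                                # a few labeled outliers

# Graph Laplacian over all (labeled + unlabeled) points.
W = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
W = np.maximum(W, W.T)
L = np.diag(W.sum(1)) - W

sigma, gamma_A, gamma_I = 1.0, 1e-2, 1e-2
Xl, yl = X[labeled], y_noisy[labeled]
beta = np.zeros(X.shape[1])
for _ in range(20):                              # HQ iterations
    r = yl - Xl @ beta
    w = np.exp(-(r**2) / sigma**2)               # HQ weights: outliers -> ~0
    A = Xl.T @ (w[:, None] * Xl) + gamma_A * np.eye(X.shape[1]) + gamma_I * X.T @ L @ X
    beta = np.linalg.solve(A, Xl.T @ (w * yl))   # closed-form weighted subproblem
pred = np.sign(X @ beta)
print("accuracy on unlabeled points:", (pred[~labeled] == y[~labeled]).mean())
```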
9. Yu Z, Ye F, Yang K, Cao W, Chen CLP, Cheng L, You J, Wong HS. Semisupervised Classification With Novel Graph Construction for High-Dimensional Data. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:75-88. [PMID: 33048763] [DOI: 10.1109/tnnls.2020.3027526]
Abstract
Graph-based methods have achieved impressive performance on semisupervised classification (SSC). Traditional graph-based methods have two main drawbacks. First, the graph is predefined before training a classifier, which does not leverage the interactions between classifier training and similarity-matrix learning. Second, when handling high-dimensional data with noisy or redundant features, a graph constructed in the original input space is often unsuitable and may lead to poor performance. In this article, we propose an SSC method with novel graph construction (SSC-NGC), in which the similarity matrix is optimized in both the label space and an additional subspace to obtain a better and more robust result than in the original data space. Furthermore, to obtain a high-quality subspace, we learn the projection matrix of the additional subspace by preserving the local and global structure of the data. Finally, we integrate the classifier training, the graph construction, and the subspace learning into a unified framework. With this framework, the classifier parameters, similarity matrix, and subspace projection matrix are adaptively learned in an iterative scheme to obtain an optimal joint result. We conduct extensive comparative experiments against state-of-the-art methods over multiple real-world datasets. Experimental results demonstrate the superiority of the proposed method over other state-of-the-art algorithms.
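As a hedged illustration of how the three components named above can sit in one objective, a generic formulation is (this is not the paper's exact objective; the trade-off weights λ are placeholders):

$$
\min_{\mathbf{W},\,\mathbf{S},\,\mathbf{P}}\;
\sum_{i\in\mathcal{L}}\ell\!\left(\mathbf{W}^{\top}\mathbf{x}_i,\mathbf{y}_i\right)
+\lambda_1\sum_{i,j}s_{ij}\!\left(\bigl\lVert \mathbf{W}^{\top}\mathbf{x}_i-\mathbf{W}^{\top}\mathbf{x}_j\bigr\rVert_2^2
+\bigl\lVert \mathbf{P}^{\top}\mathbf{x}_i-\mathbf{P}^{\top}\mathbf{x}_j\bigr\rVert_2^2\right)
+\lambda_2\lVert\mathbf{S}\rVert_F^2
+\lambda_3\,\Omega(\mathbf{P})
\quad\text{s.t.}\;\;\mathbf{S}\mathbf{1}=\mathbf{1},\;\mathbf{S}\ge 0,
$$

where the first term fits the labeled set, the s_ij-weighted terms couple the similarity matrix S to both the label space (through W) and the learned subspace (through P), and Ω(P) stands for the local/global structure-preserving constraints on the projection. Such an objective is typically handled by alternating updates: fix (S, P) and update W, fix (W, P) and update S, then fix (W, S) and update P, which matches the iterative scheme described in the abstract.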
10. Huang S, Liu Z, Jin W, Mu Y. Broad learning system with manifold regularized sparse features for semi-supervised classification. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.052]
11. Ding W, Abdel-Basset M, Hawash H. RCTE: A reliable and consistent temporal-ensembling framework for semi-supervised segmentation of COVID-19 lesions. Inf Sci (N Y) 2021; 578:559-573. [PMID: 34305162] [PMCID: PMC8294559] [DOI: 10.1016/j.ins.2021.07.059]
Abstract
The segmentation of COVID-19 lesions from computed tomography (CT) scans is crucial for developing an efficient automated diagnosis system. Deep learning (DL) has shown success in different segmentation tasks. However, an efficient DL approach requires a large amount of accurately annotated data, which is difficult to aggregate owing to the urgent situation of COVID-19. Inaccurate annotation can easily occur without experts, and segmentation performance is substantially worsened by noisy annotations. Therefore, this study presents a reliable and consistent temporal-ensembling (RCTE) framework for semi-supervised lesion segmentation. A segmentation network is integrated into a teacher-student architecture to segment infection regions from a limited number of annotated CT scans and a large number of unannotated CT scans. The network generates both reliable and unreliable targets, and handling these targets uniformly can degrade performance. To address this, a reliable teacher-student architecture is introduced, in which the teacher network is the exponential moving average (EMA) of the student network and is updated reliably by restraining the student's contribution to the EMA when its loss is large. We also present a noise-aware loss, based on improvements to the generalized cross-entropy loss, to improve segmentation performance under noisy annotations. A comprehensive analysis validates the robustness of RCTE over recent cutting-edge semi-supervised segmentation techniques, with a 65.87% Dice score.
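For orientation only, the snippet below sketches two ingredients mentioned above: the EMA teacher update used in temporal-ensembling/teacher-student setups and a generalized cross-entropy (GCE) style loss that is less sensitive to label noise than standard cross-entropy. It is not the RCTE network or its reliability gating; the decay, the exponent q, and the toy shapes are assumed.

```python
# Sketch of an EMA teacher update and a generalized cross-entropy (GCE) loss.
# Not the RCTE segmentation network or its reliability gating; decay, q, and
# the toy shapes are assumed.
import numpy as np

def ema_update(teacher_params, student_params, decay=0.99):
    """teacher <- decay * teacher + (1 - decay) * student, per parameter array."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher_params, student_params)]

def gce_loss(probs, targets, q=0.7, eps=1e-8):
    """Generalized cross-entropy: (1 - p_y^q) / q; q -> 0 recovers CE, q = 1 gives MAE."""
    p_y = np.clip(probs[np.arange(len(targets)), targets], eps, 1.0)
    return np.mean((1.0 - p_y**q) / q)

# Toy usage: two parameter tensors and a small batch of softmax outputs.
rng = np.random.default_rng(0)
student = [rng.normal(size=(3, 3)), rng.normal(size=(3,))]
teacher = [p.copy() for p in student]
teacher = ema_update(teacher, [p + 0.1 for p in student])   # after one student step

logits = rng.normal(size=(4, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print("GCE loss:", gce_loss(probs, targets=np.array([0, 1, 1, 0])))
```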
Affiliation(s)
- Weiping Ding: School of Information Science and Technology, Nantong University, Nantong 226019, China
- Mohamed Abdel-Basset: Zagazig University, Shaibet an Nakareyah, Zagazig 2, 44519 Ash Sharqia Governorate, Egypt
- Hossam Hawash: Zagazig University, Shaibet an Nakareyah, Zagazig 2, 44519 Ash Sharqia Governorate, Egypt
12. Shi Y, Yu Z, Chen CLP, You J, Wong HS, Wang Y, Zhang J. Transfer Clustering Ensemble Selection. IEEE Transactions on Cybernetics 2020; 50:2872-2885. [PMID: 30596592] [DOI: 10.1109/tcyb.2018.2885585]
Abstract
Clustering ensemble (CE) takes multiple clustering solutions into consideration in order to effectively improve the accuracy and robustness of the final result. To reduce redundancy as well as noise, a CE selection (CES) step is added to further enhance performance. Quality and diversity are two important metrics of CES. However, most CES strategies adopt heuristic selection methods or a threshold parameter setting to achieve a tradeoff between quality and diversity. In this paper, we propose a transfer CES (TCES) algorithm which makes use of the relationship between quality and diversity in a source dataset and transfers it to a target dataset based on three objective functions. Furthermore, a multiobjective self-evolutionary process is designed to optimize these three objective functions. Finally, we construct a transfer CE framework (TCE-TCES) based on TCES to obtain better clustering results. The experimental results on 12 transfer clustering tasks derived from the 20newsgroups dataset show that TCE-TCES can find a better tradeoff between quality and diversity and obtain more desirable clustering results.
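The snippet below sketches only the quality/diversity bookkeeping that clustering ensemble selection relies on: quality as NMI agreement with a consensus, diversity as average NMI disagreement with the other members, and a scored selection. The transfer step and the multiobjective self-evolutionary optimization of TCES are not modeled; the 0.5/0.5 weighting and the selection size are assumed.

```python
# Sketch of quality/diversity scoring for clustering ensemble selection.
# Quality = NMI with a consensus stand-in; diversity = average disagreement
# with the other members. Not TCES itself; the weighting and selection size
# are assumed.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score as nmi

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# A small ensemble: k-means runs with different k and seeds.
members = [KMeans(n_clusters=k, n_init=5, random_state=s).fit_predict(X)
           for s in range(5) for k in (3, 4, 5)]

# Crude consensus stand-in: the member that agrees most with all others.
avg_agree = [np.mean([nmi(m, o) for o in members]) for m in members]
consensus = members[int(np.argmax(avg_agree))]

quality = np.array([nmi(m, consensus) for m in members])
diversity = np.array([np.mean([1.0 - nmi(m, o) for o in members if o is not m])
                      for m in members])
score = 0.5 * quality + 0.5 * diversity          # fixed tradeoff (assumed)
selected = np.argsort(score)[::-1][:5]           # keep the 5 best-scoring members
print("selected member indices:", selected)
```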
13. Zhang Y, Michi A, Wagner J, Andre E, Schuller B, Weninger F. A Generic Human-Machine Annotation Framework Based on Dynamic Cooperative Learning. IEEE Transactions on Cybernetics 2020; 50:1230-1239. [PMID: 30872254] [DOI: 10.1109/tcyb.2019.2901499]
Abstract
Obtaining meaningful annotations is tedious work, incurring considerable cost and time. Dynamic active learning and cooperative learning are recently proposed approaches to reduce the human effort of annotating data with subjective phenomena. In this paper, we introduce a novel generic annotation framework, with the aim of achieving the optimal tradeoff between label reliability and cost reduction by making efficient use of human and machine work force. To this end, we use dropout to assess model uncertainty and thereby decide which instances can be automatically labeled by the machine and which ones require human inspection. In addition, we propose an early stopping criterion based on inter-rater agreement in order to focus human resources on those ambiguous instances that are difficult to label. In contrast to existing algorithms, the new confidence measures are applicable not only to binary classification tasks but also to regression problems. The proposed method is evaluated on the benchmark datasets for non-native English prosody estimation provided in the Interspeech computational paralinguistics challenge. As a result, the novel dynamic cooperative learning algorithm yields a Spearman's correlation coefficient of 0.424, compared to 0.413 with passive learning, while reducing the amount of human annotations by 74%.
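A small sketch of the dropout-based confidence idea follows: run several stochastic forward passes with dropout at inference time, treat the spread of predictions as uncertainty, and route only uncertain instances to human annotators. The toy linear model, dropout rate, and routing threshold are assumptions; this is not the paper's prosody-estimation system.

```python
# Monte Carlo dropout sketch: prediction spread as uncertainty, used to decide
# which instances go to the machine and which to human annotators. The tiny
# linear model, dropout rate, and threshold are assumed.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

w = np.linalg.lstsq(X[:100], y[:100], rcond=None)[0]     # "trained" regressor

def mc_dropout_predict(x_batch, w, passes=50, p_drop=0.3):
    """Monte Carlo dropout: mean and spread over stochastic forward passes."""
    preds = []
    for _ in range(passes):
        mask = (rng.random(w.shape) > p_drop) / (1.0 - p_drop)   # inverted dropout
        preds.append(x_batch @ (w * mask))
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

mean, uncertainty = mc_dropout_predict(X[100:], w)
threshold = np.quantile(uncertainty, 0.74)        # e.g., send ~26% to humans
machine_labeled = uncertainty <= threshold
print(f"machine-labeled: {machine_labeled.sum()}, sent to humans: {(~machine_labeled).sum()}")
```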