1
Patel AN, Srinivasan K. Deep learning paradigms in lung cancer diagnosis: A methodological review, open challenges, and future directions. Phys Med 2025; 131:104914. PMID: 39938402. DOI: 10.1016/j.ejmp.2025.104914.
Abstract
Lung cancer is the leading cause of cancer-related deaths worldwide, which makes early diagnosis critical for improving patient outcomes. Deep learning has demonstrated significant promise in lung cancer diagnosis, excelling in nodule detection, classification, and prognosis prediction. This methodological review comprehensively explores the application of deep learning models in lung cancer diagnosis, covering their integration across various imaging modalities. Deep learning consistently achieves state-of-the-art performance, occasionally surpassing human expert accuracy. Notably, deep neural networks excel at detecting lung nodules, distinguishing between benign and malignant nodules, and predicting patient prognosis. They have also led to computer-aided diagnosis systems that enhance diagnostic accuracy for radiologists. Article selection for this review followed the criteria of the PRISMA framework. Despite challenges such as data quality and interpretability limitations, this review emphasizes the potential of deep learning to significantly improve the precision and efficiency of lung cancer diagnosis, and it encourages continued research to overcome these obstacles and fully harness neural networks' transformative impact in this field.
Affiliation(s)
- Aryan Nikul Patel
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
- Kathiravan Srinivasan
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
2
Gao J, Huang Z, Lei Y, Shan H, Wang JZ, Wang FY, Zhang J. Deep Rank-Consistent Pyramid Model for Enhanced Crowd Counting. IEEE Trans Neural Netw Learn Syst 2025; 36:299-312. PMID: 38090870. DOI: 10.1109/tnnls.2023.3336774.
Abstract
Most conventional crowd counting methods rely on a fully supervised learning framework to establish a mapping between scene images and crowd density maps, which usually requires a large quantity of costly, time-intensive pixel-level annotations for training supervision. One way to mitigate this labeling effort and improve counting accuracy is to leverage large amounts of unlabeled images, exploiting the inherent self-structural information and rank consistency within a single image as additional qualitative relation supervision during training. In contrast to earlier methods that used rank relations at the original image level, we explore such rank-consistency relations within the latent feature spaces. This enables the incorporation of numerous pyramid partial orders, strengthening the model's representation capability, and it also increases the utilization ratio of unlabeled samples. Specifically, we propose a Deep Rank-consistEnt pyrAmid Model (DREAM), which makes full use of rank consistency across coarse-to-fine pyramid features in latent spaces for enhanced crowd counting with massive unlabeled images. In addition, we have collected a new unlabeled crowd counting dataset, FUDAN-UCC, comprising 4000 images for training purposes. Extensive experiments on four benchmark datasets, namely UCF-QNRF, ShanghaiTech PartA and PartB, and UCF-CC-50, show the effectiveness of our method compared with previous semi-supervised methods. The codes are available at https://github.com/bridgeqiqi/DREAM.
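The rank-consistency supervision described in this abstract can be illustrated at the count level: a sub-region of an image can never contain more people than a region that encloses it, so predicted counts over nested crops must respect a partial order. The sketch below (NumPy; function names are illustrative, and DREAM itself enforces this over latent pyramid features rather than raw counts) penalizes violations of that order on an unlabeled image:

```python
import numpy as np

def nested_crops(density, levels=3):
    """Center-nested crops of a predicted density map, coarse to fine.
    Each crop is spatially contained in the previous one."""
    h, w = density.shape
    crops = []
    for k in range(levels):
        mh, mw = int(h * k / (2 * levels)), int(w * k / (2 * levels))
        crops.append(density[mh:h - mh, mw:w - mw])
    return crops

def rank_consistency_loss(density, levels=3, margin=0.0):
    """Hinge penalty whenever an inner crop's predicted count exceeds the
    count of the crop that contains it (a partial-order violation)."""
    counts = [float(c.sum()) for c in nested_crops(density, levels)]
    return sum(max(0.0, inner - outer + margin)
               for outer, inner in zip(counts[:-1], counts[1:]))
```

Because the ordering holds for any true crowd, this term needs no annotations at all, which is what lets such methods exploit large unlabeled pools.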
3
Zhu P, Li J, Cao B, Hu Q. Multi-Task Credible Pseudo-Label Learning for Semi-Supervised Crowd Counting. IEEE Trans Neural Netw Learn Syst 2024; 35:10394-10406. PMID: 37022812. DOI: 10.1109/tnnls.2023.3241211.
Abstract
As a widely used semi-supervised learning strategy, self-training generates pseudo-labels to alleviate the labor-intensive and time-consuming annotation problem in crowd counting while boosting model performance with limited labeled data and massive unlabeled data. However, the noise in the pseudo-labels of the density maps greatly hinders the performance of semi-supervised crowd counting. Although auxiliary tasks, e.g., binary segmentation, are utilized to help improve the feature representation learning ability, they are isolated from the main task, i.e., density map regression, and the multi-task relationships are totally ignored. To address these issues, we develop a multi-task credible pseudo-label learning (MTCP) framework for crowd counting, consisting of three branches: density regression as the main task, and binary segmentation and confidence prediction as auxiliary tasks. Multi-task learning is conducted on the labeled data by sharing the same feature extractor across all three tasks and taking multi-task relations into account. To reduce epistemic uncertainty, the labeled data are further expanded by trimming them according to the predicted confidence map so that low-confidence regions are removed, which can be regarded as an effective data augmentation strategy. For unlabeled data, in contrast to existing works that only use the pseudo-labels of binary segmentation, we generate credible pseudo-labels of density maps directly, which reduces the noise in pseudo-labels and therefore decreases aleatoric uncertainty. Extensive comparisons on four crowd-counting datasets demonstrate the superiority of our proposed model over competing methods. The code is available at: https://github.com/ljq2000/MTCP.
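The confidence-gated use of pseudo density maps can be sketched in a few lines (NumPy; names and the thresholding form are illustrative assumptions, and MTCP's branches additionally share one feature extractor): pixels whose predicted confidence falls below a threshold simply drop out of the unsupervised regression loss, so noisy pseudo-label regions stop contributing gradients.

```python
import numpy as np

def credible_pseudo_label_loss(student_density, pseudo_density, confidence, tau=0.5):
    """Squared error against a pseudo density map, averaged only over pixels
    the confidence branch rates above tau; low-confidence (noisy) regions
    contribute nothing, limiting noise propagated from pseudo-labels."""
    mask = confidence > tau
    if not mask.any():
        return 0.0
    return float(((student_density - pseudo_density) ** 2)[mask].mean())
```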
4
Shu W, Wan J, Chan AB. Generalized Characteristic Function Loss for Crowd Analysis in the Frequency Domain. IEEE Trans Pattern Anal Mach Intell 2024; 46:2882-2899. PMID: 37995158. DOI: 10.1109/tpami.2023.3336196.
Abstract
Typical approaches that learn crowd density maps are limited to extracting supervisory information from the loosely organized spatial information in the crowd dot/density maps. This paper tackles this challenge by performing the supervision in the frequency domain. More specifically, we devise a new loss function for crowd analysis called the generalized characteristic function loss (GCFL). This loss carries out two steps: 1) transforming the spatial information in density or dot maps to the frequency domain; 2) calculating a loss value between their frequency contents. For step 1, we establish a series of theoretical foundations by extending the definition of the characteristic function for probability distributions to density maps, and by proving some vital properties of the extended characteristic function. After taking the characteristic function of a density map, its information in the frequency domain is well organized and hierarchically distributed, whereas in the spatial domain it is loosely organized and dispersed everywhere. In step 2, we design a loss function that fits this information organization in the frequency domain, allowing the exploitation of the well-organized frequency information for the supervision of crowd analysis tasks. The loss function can be adapted to various crowd analysis tasks through the specification of its window functions. In this paper, we demonstrate its power in three tasks: crowd counting, crowd localization, and noisy crowd counting. We show the advantages of our GCFL compared to other SOTA losses, and its competitiveness with other SOTA methods, through theoretical analysis and empirical results on benchmark datasets. Our codes are available at https://github.com/wbshu/Crowd_Counting_in_the_Frequency_Domain.
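The two steps can be made concrete for crowd counting. The characteristic function of a dot map with head positions x_j is phi(w) = sum_j exp(i<w, x_j>), and a density map's characteristic function is the corresponding integral, approximated below by a pixel sum. This NumPy sketch (illustrative names and a plain mean discrepancy; the paper's windowed loss is more general) compares the two at sampled frequencies:

```python
import numpy as np

def dot_map_cf(points, freqs):
    """phi(w) = sum_j exp(i <w, x_j>) over annotated head positions.
    points: (N, 2); freqs: (M, 2) sampled frequency vectors."""
    return np.exp(1j * (np.asarray(points, float) @ freqs.T)).sum(axis=0)

def density_map_cf(density, coords, freqs):
    """Discrete approximation of the integral of d(x) exp(i <w, x>) dx,
    with coords holding the spatial coordinate of every pixel."""
    return (density.ravel()[:, None] * np.exp(1j * (coords @ freqs.T))).sum(axis=0)

def gcf_loss(density, points, coords, freqs, window):
    """Window-weighted gap between the frequency contents of the predicted
    density map and the ground-truth dot map."""
    return float((window * np.abs(density_map_cf(density, coords, freqs)
                                  - dot_map_cf(points, freqs))).mean())
```

If the predicted density places exactly unit mass at each annotated position, both characteristic functions coincide and the loss vanishes.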
5
Eldele E, Ragab M, Chen Z, Wu M, Kwoh CK, Li X, Guan C. Self-Supervised Contrastive Representation Learning for Semi-Supervised Time-Series Classification. IEEE Trans Pattern Anal Mach Intell 2023; 45:15604-15618. PMID: 37639415. DOI: 10.1109/tpami.2023.3308189.
Abstract
Learning time-series representations when only unlabeled data or few labeled samples are available can be a challenging task. Recently, contrastive self-supervised learning has shown great improvement in extracting useful representations from unlabeled data by contrasting different augmented views of the data. In this work, we propose a novel Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC) that learns representations from unlabeled data with contrastive learning. Specifically, we propose time-series-specific weak and strong augmentations and use their views to learn robust temporal relations in the proposed temporal contrasting module, while learning discriminative representations through our proposed contextual contrasting module. Additionally, we conduct a systematic study of time-series data augmentation selection, which is a key part of contrastive learning. We also extend TS-TCC to the semi-supervised learning setting and propose a Class-Aware TS-TCC (CA-TCC) that benefits from the few available labeled samples to further improve the representations learned by TS-TCC. Specifically, we leverage the robust pseudo labels produced by TS-TCC to realize a class-aware contrastive loss. Extensive experiments show that linear evaluation of the features learned by our proposed framework performs comparably with fully supervised training. Additionally, our framework shows high efficiency in scenarios with few labeled samples and in transfer learning.
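The contrasting of two augmented views rests on a normalized-temperature cross-entropy (NT-Xent) objective. A minimal NumPy version of that generic loss (not the exact TS-TCC implementation, whose temporal and contextual modules add cross-view prediction on top) is:

```python
import numpy as np

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss between the representations of weakly (z1) and strongly
    (z2) augmented views: each sample's positive is its other view; every
    other representation in the 2n-sized batch acts as a negative."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(z1)
    np.fill_diagonal(sim, -np.inf)              # exclude self-similarity
    pos_idx = (np.arange(2 * n) + n) % (2 * n)  # index of the paired view
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float((-(sim[np.arange(2 * n), pos_idx] - logsumexp)).mean())
```

Aligned view pairs should score a lower loss than mismatched ones, which is what drives the representations apart by instance.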
6
Li Y, Tang Y, Liu Y, Zheng D. Semi-supervised Counting of Grape Berries in the Field Based on Density Mutual Exclusion. Plant Phenomics 2023; 5:0115. PMID: 38033720. PMCID: PMC10684290. DOI: 10.34133/plantphenomics.0115.
Abstract
Automated counting of grape berries has become one of the most important tasks in grape yield prediction. However, the dense distribution of berries and severe occlusion between berries pose great challenges to counting algorithms based on deep learning, and collecting the data required for model training is tedious and expensive. To address these issues and count grape berries cost-effectively, a semi-supervised method for counting grape berries in the field based on density mutual exclusion (CDMENet) is proposed. The algorithm uses VGG16 as the backbone to extract image features. Auxiliary tasks based on density mutual exclusion are introduced; they exploit the spatial distribution pattern of grape berries across density levels to make full use of unlabeled data. In addition, a density difference loss is designed: the feature representation is enhanced by amplifying the difference between features at different density levels. Experimental results on the field grape berry dataset show that CDMENet achieves fewer counting errors. Compared with the state of the art, the coefficient of determination (R2) is improved by 6.10%, and mean absolute error and root mean square error are reduced by 49.36% and 54.08%, respectively. The code is available at https://github.com/youth-tang/CDMENet-main.
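The density difference loss can be sketched as a margin term that pushes pooled features of different density levels apart (NumPy; the function name and the specific hinge form are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def density_difference_loss(feats_low, feats_high, margin=1.0):
    """Hinge on the distance between the mean feature pooled over low-density
    regions and the mean feature pooled over high-density regions; amplifying
    this gap sharpens the representation of density levels."""
    gap = float(np.linalg.norm(feats_low.mean(axis=0) - feats_high.mean(axis=0)))
    return max(0.0, margin - gap)
```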
Affiliation(s)
- Yanan Li
- School of Computer Science and Engineering, School of Artificial Intelligence, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430073, China
- Yuling Tang
- School of Computer Science and Engineering, School of Artificial Intelligence, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430073, China
- Yifei Liu
- School of Computer Science and Engineering, School of Artificial Intelligence, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430073, China
- Dingrun Zheng
- School of Computer Science and Engineering, School of Artificial Intelligence, Wuhan Institute of Technology, Wuhan 430205, China
- Hubei Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan 430073, China
7
Wang Y, Lin J, Cai Q, Pan Y, Yao T, Chao H, Mei T. A Low Rank Promoting Prior for Unsupervised Contrastive Learning. IEEE Trans Pattern Anal Mach Intell 2023; 45:2667-2681. PMID: 35679387. DOI: 10.1109/tpami.2022.3180995.
Abstract
Unsupervised learning is just at a tipping point where it could really take off. Among these approaches, contrastive learning has led to state-of-the-art performance. In this paper, we construct a novel probabilistic graphical model that effectively incorporates a low-rank promoting prior into the framework of contrastive learning, referred to as LORAC. In contrast to existing self-supervised approaches that only consider independent learning, our hypothesis explicitly requires that all the samples belonging to the same instance class lie on the same low-dimensional subspace. This heuristic imposes joint learning constraints that reduce the degrees of freedom of the problem during the search for the optimal network parameterization. Most importantly, we argue that the low-rank prior employed here is not unique, and many different priors can be invoked in a similar probabilistic way, corresponding to different hypotheses about the underlying truth behind the contrastive features. Empirical evidence shows that the proposed algorithm clearly surpasses state-of-the-art approaches on multiple benchmarks, including image classification, object detection, instance segmentation, and keypoint detection. Code is available at https://github.com/ssl-codelab/lorac.
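The low-rank prior admits a very small sketch: stack the features of all augmented views of one instance and penalize the nuclear norm of that matrix, which (for unit-norm rows) is smallest exactly when the views share a one-dimensional subspace. The NumPy snippet below is a bare-penalty illustration; LORAC itself folds the prior into a probabilistic graphical model rather than adding it as a plain regularizer.

```python
import numpy as np

def low_rank_penalty(view_features):
    """Nuclear norm (sum of singular values) of stacked view features.
    Minimizing it pulls an instance's augmented views into a common
    low-dimensional subspace."""
    return float(np.linalg.svd(np.asarray(view_features), compute_uv=False).sum())
```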
8
Peng S, Yin B, Yang Q, He Q, Wang L. Exploring density rectification and domain adaption method for crowd counting. Neural Comput Appl 2023; 35:3551-3569. PMID: 36267471. PMCID: PMC9568950. DOI: 10.1007/s00521-022-07917-8.
Abstract
Crowd counting has received increasing attention due to its important roles in multiple fields, such as social security, commercial applications, and epidemic prevention and control. We explore two critical issues that seriously affect the performance of crowd counting: nonuniform crowd density distribution and cross-domain problems. For the nonuniform density distribution issue, we propose a density rectifying network (DRNet) that consists of several dual-layer pyramid fusion modules (DPFM) and a density rectification map (DRmap) auxiliary learning module. The proposed DPFM is embedded into DRNet to integrate multi-scale crowd density features through dual-layer pyramid fusion. The devised DRmap auxiliary learning module further rectifies incorrect crowd density estimates by adaptively weighting the initial crowd density maps. With respect to the cross-domain issue, we develop a domain adaptation method that randomly cuts and mixes dual-domain images, learning domain-invariant features and decreasing the domain gap between the source and target domains from both global and local perspectives. Experimental results indicate that DRNet achieves the best mean absolute error (MAE) and competitive mean squared error (MSE) compared with other excellent methods on four benchmark datasets. Additionally, a series of cross-domain experiments demonstrate the effectiveness of the proposed domain adaptation method. Notably, when the A and B parts of the ShanghaiTech dataset are the source and target domains respectively, the proposed domain adaptation method decreases the MAE of DRNet by 47.6%.
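The "randomly cutting mixed dual-domain images" idea can be sketched in a few lines (NumPy; the rectangle sizing and function name are my assumptions): paste a random patch of a target-domain image into a source-domain image at the same location, so each training sample exposes the network to both domains at once, locally and globally.

```python
import numpy as np

def cut_mix_domains(src_img, tgt_img, rng):
    """Cut a random rectangle from the target-domain image and paste it at
    the same position in the source-domain image of equal size."""
    h, w = src_img.shape[:2]
    ch, cw = int(rng.integers(1, h // 2 + 1)), int(rng.integers(1, w // 2 + 1))
    y, x = int(rng.integers(0, h - ch + 1)), int(rng.integers(0, w - cw + 1))
    mixed = src_img.copy()
    mixed[y:y + ch, x:x + cw] = tgt_img[y:y + ch, x:x + cw]
    return mixed
```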
Affiliation(s)
- Sifan Peng
- Department of Automation, University of Science and Technology of China, Huangshan Road, Hefei 230027, Anhui, China
- Baoqun Yin
- Department of Automation, University of Science and Technology of China, Huangshan Road, Hefei 230027, Anhui, China
- Qianqian Yang
- Department of Automation, University of Science and Technology of China, Huangshan Road, Hefei 230027, Anhui, China
- Qing He
- Department of Automation, University of Science and Technology of China, Huangshan Road, Hefei 230027, Anhui, China
- Luyang Wang
- Department of Automation, University of Science and Technology of China, Huangshan Road, Hefei 230027, Anhui, China
9
Liu W, Salzmann M, Fua P. Counting People by Estimating People Flows. IEEE Trans Pattern Anal Mach Intell 2022; 44:8151-8166. PMID: 34351854. DOI: 10.1109/tpami.2021.3102690.
Abstract
Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As such, only very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with much fewer annotations. This significantly reduces the annotation cost while still leading to similar performance to the full supervision case.
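The people-conservation constraint has a simple discrete form: the number of people in a grid cell at time t must equal both the flows that entered it from its neighborhood since t-1 and the flows that will leave it toward t+1. A NumPy sketch (the direction-channel layout and names are illustrative, not the paper's exact parameterization):

```python
import numpy as np

def density_from_flows(flows):
    """Density implied by per-direction people flows for each cell.
    flows: (K, H, W), one channel per neighbor direction (incl. staying put)."""
    return flows.sum(axis=0)

def conservation_residual(flows_in, flows_out):
    """Mean gap between the densities implied by incoming and outgoing flows;
    penalizing this residual enforces that people are neither created nor
    destroyed between consecutive frames."""
    return float(np.abs(density_from_flows(flows_in)
                        - density_from_flows(flows_out)).mean())
```

Because the residual is defined purely on predictions, it can supervise unannotated frames, which is what enables the annotation savings reported above.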
10
WSNet: A local–global consistent traffic density estimation method based on weakly supervised learning. Knowl Based Syst 2022. DOI: 10.1016/j.knosys.2022.109727.
11
Learning to rank method combining multi-head self-attention with conditional generative adversarial nets. Array 2022. DOI: 10.1016/j.array.2022.100205.
12
Wang R, Cheung CF, Wang C, Cheng MN. Deep learning characterization of surface defects in the selective laser melting process. Comput Ind 2022. DOI: 10.1016/j.compind.2022.103662.
13
Luo T, Liu Y. Machine truth serum: a surprisingly popular approach to improving ensemble methods. Mach Learn 2022. DOI: 10.1007/s10994-022-06183-y.
14
Madhusudana PC, Birkbeck N, Wang Y, Adsumilli B, Bovik AC. Image Quality Assessment Using Contrastive Learning. IEEE Trans Image Process 2022; 31:4149-4161. PMID: 35700254. DOI: 10.1109/tip.2022.3181496.
Abstract
We consider the problem of obtaining image quality representations in a self-supervised manner. We use prediction of distortion type and degree as an auxiliary task to learn features from an unlabeled image dataset containing a mixture of synthetic and realistic distortions. We then train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem. We refer to the proposed training framework and resulting deep IQA model as the CONTRastive Image QUality Evaluator (CONTRIQUE). During evaluation, the CNN weights are frozen and a linear regressor maps the learned representations to quality scores in a No-Reference (NR) setting. We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models, even without any additional fine-tuning of the CNN backbone. The learned representations are highly robust and generalize well across images afflicted by either synthetic or authentic distortions. Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets. The implementations used in this paper are available at https://github.com/pavancm/CONTRIQUE.
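Because the CNN stays frozen at evaluation time, the NR quality predictor reduces to a linear map from representations to scores. A ridge-regression sketch of that final stage (NumPy; the paper's exact regressor and regularization may differ):

```python
import numpy as np

def fit_quality_head(features, mos, ridge=1e-3):
    """Regularized least squares from frozen representations to subjective
    quality scores; returns weights including a trailing bias term."""
    f = np.hstack([features, np.ones((len(features), 1))])
    return np.linalg.solve(f.T @ f + ridge * np.eye(f.shape[1]), f.T @ mos)

def predict_quality(features, w):
    """Apply the learned linear head to new frozen features."""
    return np.hstack([features, np.ones((len(features), 1))]) @ w
```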
15
Chen H, Zhou Y, Li J, Wei XS, Xiao L. Self-Supervised Multi-Category Counting Networks for Automatic Check-Out. IEEE Trans Image Process 2022; 31:3004-3016. PMID: 35380962. DOI: 10.1109/tip.2022.3163527.
Abstract
The practical task of Automatic Check-Out (ACO) is to accurately predict the presence and count of each product in an arbitrary product combination. Beyond the large-scale and fine-grained nature of product categories as its main challenges, products are continuously updated in realistic check-out scenarios, which an ACO system must also handle. Previous work in this research line largely depends on the supervision of labor-intensive bounding boxes of products through a detection paradigm. In this paper, by contrast, we propose a Self-Supervised Multi-Category Counting (S2MC2) network that leverages point-level supervision of products in check-out images, both lowering the labeling cost and returning ACO predictions in a class-incremental setting. Specifically, as its backbone, S2MC2 is built upon a counting module in a class-agnostic counting fashion. It also consists of several crucial components, including an attention module for capturing fine-grained patterns and a domain adaptation module for reducing the gap between the single-product images used for training and the check-out images used for testing. Furthermore, a self-supervised approach is used to initialize the parameters of the backbone for better performance. Through comprehensive experiments on the large-scale automatic check-out dataset RPC, we demonstrate that our proposed S2MC2 achieves superior accuracy in both the traditional and incremental settings of ACO tasks over competing baselines.
16
RoiSeg: An Effective Moving Object Segmentation Approach Based on Region-of-Interest with Unsupervised Learning. Appl Sci (Basel) 2022. DOI: 10.3390/app12052674.
Abstract
Traditional video object segmentation often has low detection speed and inaccurate results due to the jitter caused by pan-and-tilt or hand-held devices. Deep neural networks (DNNs) have been widely adopted to address these problems; however, they rely on large amounts of annotated data and high-performance computing units, so DNNs are not suitable for some special scenarios (e.g., no prior knowledge or limited computing ability). In this paper, we propose RoiSeg, an effective moving object segmentation approach based on a Region of Interest (ROI), which uses an unsupervised learning method to achieve automatic segmentation of moving objects. Specifically, we first hypothesize that the central n × n pixels of each image act as the ROI representing the features of the segmented moving object. Second, we pool the ROI to a central point of the foreground to simplify the segmentation problem into an ROI-based classification problem. Third, we implement a trajectory-based classifier and an online updating mechanism to address the classification problem and compensate for class imbalance, respectively. We conduct extensive experiments to evaluate the performance of RoiSeg, and the results demonstrate that RoiSeg is more accurate and faster than other segmentation algorithms. Moreover, RoiSeg not only effectively handles ambient lighting changes, fog, and salt-and-pepper noise, but also copes well with camera jitter and windy scenes.
17
Fan Z, Zhang H, Zhang Z, Lu G, Zhang Y, Wang Y. A survey of crowd counting and density estimation based on convolutional neural network. Neurocomputing 2022. DOI: 10.1016/j.neucom.2021.02.103.
18
Song R, Zhu C, Zhang L, Zhang T, Luo Y, Liu J, Yang J. Dual-branch network via pseudo-label training for thyroid nodule detection in ultrasound image. Appl Intell 2022. DOI: 10.1007/s10489-021-02967-2.
20
Diers J, Pigorsch C. Self-supervised learning for outlier detection. Stat (Int Stat Inst) 2021. DOI: 10.1002/sta4.322.
Affiliation(s)
- Jan Diers
- Economic and Social Statistics, Friedrich-Schiller-University Jena, Fürstengraben 1, 07743 Jena, Germany
- Christian Pigorsch
- Economic and Social Statistics, Friedrich-Schiller-University Jena, Fürstengraben 1, 07743 Jena, Germany
21
Applying Self-Supervised Learning to Medicine: Review of the State of the Art and Medical Implementations. Informatics 2021. DOI: 10.3390/informatics8030059.
Abstract
Machine learning has become an increasingly ubiquitous technology, as big data continues to inform and influence everyday life and decision-making. Currently, in medicine and healthcare, as in most other industries, the two most prevalent machine learning paradigms are supervised learning and transfer learning. Both practices rely on large-scale, manually annotated datasets to train increasingly complex models. However, the requirement that data be manually labeled leaves an excess of unused, unlabeled data in both public and private data repositories. Self-supervised learning (SSL) is a growing area of machine learning that can take advantage of unlabeled data. Contrary to other machine learning paradigms, SSL algorithms create artificial supervisory signals from unlabeled data and pretrain on these signals. The aim of this review is twofold: first, we provide a formal definition of SSL, divide SSL algorithms into four unique subsets, and review the state of the art published in each of those subsets between 2014 and 2020. Second, this work surveys recent SSL algorithms published in healthcare, in order to give medical experts a clearer picture of how they can integrate SSL into their research, with the objective of leveraging unlabeled data.
22
Wen Z, Liu Z, Zhang S, Pan Q. Rotation Awareness Based Self-Supervised Learning for SAR Target Recognition With Limited Training Samples. IEEE Trans Image Process 2021; 30:7266-7279. PMID: 34403341. DOI: 10.1109/tip.2021.3104179.
Abstract
The scattering signature of a synthetic aperture radar (SAR) target image is highly sensitive to the azimuth angle/pose, which aggravates the demand for training samples in learning-based SAR automatic target recognition (ATR) algorithms and makes SAR ATR a more challenging task. This paper develops a novel rotation-awareness-based learning framework, termed RotANet, for SAR ATR under the condition of limited training samples. First, we propose an encoding scheme to characterize the rotational pattern of pose variations among intra-class targets. These targets constitute several ordered sequences with different rotational patterns via permutations. By further exploiting the intrinsic relation constraints among these sequences as supervision, we develop a novel self-supervised task in which RotANet learns to predict the rotational pattern of a baseline sequence and then autonomously generalizes this ability to the others without external supervision. This task therefore contains a learning and self-validation process that achieves human-like rotation awareness, and it serves as a task-induced prior that regularizes the learned feature domain of RotANet in conjunction with an individual target recognition task, improving the generalization ability of the features. Extensive experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) benchmark database demonstrate the effectiveness of our proposed framework. Compared with other state-of-the-art SAR ATR algorithms, RotANet remarkably improves recognition accuracy, especially with very limited training samples and without any other data augmentation strategy.
23
Sam DB, Peri SV, Sundararaman MN, Kamath A, Babu RV. Locate, Size, and Count: Accurately Resolving People in Dense Crowds via Detection. IEEE Trans Pattern Anal Mach Intell 2021; 43:2739-2751. PMID: 32086197. DOI: 10.1109/tpami.2020.2974830.
Abstract
We introduce a detection framework for dense crowd counting and eliminate the need for the prevalent density regression paradigm. Typical counting models predict crowd density for an image as opposed to detecting every person. These regression methods generally fail to localize persons accurately enough for most applications other than counting. Hence, we adopt an architecture that locates every person in the crowd, sizes the detected heads with bounding boxes, and then counts them. Compared to normal object or face detectors, designing such a detection system poses certain unique challenges, some of which are direct consequences of the huge diversity in dense crowds along with the need to predict boxes contiguously. We solve these issues and develop our LSC-CNN model, which can reliably detect heads of people across sparse to dense crowds. LSC-CNN employs a multi-column architecture with top-down feature modulation to better resolve persons and produce refined predictions at multiple resolutions. Interestingly, the proposed training regime requires only point head annotations, yet can estimate approximate size information for heads. We show that LSC-CNN not only has superior localization over existing density regressors, but also outperforms them in counting. The code for our approach is available at https://github.com/val-iisc/lsc-cnn.
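The key contrast with density regression is that the count falls out of the detections themselves, so every counted person is also localized. A minimal sketch of this counting-by-detection step (the `(box, score)` representation and threshold are illustrative assumptions, not LSC-CNN's actual output format):

```python
def count_from_detections(detections, score_thresh=0.5):
    """Count people by thresholding per-head detection scores.

    `detections` is a list of (box, score) pairs. Unlike integrating
    a regressed density map, the count is simply the number of
    confident boxes, so each counted person has a location and size.
    """
    kept = [(box, score) for box, score in detections
            if score >= score_thresh]
    return len(kept), kept
```

A density regressor, by contrast, would return only a scalar count with no per-person boxes to threshold or inspect.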
|
24
|
Wang Q, Gao J, Lin W, Li X. NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2021; 43:2141-2149. [PMID: 32750840 DOI: 10.1109/tpami.2020.3013269] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
In the last decade, crowd counting and localization have attracted much attention from researchers due to their widespread applications, including crowd monitoring, public safety, and space design. Many convolutional neural networks (CNNs) have been designed to tackle this task. However, currently released datasets are so small-scale that they cannot meet the needs of supervised CNN-based algorithms. To remedy this problem, we construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images with a total of 2,133,375 heads annotated with both points and boxes. Compared with other real-world datasets, it contains varied illumination scenes and has the largest density range (0 ∼ 20,033). Besides, a benchmark website is developed for impartially evaluating different methods, allowing researchers to submit results on the test set. Based on the proposed dataset, we further describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data. The benchmark is deployed at https://www.crowdbenchmark.com/, and the dataset/code/models/results are available at https://gjy3035.github.io/NWPU-Crowd-Sample-Code/.
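Counting benchmarks of this kind conventionally rank methods by mean absolute error (MAE) and root mean squared error (RMSE) over per-image counts; a minimal sketch of these standard metrics (the benchmark site's exact evaluation protocol may differ):

```python
import math

def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    assert len(pred_counts) == len(gt_counts)
    n = len(gt_counts)
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / n

def rmse(pred_counts, gt_counts):
    """Root mean squared error; penalizes large per-image mistakes more."""
    assert len(pred_counts) == len(gt_counts)
    n = len(gt_counts)
    return math.sqrt(sum((p - g) ** 2
                         for p, g in zip(pred_counts, gt_counts)) / n)
```

MAE reflects overall accuracy, while RMSE's squaring makes it sensitive to the occasional badly mis-counted image, which is why both are usually reported together.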
|
25
|
Few-labeled visual recognition for self-driving using multi-view visual-semantic representation. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.02.128] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
26
|
|
27
|
Celona L, Schettini R. A Genetic Algorithm to Combine Deep Features for the Aesthetic Assessment of Images Containing Faces. SENSORS 2021; 21:s21041307. [PMID: 33673052 PMCID: PMC7918760 DOI: 10.3390/s21041307] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 02/08/2021] [Accepted: 02/09/2021] [Indexed: 11/30/2022]
Abstract
The automatic assessment of the aesthetic quality of a photo is a challenging and extensively studied problem. Most existing works focus on aesthetic quality assessment regardless of the depicted subject and mainly use features extracted from the entire image. It has been observed that the performance of generic-content aesthetic assessment methods decreases significantly on images depicting faces. This paper introduces a method for evaluating the aesthetic quality of images with faces by encoding both the properties of the entire image and specific aspects of the face. Three different convolutional neural networks are exploited to encode information regarding perceptual quality, global image aesthetics, and facial attributes; a model is then trained to combine these features to explicitly predict the aesthetics of images containing faces. Experimental results show that our approach outperforms existing state-of-the-art methods for both binary (i.e., low/high) and continuous aesthetic score prediction on four different image databases.
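The genetic algorithm's operators and fitness function are not specified in the abstract; a minimal sketch of evolving weights that combine per-network feature scores (the one-point crossover, Gaussian mutation, and squared-error fitness are illustrative assumptions, not the paper's configuration):

```python
import random

def fitness(weights, features, target):
    """Hypothetical fitness: negative squared error of the weighted
    combination of per-network scores against a target aesthetic score."""
    pred = sum(w * f for w, f in zip(weights, features))
    return -(pred - target) ** 2

def evolve(features, target, pop_size=20, gens=50, seed=0):
    """Evolve a weight vector with selection, one-point crossover,
    and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in features] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda w: fitness(w, features, target), reverse=True)
        parents = pop[: pop_size // 2]          # keep the fitter half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(len(features))
            child = a[:cut] + b[cut:]           # one-point crossover
            i = rng.randrange(len(features))
            child[i] += rng.gauss(0, 0.1)       # Gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, features, target))
```

In the paper's setting the fitness would instead measure agreement with human aesthetic labels over a training set rather than a single target value.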
|
28
|
Guo Q, Zeng X, Hu S, Phoummixay S, Ye Y. Learning a deep network with cross-hierarchy aggregation for crowd counting. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106691] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
29
|
Zhang K, Wang H, Liu W, Li M, Lu J, Liu Z. An efficient semi-supervised manifold embedding for crowd counting. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106634] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
30
|
Wang Y, Zhang W, Liu Y, Zhu J. Two-branch fusion network with attention map for crowd counting. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.06.034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
|
31
|
Anvarjon T, Mustaqeem, Kwon S. Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features. SENSORS (BASEL, SWITZERLAND) 2020; 20:E5212. [PMID: 32932723 PMCID: PMC7570673 DOI: 10.3390/s20185212] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Revised: 09/09/2020] [Accepted: 09/10/2020] [Indexed: 01/09/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) are employed to make systems smarter. Today, speech emotion recognition (SER) systems evaluate the emotional state of a speaker by analyzing his/her speech signal. Emotion recognition is a challenging task for a machine, and making the machine recognize emotions efficiently is equally challenging. The speech signal is hard to examine with signal processing methods because it consists of different frequencies and features that vary with emotions such as anger, fear, sadness, happiness, boredom, disgust, and surprise. Even though different algorithms have been developed for SER, reported success rates remain low and vary with the language, the emotions, and the database. In this paper, we propose a new lightweight, effective SER model with low computational complexity and high recognition accuracy. The suggested method uses a convolutional neural network (CNN) to learn deep frequency features by means of a plain rectangular filter with a modified pooling strategy that has more discriminative power for SER. The proposed CNN model was trained on frequency features extracted from the speech data and then tested to predict the emotions. The proposed SER model was evaluated on two benchmarks, the interactive emotional dyadic motion capture (IEMOCAP) and the Berlin emotional speech database (EMO-DB) speech datasets, and obtained recognition accuracies of 77.01% and 92.02%, respectively. The experimental results demonstrate that the proposed CNN-based SER system achieves better recognition performance than state-of-the-art SER systems.
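A "plain rectangular filter" suggests a convolution kernel elongated along one spectrogram axis, so it aggregates energy across frequency (or time) bands. A minimal sketch of valid 2-D correlation with such a kernel (the 3x1 kernel shape below is an illustrative assumption, not the paper's configuration):

```python
def conv2d_valid(x, k):
    """Valid 2-D correlation of a spectrogram x (rows = frequency bins,
    cols = time frames) with kernel k; returns the feature map."""
    H, W = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            # elementwise multiply-accumulate over the kernel window
            row.append(sum(k[a][b] * x[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out
```

A tall, narrow kernel like 3x1 spans several frequency bins at a single time frame, which is one way to emphasize frequency structure over temporal structure.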
Affiliation(s)
- Tursunov Anvarjon
- Interaction Technology Laboratory, Department of Software, Sejong University, Seoul 05006, Korea
| | - Mustaqeem
- Interaction Technology Laboratory, Department of Software, Sejong University, Seoul 05006, Korea
| | - Soonil Kwon
- Interaction Technology Laboratory, Department of Software, Sejong University, Seoul 05006, Korea
| |
|
32
|
Sun Y, Jin J, Wu X, Ma T, Yang J. Counting Crowds with Perspective Distortion Correction via Adaptive Learning. SENSORS 2020; 20:s20133781. [PMID: 32640552 PMCID: PMC7374275 DOI: 10.3390/s20133781] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/08/2020] [Revised: 06/18/2020] [Accepted: 07/03/2020] [Indexed: 12/01/2022]
Abstract
The goal of crowd counting is to estimate the number of people in an image. Regression-based counting has become the mainstream approach, and with the development of convolutional neural networks (CNNs), CNN-based methods in particular have become a research hotspot. Locating each person in the image is a more interesting problem than simply predicting the total count. Perspective distortion remains a challenge, because it causes the apparent size of people in a crowd to vary across the image. To address perspective distortion and localize people more accurately, we design a novel framework named Adaptive Learning Network (CAL). We use VGG as the backbone: after each pooling layer, we collect features at 1/2, 1/4, 1/8, and 1/16 of the original image resolution and combine them with weights learned by an adaptive learning branch. The adaptive learning branch operates on each image in the dataset individually. By combining output features of different scales per image, the drastic changes in crowd size caused by perspective transformation are mitigated. We conducted experiments on four crowd counting datasets (i.e., ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50, and UCF-QNRF), and the results show that our model performs well.
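The adaptive branch's architecture is not detailed in the abstract, but the weighted fusion of multi-scale feature maps it feeds can be sketched as follows (the feature maps are assumed already upsampled to a common resolution, and the simple weight normalization stands in for the learned per-image branch):

```python
def fuse_multiscale(features, weights):
    """Fuse feature maps from different pooling stages with per-image
    adaptive weights.

    `features` is a list of equally sized 2-D maps (e.g. the 1/2, 1/4,
    1/8, 1/16 stages after upsampling); `weights` is one scalar per map.
    """
    assert len(features) == len(weights) and len(features) > 0
    total = sum(weights)
    norm = [w / total for w in weights]  # normalize to a convex combination
    H, W = len(features[0]), len(features[0][0])
    return [[sum(norm[k] * features[k][i][j] for k in range(len(features)))
             for j in range(W)] for i in range(H)]
```

Because the weights are predicted per image, an image dominated by small, distant heads can lean on the fine-resolution stages while a near-field image leans on the coarse ones.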
|
33
|
|
34
|
Multi-level feature fusion based Locality-Constrained Spatial Transformer network for video crowd counting. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.01.087] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|