1
Phan MH, Phung SL, Luu K, Bouzerdoum A. Efficient Hyperspectral Image Segmentation for Biosecurity Scanning Using Knowledge Distillation from Multi-head Teacher. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
2
Peng X, Bouzerdoum A, Phung SL. A Trajectory-Based Method for Dynamic Scene Recognition. INT J PATTERN RECOGN 2021. [DOI: 10.1142/s0218001421500294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a trajectory-based dynamic scene recognition method is proposed. A trajectory is formed by a pixel moving across consecutive frames of a video segment. The local regions surrounding the trajectory provide useful appearance and motion information about a portion of the video segment. The proposed method works in several stages. First, dense and evenly distributed trajectories are extracted from a video segment. Then, fully-connected-layer features are extracted from each trajectory using a pre-trained convolutional neural network (CNN) model, forming a feature sequence. Next, these feature sequences are fed into a long short-term memory (LSTM) network to learn their temporal behavior. Finally, by aggregating the information of the trajectories, a global representation of the video segment is obtained for classification. The LSTM is trained using synthetic trajectory feature sequences instead of real ones. The synthetic feature sequences are generated with a series of generative adversarial networks (GANs). In addition to classification, category-specific discriminative trajectories are located in a video segment, which helps reveal which portions of a video segment are more important than others. This is achieved by formulating an optimization problem to learn discriminative part detectors for all categories simultaneously. Experimental results on two benchmark dynamic scene datasets show that the proposed method is very competitive with six other methods.
Affiliation(s)
- Xiaoming Peng
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, New South Wales 2500, Australia
  - School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, P. R. China
- Abdesselam Bouzerdoum
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, New South Wales 2500, Australia
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Son Lam Phung
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, New South Wales 2500, Australia
3
Peng X, Bouzerdoum A, Phung SL. A part-based spatial and temporal aggregation method for dynamic scene recognition. Neural Comput Appl 2021. [DOI: 10.1007/s00521-020-05415-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
4
Nguyen TNA, Phung SL, Bouzerdoum A. Hybrid Deep Learning-Gaussian Process Network for Pedestrian Lane Detection in Unstructured Scenes. IEEE Trans Neural Netw Learn Syst 2020; 31:5324-5338. [PMID: 32071001 DOI: 10.1109/tnnls.2020.2966246] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 05/27/2023]
Abstract
Pedestrian lane detection is an important task in many assistive and autonomous navigation systems. This article presents a new approach for pedestrian lane detection in unstructured environments, where the pedestrian lanes can have arbitrary surfaces with no painted markers. In this approach, a hybrid deep learning-Gaussian process (DL-GP) network is proposed to segment a scene image into lane and background regions. The network combines a compact convolutional encoder-decoder net and a powerful nonparametric hierarchical GP classifier. The resulting network, with a smaller number of trainable parameters, helps mitigate the overfitting problem while maintaining the modeling power. In addition to the segmentation output for each test image, the network also generates a map of uncertainty, a measure that is negatively correlated with the confidence level with which we can trust the segmentation. This measure is important for pedestrian lane-detection applications, since the prediction affects the safety of the users. We also introduce a new data set of 5000 images for training and evaluating pedestrian lane-detection algorithms. This data set is expected to facilitate research in pedestrian lane detection, especially the application of DL in this area. Evaluated on this data set, the proposed network shows significant performance improvements compared with several existing methods.
5
Le AT, Tran LC, Huang X, Ritz C, Dutkiewicz E, Phung SL, Bouzerdoum A, Franklin D. Unbalanced Hybrid AOA/RSSI Localization for Simplified Wireless Sensor Networks. Sensors (Basel) 2020; 20:s20143838. [PMID: 32660069 PMCID: PMC7411761 DOI: 10.3390/s20143838] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/28/2020] [Revised: 07/01/2020] [Accepted: 07/06/2020] [Indexed: 11/22/2022]
Abstract
Source positioning using hybrid angle-of-arrival (AOA) estimation and received signal strength indicator (RSSI) is attractive because no synchronization is required among unknown nodes and anchors. Conventionally, hybrid AOA/RSSI localization combines the same number of these measurements to estimate the agents’ locations. However, since AOA estimation requires anchors to be equipped with large antenna arrays and complicated signal processing, this conventional combination makes the wireless sensor network (WSN) complicated. This paper proposes an unbalanced integration of the two measurements, called 1AOA/nRSSI, to simplify the WSN. Instead of using many anchors with large antenna arrays, the proposed method only requires one master anchor to provide one AOA estimate, while the other anchors are simple single-antenna transceivers. By simply transforming the 1AOA/1RSSI information into two corresponding virtual anchors, the problem of integrating one AOA and n RSSI measurements is solved using the least-squares and subspace methods. The solutions are then evaluated to characterize the impact of angular and distance measurement errors. Simulation results show that the proposed network achieves the same level of precision as a fully hybrid nAOA/nRSSI network with a slightly higher number of simple anchors.
Affiliation(s)
- Anh Tuyen Le
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong 2522, Australia
- Le Chung Tran
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong 2522, Australia
  - Correspondence:
- Xiaojing Huang
  - School of Electrical and Data Engineering, University of Technology Sydney, Ultimo 2007, Australia
- Christian Ritz
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong 2522, Australia
- Eryk Dutkiewicz
  - School of Electrical and Data Engineering, University of Technology Sydney, Ultimo 2007, Australia
- Son Lam Phung
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong 2522, Australia
- Abdesselam Bouzerdoum
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong 2522, Australia
  - Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Daniel Franklin
  - School of Electrical and Data Engineering, University of Technology Sydney, Ultimo 2007, Australia
6
Duong STM, Phung SL, Bouzerdoum A, Schira MM. An unsupervised deep learning technique for susceptibility artifact correction in reversed phase-encoding EPI images. Magn Reson Imaging 2020; 71:1-10. [PMID: 32407764 DOI: 10.1016/j.mri.2020.04.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/31/2020] [Revised: 03/17/2020] [Accepted: 04/11/2020] [Indexed: 10/24/2022]
Abstract
Echo planar imaging (EPI) is a fast and non-invasive magnetic resonance imaging technique that supports data acquisition at high spatial and temporal resolutions. However, susceptibility artifacts, which cause misalignment with the underlying structural image, are unavoidable distortions in EPI. Traditional susceptibility artifact correction (SAC) methods estimate the displacement field by optimizing an objective function that involves one or more pairs of reversed phase-encoding (PE) images. The estimated displacement field is then used to unwarp the distorted images and produce the corrected images. Since this conventional approach is time-consuming, we propose an end-to-end deep learning technique, named S-Net, to correct the susceptibility artifacts in the reversed-PE image pair. The proposed S-Net consists of two components: (i) a convolutional neural network to map a reversed-PE image pair to the displacement field; and (ii) a spatial transform unit to unwarp the input images and produce the corrected images. The S-Net is trained using a set of reversed-PE image pairs and an unsupervised loss function, without ground-truth data. For a new pair of reversed-PE images, the displacement field and corrected images are obtained simultaneously by evaluating the trained S-Net directly. Evaluations on three different datasets demonstrate that S-Net can correct the susceptibility artifacts in the reversed-PE images. Compared with two state-of-the-art SAC methods (TOPUP and TISAC), the proposed S-Net runs significantly faster: 20 times faster than TISAC and 369 times faster than TOPUP, while achieving a similar correction accuracy. Consequently, S-Net accelerates medical image processing pipelines and makes real-time correction for MRI scanners feasible. Our proposed technique also opens up a new direction in learning-based SAC.
Affiliation(s)
- Soan T M Duong
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia
- Son L Phung
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia
- Abdesselam Bouzerdoum
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia; ICT Division, College of Science and Engineering, Hamad Bin Khalifa University, Qatar
- Mark M Schira
  - School of Psychology, University of Wollongong, Australia
7
Duong STM, Phung SL, Bouzerdoum A, Boyd Taylor HG, Puckett AM, Schira MM. Susceptibility artifact correction for sub-millimeter fMRI using inverse phase encoding registration and T1 weighted regularization. J Neurosci Methods 2020; 336:108625. [PMID: 32061690 DOI: 10.1016/j.jneumeth.2020.108625] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/13/2019] [Revised: 01/13/2020] [Accepted: 02/03/2020] [Indexed: 11/27/2022]
Abstract
BACKGROUND: Functional magnetic resonance imaging (fMRI) enables non-invasive examination of both the structure and the function of the human brain. The prevalence of high spatial-resolution (sub-millimeter) fMRI has triggered new research on the intra-cortex, such as cortical columns and cortical layers. At present, echo-planar imaging (EPI) is used exclusively to acquire fMRI data; however, susceptibility artifacts are unavoidable. These distortions are especially severe in high spatial-resolution images and can lead to misrepresentation of brain function in fMRI experiments.
NEW METHOD: This paper presents a new method for correcting susceptibility artifacts by combining a T1-weighted (T1w) image and inverse phase-encoding (PE) based registration. The latter uses two EPI images acquired using identical sequences but with inverse-PE directions. In the proposed method, the T1w image is used to regularize the registration, and to select the regularization parameters automatically. The motivation is that the T1w image is considered to reflect the anatomical structure of the brain.
RESULTS: Our proposed method is evaluated on two sub-millimeter EPI-fMRI datasets, acquired using 3T and 7T scanners. Experiments show that the proposed method provides improved corrections that are well-aligned to the T1w image.
COMPARISON WITH EXISTING METHODS: The proposed method provides more robust and sharper corrections and runs faster compared with two other state-of-the-art inverse-PE based correction methods, i.e. HySCO and TOPUP.
CONCLUSIONS: The proposed correction method used the T1w image as a reference in the inverse-PE registration. Results show its promising performance. Our proposed method is timely, as sub-millimeter fMRI has become increasingly popular.
Affiliation(s)
- S T M Duong
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia
- S L Phung
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia
- A Bouzerdoum
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia; College of Science and Engineering, Hamad Bin Khalifa University, Qatar
- A M Puckett
  - School of Psychology, University of Queensland, Australia; Queensland Brain Institute, University of Queensland, Australia
- M M Schira
  - School of Psychology, University of Wollongong, Australia
8
Tang VH, Bouzerdoum A, Phung SL. Compressive Radar Imaging of Stationary Indoor Targets with Low-rank plus Jointly Sparse and Total Variation Regularizations. IEEE Trans Image Process 2020; 29:4598-4613. [PMID: 32092003 DOI: 10.1109/tip.2020.2973819] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
This paper addresses the problem of wall clutter mitigation and image reconstruction for through-wall radar imaging (TWRI) of stationary targets by seeking a model that incorporates low-rank (LR), joint sparsity (JS), and total variation (TV) regularizers. The motivation of the proposed model is that LR regularizer captures the low-dimensional structure of wall clutter; JS guarantees a small fraction of target occupancy and the similarity of sparsity profile among channel images; TV regularizer promotes the spatial continuity of target regions and mitigates background noise. The task of wall clutter mitigation and target image reconstruction is formulated as an optimization problem comprising LR, JS, and TV regularization terms. To handle this problem efficiently, an iterative algorithm based on the forward-backward proximal gradient splitting technique is introduced, which captures wall clutter and yields target images simultaneously. Extensive experiments are conducted on real radar data under compressive sensing scenarios. The results show that the proposed model enhances target localization and clutter mitigation even when radar measurements are significantly reduced.
9
Alam T, Islam MT, Househ M, Bouzerdoum A, Kawsar FA. DeepDSSR: Deep Learning Structure for Human Donor Splice Sites Recognition. Stud Health Technol Inform 2019; 262:236-239. [PMID: 31349311 DOI: 10.3233/shti190062] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/10/2023]
Abstract
Through alternative splicing of pre-messenger RNAs, human genes often produce multiple mRNAs and protein isoforms that may have similar or completely different functions. Identification of splice sites is, therefore, crucial to understanding the gene structure and the variants of mRNA and protein isoforms produced by the primary RNA transcripts. Although many computational methods have been developed to detect splice sites in humans, this remains a challenging problem, and further improvement of the computational models is still possible. Accordingly, we developed DeepDSSR (deep donor splice site recognizer), a novel deep learning based architecture, for predicting human donor splice sites. The proposed method, built upon a publicly available and highly imbalanced benchmark dataset, is comparable with the leading deep learning based methods for detecting human donor splice sites. Performance evaluation metrics show that DeepDSSR outperformed the existing deep learning based methods. Future work will improve the predictive capabilities of our model, and we will build a model for the prediction of acceptor splice sites.
Affiliation(s)
- Tanvir Alam
  - Information and Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Doha, Qatar
- Mowafa Househ
  - Information and Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Doha, Qatar
- Abdesselam Bouzerdoum
  - Information and Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University (HBKU), Doha, Qatar
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW, Australia
10
Tang VH, Bouzerdoum A, Phung SL. Multipolarization Through-Wall Radar Imaging Using Low-Rank and Jointly-Sparse Representations. IEEE Trans Image Process 2018; 27:1763-1776. [PMID: 29346093 DOI: 10.1109/tip.2017.2786462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Compressed sensing techniques have been applied to through-the-wall radar imaging (TWRI) and multipolarization TWRI for fast data acquisition and enhanced target localization. The studies so far in this area have either assumed effective wall clutter removal prior to image formation or performed signal estimation, wall clutter mitigation, and image formation independently. This paper proposes a low-rank and sparse imaging model for jointly addressing the problem of wall clutter mitigation and image formation in multichannel TWRI. The proposed model exploits two important structures of through-wall radar signals: low-rank structure of the wall reflections and jointly-sparse structure among the different polarization images. The task of removing wall clutter and reconstructing multichannel images of the same scene behind-the-wall is formulated as a regularized least squares problem, where low-rank regularization is enforced for the wall components, and joint-sparsity penalty is imposed on channel images. To solve the optimization problem, an iterative algorithm based on the proximal gradient technique is introduced, which simultaneously estimates the wall interferences and yields multichannel images of the indoor targets. Experiments on real and simulated radar data are conducted under full measurements and compressive sensing scenarios. The results show that the proposed model is very effective at removing unwanted wall clutter and enhancing the stationary targets, even under considerable reduction in measurements.
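As an illustration of the joint-sparsity step mentioned in this abstract, the sketch below shows one building block that a proximal gradient solver of this kind could use. It assumes the joint-sparsity penalty is the l2,1 norm (a common choice that the abstract does not state), in which case the proximal step is row-wise group soft-thresholding. The function name `prox_joint_sparsity`, the row-per-pixel layout, and the threshold `tau` are hypothetical; this is not the authors' implementation, and the low-rank step is omitted.

```python
import math

def prox_joint_sparsity(rows, tau):
    """Row-wise group soft-thresholding: proximal operator of tau * (l2,1 norm).

    `rows` is a list of rows; each row holds one pixel's values across the
    channel images (an assumed layout, not taken from the paper).
    """
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        # Rows whose norm is at most tau are zeroed in every channel at once;
        # all other rows are shrunk toward zero by a common factor.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        result.append([scale * v for v in row])
    return result

# Toy input: two "pixels" observed in two polarization channels.
shrunk = prox_joint_sparsity([[3.0, 4.0], [0.3, 0.4]], tau=1.0)
print(shrunk)
```

Because a whole row is either kept (and shrunk) or set to zero, the nonzero pixel locations are shared across channels, which is the sense in which the channel images are jointly sparse.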
11
Chen ATY, Biglari-Abhari M, Wang KIK, Bouzerdoum A, Tivive FHC. Convolutional neural network acceleration with hardware/software co-design. APPL INTELL 2017. [DOI: 10.1007/s10489-017-1007-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/24/2022]
12
13
Seng CH, Bouzerdoum A, Amin MG, Phung SL. Probabilistic fuzzy image fusion approach for radar through wall sensing. IEEE Trans Image Process 2013; 22:4938-4951. [PMID: 23996561 DOI: 10.1109/tip.2013.2279953] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
This paper addresses the problem of combining multiple radar images of the same scene to produce a more informative composite image. The proposed approach for probabilistic fuzzy logic-based image fusion automatically forms fuzzy membership functions using the Gaussian-Rayleigh mixture distribution. It fuses the input pixel values directly without requiring fuzzification and defuzzification, thereby removing the subjective nature of the existing fuzzy logic methods. In this paper, the proposed approach is applied to through-the-wall radar imaging in urban sensing and evaluated on real multi-view and polarimetric data. Experimental results show that the proposed approach yields improved image contrast and enhances target detection.
14
Abstract
A recurrent neural network is presented which performs quadratic optimization subject to bound constraints on each of the optimization variables. The network is shown to be globally convergent, and conditions on the quadratic problem and the network parameters are established under which exponential asymptotic stability is achieved. Through suitable choice of the network parameters, the system of differential equations governing the network activations is preconditioned in order to reduce its sensitivity to noise and to roundoff errors. The optimization method employed by the neural network is shown to fall into the general class of gradient methods for constrained nonlinear optimization and, in contrast with penalty function methods, is guaranteed to yield only feasible solutions.
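The abstract places the network in the general class of gradient methods that return only feasible points. A minimal sketch of that class, not of the recurrent network itself, is a discrete-time projected gradient iteration for a bound-constrained quadratic program. It assumes a symmetric positive definite `Q` and a step size below 2 divided by the largest eigenvalue of `Q`; the function name, the toy problem, and the step size are all hypothetical.

```python
def projected_gradient_qp(Q, c, lower, upper, step, iterations=200):
    """Minimize 0.5*x'Qx + c'x subject to lower[i] <= x[i] <= upper[i]."""
    n = len(c)
    # Start from a feasible point: zero clipped into each variable's bounds.
    x = [min(max(0.0, lower[i]), upper[i]) for i in range(n)]
    for _ in range(iterations):
        # Gradient of the quadratic objective at the current point.
        grad = [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        # Gradient step followed by projection onto the box, so every
        # iterate stays feasible.
        x = [min(max(x[i] - step * grad[i], lower[i]), upper[i]) for i in range(n)]
    return x

# Toy problem (hypothetical): the unconstrained minimum lies outside the box.
Q = [[2.0, 1.0], [1.0, 2.0]]
c = [-2.0, -8.0]
solution = projected_gradient_qp(Q, c, lower=[0.0, 0.0], upper=[2.0, 2.0], step=0.25)
print(solution)
```

Unlike a penalty-function approach, the clipping step never lets an iterate leave the bounds, which mirrors the feasibility property claimed in the abstract.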
Affiliation(s)
- A Bouzerdoum
  - Dept. of Electr. and Electron. Eng., Adelaide Univ., SA
15
Seng CH, Demirli R, Amin MG, Seachrist JL, Bouzerdoum A. Automatic left ventricle detection in echocardiographic images for deformable contour initialization. Annu Int Conf IEEE Eng Med Biol Soc 2012; 2011:7215-8. [PMID: 22256003 DOI: 10.1109/iembs.2011.6091823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Accurate left ventricular boundary detection in echocardiographic images allows cardiologists to study and assess cardiomyopathy in patients. Because manually tracing the borders is tedious and time-consuming, deformable models are generally used for left ventricle segmentation. However, most deformable models require a good initialization, which is usually outlined manually by the user. In this paper, we propose an automated left ventricle detection method for two-dimensional echocardiographic images that could serve as an initialization for deformable models. The proposed approach consists of pre-processing and post-processing stages, coupled with watershed segmentation. The pre-processing stage enhances the overall contrast and reduces speckle noise, whereas the post-processing stage enhances the segmented region and avoids the papillary muscles. The performance of the proposed method is evaluated on real data. Experimental results show that it is suitable for automatic contour initialization, since neither prior assumptions nor human intervention is required. In addition, its computation time is lower than that of an existing method.
Affiliation(s)
- Cher Hau Seng
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia
16
Abstract
We propose a new hierarchical architecture for visual pattern classification. The new architecture consists of a set of fixed, directional filters and a set of adaptive filters arranged in a cascade structure. The fixed filters are used to extract primitive features such as orientations and edges that are present in a wide range of objects, whereas the adaptive filters can be trained to find complex features that are specific to a given object. Both types of filter are based on the biological mechanism of shunting inhibition. The proposed architecture is applied to two problems: pedestrian detection and car detection. Evaluation results on benchmark data sets demonstrate that the proposed architecture outperforms several existing ones.
Affiliation(s)
- Fok H C Tivive
  - School of Electrical, Computer, and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia
17
18
Abstract
In this paper, we propose a new neural architecture for classification of visual patterns that is motivated by the two concepts of image pyramids and local receptive fields. The new architecture, called pyramidal neural network (PyraNet), has a hierarchical structure with two types of processing layers: pyramidal layers and one-dimensional (1-D) layers. In the new network, nonlinear two-dimensional (2-D) neurons are trained to perform both image feature extraction and dimensionality reduction. We present and analyze five training methods for PyraNet [gradient descent (GD), gradient descent with momentum, resilient back-propagation (RPROP), Polak-Ribiere conjugate gradient (CG), and Levenberg-Marquardt (LM)] and two choices of error functions [mean-square-error (MSE) and cross-entropy (CE)]. In this paper, we apply PyraNet to determine gender from a facial image, and compare its performance on the standard facial recognition technology (FERET) database with three classifiers: the convolutional neural network (NN), the k-nearest neighbor (k-NN), and the support vector machine (SVM).
Affiliation(s)
- Son Lam Phung
  - School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia
19
Abstract
This article presents some efficient training algorithms, based on first-order, second-order, and conjugate gradient optimization methods, for a class of convolutional neural networks (CoNNs), known as shunting inhibitory convolutional neural networks. Furthermore, a new hybrid method is proposed, which is derived from the principles of Quickprop, Rprop, SuperSAB, and least squares (LS). Experimental results show that the new hybrid method can perform as well as the Levenberg-Marquardt (LM) algorithm, but with a much lower computational cost and memory requirement. For the sake of comparison, the visual pattern recognition task of face/nonface discrimination is chosen as a classification problem to evaluate the performance of the training algorithms. Sixteen training algorithms are implemented for the three different variants of the proposed CoNN architecture: binary-, Toeplitz- and fully connected architectures. All implemented algorithms can train the three network architectures successfully, but their convergence speeds vary markedly. In particular, the combinations of LS with the new hybrid method and LS with the LM method achieve the best convergence rates in terms of number of training epochs. In addition, the classification accuracies of all three architectures are assessed using ten-fold cross validation. The results show that the binary- and Toeplitz-connected architectures slightly outperform the fully connected architecture: the lowest error rates across all training algorithms are 1.95% for the Toeplitz-connected, 2.10% for the binary-connected, and 2.20% for the fully connected network. In general, the modified Broyden-Fletcher-Goldfarb-Shanno (BFGS) methods, the three variants of the LM algorithm, and the new hybrid/LS method perform consistently well, achieving error rates of less than 3% averaged across all three architectures.
Affiliation(s)
- Fok Hing Chi Tivive
  - School of Electrical, Computer, and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia
20
Phung SL, Bouzerdoum A, Chai D. Skin segmentation using color pixel classification: analysis and comparison. IEEE Trans Pattern Anal Mach Intell 2005; 27:148-154. [PMID: 15628277 DOI: 10.1109/tpami.2005.17] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.9] [Indexed: 05/24/2023]
Abstract
This paper presents a study of three important issues of the color pixel classification approach to skin segmentation: color representation, color quantization, and classification algorithm. Our analysis of several representative color spaces using the Bayesian classifier with the histogram technique shows that skin segmentation based on color pixel classification is largely unaffected by the choice of the color space. However, segmentation performance degrades when only chrominance channels are used in classification. Furthermore, we find that color quantization can be as low as 64 bins per channel, although higher histogram sizes give better segmentation performance. The Bayesian classifier with the histogram technique and the multilayer perceptron classifier are found to perform better compared to other tested classifiers, including three piecewise linear classifiers, three unimodal Gaussian classifiers, and a Gaussian mixture classifier.
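The Bayesian classifier with the histogram technique named in this abstract can be sketched as a likelihood-ratio test on quantized colors. The following is a minimal illustration under stated assumptions: the class name `HistogramSkinClassifier`, the decision threshold of 1.0, and the toy training pixels are hypothetical, and only the 64-bins-per-channel quantization is taken from the abstract. It is not the authors' code.

```python
from collections import Counter

def quantize(pixel, bins=64):
    """Map an 8-bit (r, g, b) pixel to a triple of histogram bin indices."""
    step = 256 // bins
    return tuple(channel // step for channel in pixel)

class HistogramSkinClassifier:
    """Classify a pixel as skin when P(color | skin) / P(color | non-skin) exceeds a threshold."""

    def __init__(self, bins=64, threshold=1.0):
        self.bins = bins
        self.threshold = threshold  # hypothetical default, tuned in practice

    def fit(self, skin_pixels, nonskin_pixels):
        # Class-conditional color histograms, stored sparsely as counters.
        self.skin = Counter(quantize(p, self.bins) for p in skin_pixels)
        self.nonskin = Counter(quantize(p, self.bins) for p in nonskin_pixels)
        self.n_skin = len(skin_pixels)
        self.n_nonskin = len(nonskin_pixels)
        return self

    def likelihood_ratio(self, pixel):
        key = quantize(pixel, self.bins)
        p_skin = self.skin[key] / self.n_skin
        p_nonskin = self.nonskin[key] / self.n_nonskin
        if p_nonskin == 0:
            # Color never seen as non-skin: decide from the skin histogram alone.
            return float("inf") if p_skin > 0 else 0.0
        return p_skin / p_nonskin

    def is_skin(self, pixel):
        return self.likelihood_ratio(pixel) > self.threshold

# Toy training data (hypothetical RGB values).
skin = [(220, 170, 140)] * 3 + [(200, 150, 120)]
nonskin = [(30, 90, 200)] * 4 + [(220, 170, 140)]
clf = HistogramSkinClassifier().fit(skin, nonskin)
print(clf.is_skin((221, 171, 141)), clf.is_skin((30, 90, 200)))
```

Lowering `bins` coarsens the histogram, which is the quantization trade-off the abstract studies; colors unseen in both classes are treated here as non-skin, which is one possible convention among several.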
Affiliation(s)
- Son Lam Phung
  - School of Engineering and Mathematics, Edith Cowan University, WA 6027, Australia
21
Abstract
This article presents a new generalized feedforward neural network (GFNN) architecture for pattern classification and regression. The GFNN architecture uses as its basic computing unit a generalized shunting neuron (GSN) model, which includes as special cases the perceptron and the shunting inhibitory neuron. GSNs are capable of forming complex, nonlinear decision boundaries. This allows the GFNN architecture to easily learn some complex pattern classification problems. In this article, the GFNNs are applied to several benchmark classification problems, and their performance is compared with that of shunting inhibitory artificial neural networks (SIANNs) and multilayer perceptrons (MLPs). Experimental results show that a single GSN can outperform both the SIANN and MLP networks.
Affiliation(s)
- Ganesh Arulampalam
  - Edith Cowan University, 100 Joondalup Drive, Joondalup, WA 6027, Australia
22