1
Multiblock data applied in organic grape juice authentication by one-class classification OC-PLS. Food Chem 2024; 436:137695. [PMID: 37857206 DOI: 10.1016/j.foodchem.2023.137695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 09/27/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
A new strategy has been developed to improve the assessment of the authenticity of whole grape juice within the organic class. This approach is based on the analysis of data from different analytical sources. The novel method employs a multiblock regression technique, specifically the one-class partial least squares (OC-PLS) classifier, to establish a relationship between each predictor block and the response variable. The calculations are performed sequentially, each block being orthogonalized with respect to the preceding regression scores. The proposed method proved effective in detecting the targeted samples. On the test set, the best models reached up to 100 % sensitivity, 89 % specificity, and 83 % accuracy. For comparison with the multiblock models, the DD-SIMCA method was employed, but it yielded inferior results when applied to the visible data. The multiblock approach proved efficient in combining datasets from varied sources to classify organic grape juice.
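The sequential step described above (orthogonalize each new block on the scores of the blocks already fitted, then model it) can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation. It assumes one PLS1 model per block computed by NIPALS, a constant response in which every target-class sample is coded 1, and no block scaling. The function names are hypothetical, and the OC-PLS acceptance rule itself is not shown.

```python
import numpy as np


def pls1_scores(X, y, n_comp):
    """PLS1 scores by NIPALS for a single response vector y (inputs are copied)."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    scores = []
    for _ in range(n_comp):
        w = X.T @ y                        # weight: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = X @ w                          # score vector
        tt = t @ t
        X -= np.outer(t, X.T @ t / tt)     # deflate X by loading p = X^T t / t^T t
        y -= t * (y @ t / tt)              # deflate y by its regression on t
        scores.append(t)
    return np.column_stack(scores)


def orthogonalize(X, T):
    """Remove from X its least-squares projection onto the columns of T."""
    coef, *_ = np.linalg.lstsq(T, X, rcond=None)
    return X - T @ coef


def sequential_scores(blocks, y, comps_per_block):
    """Fit blocks in order, orthogonalizing each on all preceding scores first."""
    all_scores = []
    for X, n_comp in zip(blocks, comps_per_block):
        X = np.asarray(X, dtype=float)
        if all_scores:
            X = orthogonalize(X, np.column_stack(all_scores))
        all_scores.append(pls1_scores(X, y, n_comp))
    return np.column_stack(all_scores)
```

Because each block is orthogonalized before it is fitted, its scores can only carry variation that the earlier blocks did not already explain. This is what lets data from different analytical sources be combined without double counting.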
2
Learning image representations for anomaly detection: Application to discovery of histological alterations in drug development. Med Image Anal 2024; 92:103067. [PMID: 38141454 DOI: 10.1016/j.media.2023.103067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 12/01/2023] [Accepted: 12/19/2023] [Indexed: 12/25/2023]
Abstract
We present a system for anomaly detection in histopathological images. In histology, normal samples are usually abundant, whereas anomalous (pathological) cases are scarce or not available. Under such settings, one-class classifiers trained on healthy data can detect out-of-distribution anomalous samples. Such approaches, combined with pre-trained Convolutional Neural Network (CNN) representations of images, were previously employed for anomaly detection (AD). However, pre-trained off-the-shelf CNN representations may not be sensitive to abnormal conditions in tissues, while natural variations of healthy tissue may result in distant representations. To adapt representations to relevant details in healthy tissue, we propose training a CNN on an auxiliary task that discriminates healthy tissue of different species, organs, and staining reagents. Almost no additional labeling workload is required, since healthy samples come automatically with the aforementioned labels. During training, we enforce compact image representations with a center-loss term, which further improves representations for AD. The proposed system outperforms established AD methods on a published dataset of liver anomalies. Moreover, it provided results comparable to those of conventional methods specifically tailored for quantification of liver anomalies. We show that our approach can be used for toxicity assessment of candidate drugs at early development stages and thereby may reduce expensive late-stage drug attrition.
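A center-loss term of the kind mentioned above penalizes the distance between each image embedding and the center of its auxiliary class (species, organ, stain). The sketch below assumes the commonly used formulation with a running center update of step `alpha`. Here the loss is averaged over the batch. In training it would be added, with a weighting factor, to the auxiliary classification loss. The function names and the default `alpha` are illustrative assumptions, not the published system.

```python
import numpy as np


def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature vector and its class center."""
    diffs = features - centers[labels]            # (n_samples, n_dims)
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))


def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward its members in the current batch.

    delta_j = sum_i (c_j - x_i) / (1 + n_j) over batch members of class j;
    classes absent from the batch keep their current center.
    """
    new_centers = centers.copy()
    for j in np.unique(labels):
        members = features[labels == j]
        delta = np.sum(centers[j] - members, axis=0) / (1 + len(members))
        new_centers[j] = centers[j] - alpha * delta
    return new_centers
```

Pulling same-class embeddings together is what makes the healthy-tissue representation compact, so a later one-class model sees a tighter normal distribution.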
3
Geographical origin identification of camellia oil based on fatty acid profiles combined with one-class classification. Food Chem 2024; 433:137306. [PMID: 37696091 DOI: 10.1016/j.foodchem.2023.137306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 08/16/2023] [Accepted: 08/26/2023] [Indexed: 09/13/2023]
Abstract
Geographical Indication (GI) agricultural products have specific geographical origins and high quality, and protecting these important trademarks requires an effective method for tracing geographical origin. In this study, authentication models for Changshan camellia oil were developed from fatty acid profiles using one-class classification methods, namely data-driven soft independent modeling of class analogy (DD-SIMCA) and one-class partial least squares (OCPLS), and compared with traditional two-class classification models. For samples from outside the targeted geographical origin, the prediction errors of the three two-class classification models were 63.8%, 12.1%, and 65.2%, respectively. By contrast, the one-class classification models completely differentiated Changshan from non-Changshan camellia oils, even those from adjacent counties. Moreover, compared with the traditional mineral-element indicators, the model built on fatty acid profiles had higher sensitivity and specificity. The study also offers a reference strategy for the geographical origin identification of other high-value oils or foods.
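DD-SIMCA accepts a sample when a weighted sum of two distances stays below a chi-square critical value. The two distances are the score distance and the orthogonal distance from a PCA model of the target class. The sketch below makes several assumptions. It uses mean-centred data and method-of-moments estimates of the distance scales and degrees of freedom. It also replaces the exact chi-square quantile with the Wilson-Hilferty approximation. It illustrates the acceptance rule only and is not the models built in the study. The class and function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation of the chi-square quantile (avoids a SciPy dependency)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * np.sqrt(a)) ** 3


def moments_fit(d):
    """Method-of-moments scale d0 and degrees of freedom N for a scaled chi-square distance."""
    d0 = float(np.mean(d))
    dof = max(1, int(round(2.0 * d0 ** 2 / np.var(d, ddof=1))))
    return d0, dof


class DDSimcaSketch:
    """Minimal DD-SIMCA-style class model built from target-class samples only."""

    def __init__(self, n_comp, alpha=0.05):
        self.n_comp = n_comp
        self.alpha = alpha

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, s, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.loadings_ = vt[: self.n_comp].T
        self.eig_ = s[: self.n_comp] ** 2          # sum of squared scores per component
        h, v = self._distances(X)
        self.h0_, self.nh_ = moments_fit(h)
        self.v0_, self.nv_ = moments_fit(v)
        self.c_crit_ = chi2_quantile(1.0 - self.alpha, self.nh_ + self.nv_)
        return self

    def _distances(self, X):
        Xc = np.asarray(X, dtype=float) - self.mean_
        T = Xc @ self.loadings_
        h = np.sum(T ** 2 / self.eig_, axis=1)    # score distance (leverage)
        E = Xc - T @ self.loadings_.T
        v = np.sum(E ** 2, axis=1)                # orthogonal distance
        return h, v

    def total_distance(self, X):
        h, v = self._distances(X)
        return self.nh_ * h / self.h0_ + self.nv_ * v / self.v0_

    def accepts(self, X):
        """True where a sample falls inside the class acceptance area."""
        return self.total_distance(X) <= self.c_crit_
```

Only target-class samples are needed to fit the model. Non-target samples enter only when specificity is checked, which is why such models can reject oils from adjacent regions that were never seen in training.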
4
Self-supervised pseudo multi-class pre-training for unsupervised anomaly detection and segmentation in medical images. Med Image Anal 2023; 90:102930. [PMID: 37657364 DOI: 10.1016/j.media.2023.102930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Revised: 07/12/2023] [Accepted: 08/07/2023] [Indexed: 09/03/2023]
Abstract
Unsupervised anomaly detection (UAD) methods are trained with normal (or healthy) images only, but during testing they are able to classify normal and abnormal (or disease) images. UAD is an important medical image analysis (MIA) approach for disease screening problems, because the training sets available for those problems usually contain only normal images. However, the exclusive reliance on normal images may result in the learning of ineffective low-dimensional image representations that are not sensitive enough to detect and segment unseen abnormal lesions of varying size, appearance, and shape. Pre-training UAD methods with self-supervised learning based on computer vision techniques can mitigate this challenge, but such pre-training is sub-optimal. It does not exploit domain knowledge when designing the pretext tasks, and its contrastive learning losses do not try to cluster the normal training images, which may result in a sparse distribution of normal images that is ineffective for anomaly detection. In this paper, we propose a new self-supervised pre-training method for MIA UAD applications, named Pseudo Multi-class Strong Augmentation via Contrastive Learning (PMSACL). PMSACL consists of a novel optimisation method that contrasts a normal image class with multiple pseudo classes of synthesised abnormal images, with each class enforced to form a dense cluster in the feature space. In the experiments, we show that our PMSACL pre-training improves the accuracy of state-of-the-art (SOTA) UAD methods on many MIA benchmarks using colonoscopy, fundus screening and Covid-19 chest X-ray datasets.
5
A green method for the authentication of sugarcane spirit and prediction of density and alcohol content based on near infrared spectroscopy and chemometric tools. Food Res Int 2023; 170:112830. [PMID: 37316036 DOI: 10.1016/j.foodres.2023.112830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 04/10/2023] [Accepted: 04/12/2023] [Indexed: 06/16/2023]
Abstract
Cachaça is a Brazilian beverage obtained from the fermentation of sugarcane juice (sugarcane spirit) and is considered one of the most consumed alcoholic beverages in the world, with a strong economic impact in northeastern Brazil, more specifically in the Brejo. This microregion produces high-quality sugarcane spirits, a quality associated with its edaphoclimatic conditions. Analytical methods for sample authentication and quality control that are solvent-free, environmentally friendly, rapid and non-destructive are therefore advantageous for cachaça producers and the production chain. In this work, commercial cachaça samples were classified by geographical origin from their near-infrared spectra (NIRS) using the one-class classifiers Data-Driven Soft Independent Modelling of Class Analogy (DD-SIMCA) and One-Class Partial Least Squares (OCPLS). The quality parameters alcohol content and density were predicted with different chemometric algorithms. A total of 150 sugarcane spirit samples were purchased from the Brazilian retail market, 100 from Brejo and 50 from other regions of Brazil. The one-class classification model was obtained with DD-SIMCA using Savitzky-Golay preprocessing (first derivative, 9-point window, first-degree polynomial) in the spectral range 7,290-11,726 cm-1, with a sensitivity of 96.70 % and a specificity of 100 %. Satisfactory results were obtained for density: the iSPA-PLS model with baseline-offset preprocessing gave a root mean square error of prediction (RMSEP) of 0.0011 mg/L and a relative error of prediction (REP) of 0.12 %. For alcohol content, the iSPA-PLS model with the same Savitzky-Golay preprocessing (first derivative, 9-point window, first-degree polynomial) gave an RMSEP of 0.69 % (v/v) and a REP of 1.81 %. Both models used the spectral range 7,290-11,726 cm-1. The results show the potential of vibrational spectroscopy coupled with chemometrics to build reliable models for identifying the geographical origin of cachaça samples and for predicting their quality parameters.
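The Savitzky-Golay setting quoted above (first derivative, 9-point window, first-degree polynomial) reduces to a fixed convolution. It is the least-squares slope over the window: sum(k * x[i+k]) / sum(k**2), with k = -4..4 and sum(k**2) = 60. The sketch below implements that filter together with RMSEP and REP. It assumes unit spacing between spectral points and drops the edge points. REP is taken as 100 × RMSEP divided by the mean reference value, a common definition assumed here rather than taken from the paper.

```python
import math


def sg_first_derivative(x, window=9):
    """Savitzky-Golay first derivative with a first-degree polynomial.

    The slope of a least-squares line over the centred window k = -m..m is
    sum(k * x[i + k]) / sum(k**2). Edge points without a full window are dropped.
    """
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be odd and >= 3")
    m = window // 2
    denom = sum(k * k for k in range(-m, m + 1))      # 60 for a 9-point window
    return [
        sum(k * x[i + k] for k in range(-m, m + 1)) / denom
        for i in range(m, len(x) - m)
    ]


def rmsep(y_ref, y_pred):
    """Root mean square error of prediction."""
    return math.sqrt(sum((p - r) ** 2 for r, p in zip(y_ref, y_pred)) / len(y_ref))


def rep(y_ref, y_pred):
    """Relative error of prediction in %, as RMSEP over the mean reference value."""
    return 100.0 * rmsep(y_ref, y_pred) / (sum(y_ref) / len(y_ref))
```

A first derivative removes constant baseline offsets, since the slope of a shifted spectrum is unchanged. That is one reason derivative preprocessing is common before NIR class models.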
6
Mobile authentication of copy detection patterns. EURASIP JOURNAL ON INFORMATION SECURITY 2023; 2023:4. [PMID: 37292064 PMCID: PMC10244288 DOI: 10.1186/s13635-023-00140-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2022] [Accepted: 05/23/2023] [Indexed: 06/10/2023]
Abstract
In recent years, copy detection patterns (CDP) have attracted a lot of attention as a link between the physical and digital worlds, which is of great interest for the internet of things and brand protection applications. However, the security of CDP in terms of their reproducibility by unauthorized parties, or clonability, remains largely unexplored. In this respect, this paper addresses the problem of anti-counterfeiting of physical objects and investigates the authentication aspects of modern CDP and their resistance to illegal copying from a machine-learning perspective. Special attention is paid to reliable authentication under real-life verification conditions, when the codes are printed on an industrial printer and enrolled via modern mobile phones under regular light conditions. The theoretical and empirical investigation of the authentication aspects of CDP is performed with respect to four types of copy fakes from the point of view of (i) multi-class supervised classification as a baseline approach and (ii) one-class classification as a real-life application case. The obtained results show that modern machine-learning approaches and the technical capacities of modern mobile phones make it possible to reliably authenticate CDP on end-user mobile phones under the considered classes of fakes.
7
On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles. Data Min Knowl Discov 2023; 37:1473-1517. [PMID: 37424877 PMCID: PMC10326160 DOI: 10.1007/s10618-023-00931-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2021] [Accepted: 02/28/2023] [Indexed: 07/11/2023]
Abstract
It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch on machine learning, pp 56-64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147-153, 2009. 10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected by using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class. This is more realistic for practical applications, since labeled outliers are rarely available. Our results showed that, overall, SVDD and GMM are top performers, regardless of whether the ground truth is used for parameter selection or not. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles showed better performance than individual methods in terms of accuracy, as long as the ensemble members are properly selected. Supplementary Information: The online version contains supplementary material available at 10.1007/s10618-023-00931-x.
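One simple way to combine one-class detectors whose scores live on different scales is to rank-normalize each detector's scores and average them over the selected members. The sketch below is an illustrative combination rule, an assumption rather than necessarily the one evaluated in the study. The `members` argument stands in for the member-selection step that, according to the results above, decides whether the ensemble helps.

```python
def rank_normalize(scores):
    """Map raw outlier scores to [0, 1] by rank so detectors on different scales are comparable.

    Ties are broken by position rather than averaged, to keep the sketch short.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for r, i in enumerate(order):
        ranks[i] = r / (n - 1) if n > 1 else 0.0
    return ranks


def ensemble_scores(score_lists, members=None):
    """Average the rank-normalized scores of the selected members (higher = more outlying)."""
    if members is None:
        members = range(len(score_lists))
    normalized = [rank_normalize(score_lists[m]) for m in members]
    n = len(normalized[0])
    return [sum(col[i] for col in normalized) / len(normalized) for i in range(n)]
```

Rank normalization discards score magnitudes, so a detector that produces very large values for a few points cannot dominate the average.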
8
RNA Modification Detection Using Nanopore Direct RNA Sequencing and nanoDoc2. Methods Mol Biol 2023; 2632:299-319. [PMID: 36781737 DOI: 10.1007/978-1-0716-2996-3_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
Abstract
RNA modifications regulate multiple aspects of cellular function, including RNA splicing, translation, export, decay, stability, and phase separation. One comprehensive way to detect such modifications is the recently developed direct RNA sequencing from Oxford Nanopore Technologies (ONT). However, this method produces large amounts of highly complex data in the form of raw current signals, which pose a new informatics challenge for accurately detecting those modifications. Here, we provide nanoDoc2, a software tool that detects multiple types of RNA modification from nanopore direct RNA sequencing data. nanoDoc2 includes a novel signal segmentation algorithm based on the trace value, a base probability feature that is added by ONT's Guppy basecalling program during processing of the raw signal. The core of nanoDoc2 includes a machine learning algorithm in which a 6-mer segmented raw current signal is analyzed by deep one-class classification using a WaveNet-based neural network. As output, an RNA modification is detected by a statistical score at each candidate position. Herein, we give detailed instructions on how to use nanoDoc2 for signal segmentation, for training and testing the neural network, and finally for predicting the RNA modifications present in nanopore direct RNA sequencing data.
9
A novel temporal generative adversarial network for electrocardiography anomaly detection. Artif Intell Med 2023; 136:102489. [PMID: 36710067 DOI: 10.1016/j.artmed.2023.102489] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/05/2022] [Revised: 11/28/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023]
Abstract
Cardiac abnormality detection from electrocardiogram (ECG) signals is a common task for cardiologists. To facilitate efficient and objective detection, automated ECG classification methods based on deep learning have been developed in recent years. Despite their impressive performance, these methods perform poorly when presented with cardiac abnormalities that are not well represented, or absent, in the training data. To this end, we propose a novel generative adversarial network (GAN) for ECG anomaly detection based on one-class classification. Specifically, we embedded a bidirectional long short-term memory (Bi-LSTM) layer into a GAN architecture and used a mini-batch discrimination training strategy in the discriminator to synthesize ECG signals. Our method generates samples that match the data distribution of normal signals from the healthy group, so that a generalised anomaly detector can be built reliably. The experimental results demonstrate that our method outperforms several state-of-the-art semi-supervised learning based ECG anomaly detection algorithms and robustly detects the unknown anomaly class in the MIT-BIH arrhythmia database. Our method achieves an accuracy of 95.5% and an AUC of 95.9%, outperforming the most competitive baseline by 0.7% and 1.7%, respectively. Our method may prove to be a helpful diagnostic tool for helping cardiologists identify arrhythmias.
10
Optimizing the soft independent modeling of class analogy (SIMCA) using statistical prediction regions. Anal Chim Acta 2022; 1229:340339. [PMID: 36156218 DOI: 10.1016/j.aca.2022.340339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/26/2022]
Abstract
The ultimate goal of a one-class classifier like the "rigorous" soft independent modeling of class analogy (SIMCA) is to predict, with a given confidence probability, the conformity of future objects with a given reference class. However, the SIMCA model, as currently implemented, often suffers from an undercoverage problem: its observed sensitivity often falls far below the desired theoretical confidence probability, which undermines its intended use as a predictive tool. To overcome the issue, the strategy most often reported in the literature involves increasing the nominal confidence probability until the desired sensitivity is obtained in cross-validation. This article proposes an alternative strategy, based on statistical prediction intervals, to properly overcome this undercoverage issue. The strategy uses the concept of predictive distributions sensu stricto to construct statistical prediction regions for the metrics. First, a procedure based on goodness-of-fit criteria is used to select the best-fitting family of probability models for each metric or its monotonic transformation. The candidates are several plausible families of right-skewed probability distributions for positive random variables, including the gamma and the lognormal families. Second, assuming the best-fitting distribution, a generalized linear model is fitted to each metric's data using the Bayesian method. This method makes it convenient to estimate uncertainties about the parameters of the selected distribution. Propagating these uncertainties to the best-fitting probability model of the metric yields its so-called posterior predictive distribution, which is then used to set its critical limit. Overall, the evaluation of the proposed approach on a variety of real datasets shows that it yields unbiased and more accurate sensitivities than existing methods that are not based on predictive densities. It can even yield better specificities than the strategy that attempts to improve the sensitivities of existing methods by "optimizing" the type 1 error, especially for small sample sizes.
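The difference between a plug-in critical limit and one taken from a posterior predictive distribution can be illustrated for a lognormal metric. This sketch assumes a lognormal model with the noninformative prior p(mu, sigma^2) ∝ 1/sigma^2 on the log scale, for which the posterior has a closed form. It estimates the predictive quantile by Monte Carlo. It does not reproduce the article's goodness-of-fit family selection or its Bayesian GLM, and the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def plugin_limit(metric, alpha=0.05):
    """Lognormal critical limit that treats the estimated log-mean and log-sd as exact."""
    logs = np.log(np.asarray(metric, dtype=float))
    z = NormalDist().inv_cdf(1.0 - alpha)
    return float(np.exp(logs.mean() + z * logs.std(ddof=1)))


def predictive_limit(metric, alpha=0.05, draws=200_000, seed=0):
    """Lognormal critical limit from the posterior predictive distribution (Monte Carlo).

    With the prior p(mu, sigma^2) proportional to 1/sigma^2 on the log scale:
      sigma^2 | data            ~ (n - 1) s^2 / chi2(n - 1)
      mu | sigma^2, data        ~ Normal(mean, sigma^2 / n)
      new log-metric | mu, sigma^2 ~ Normal(mu, sigma^2)
    """
    logs = np.log(np.asarray(metric, dtype=float))
    n, m, s2 = len(logs), logs.mean(), logs.var(ddof=1)
    rng = np.random.default_rng(seed)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=draws)
    mu = rng.normal(m, np.sqrt(sigma2 / n))
    new = rng.normal(mu, np.sqrt(sigma2))
    return float(np.exp(np.quantile(new, 1.0 - alpha)))
```

With only ten training values, the predictive limit is wider than the plug-in limit because it accounts for the uncertainty in the estimated mean and variance. This is the kind of correction for undercoverage discussed above.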
11
Using one-class autoencoder for adulteration detection of milk powder by infrared spectrum. Food Chem 2022; 372:131219. [PMID: 34601417 DOI: 10.1016/j.foodchem.2021.131219] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/17/2021] [Revised: 09/17/2021] [Accepted: 09/22/2021] [Indexed: 12/11/2022]
Abstract
Food adulteration detection requires quick and simple methods. Spectral detection can significantly reduce the analysis time, but it requires a detection model to be constructed. In this study, a one-class classification method based on an autoencoder is proposed for the detection of food adulteration by spectroscopy. In the proposed method, the autoencoder is constructed to extract low-dimensional features from high-dimensional spectral data and to reconstruct the original spectrum. The coding error and the reconstruction error are then used to determine whether the food sample is adulterated. The infrared spectral data of milk powder and its adulterated forms are used to verify the performance of the proposed model. Experimental results show that the proposed method performs similarly to soft independent modeling of class analogy and one-class partial least squares, and significantly better than support vector data description. The proposed method can be flexibly applied to the spectral detection of food adulteration.
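The reconstruction-error part of this idea can be sketched with PCA acting as a linear autoencoder: the encoder projects a spectrum onto a few loadings and the decoder maps it back. This is an assumed stand-in for the trained, possibly nonlinear autoencoder in the study. The coding-error criterion is not modelled, and the 95th-percentile threshold on authentic training spectra is a hypothetical choice.

```python
import numpy as np


class LinearAutoencoderSketch:
    """PCA used as a linear stand-in for a trained autoencoder.

    encode: project centred spectra onto n_latent loadings (low-dimensional code)
    decode: map the code back to the spectral space
    """

    def __init__(self, n_latent, quantile=0.95):
        self.n_latent = n_latent
        self.quantile = quantile

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.W_ = vt[: self.n_latent].T
        # alarm threshold: a high quantile of reconstruction errors on authentic spectra
        self.threshold_ = np.quantile(self.reconstruction_error(X), self.quantile)
        return self

    def encode(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) @ self.W_

    def decode(self, Z):
        return Z @ self.W_.T + self.mean_

    def reconstruction_error(self, X):
        X = np.asarray(X, dtype=float)
        return np.sum((X - self.decode(self.encode(X))) ** 2, axis=1)

    def is_adulterated(self, X):
        return self.reconstruction_error(X) > self.threshold_
```

An adulterant adds spectral features that the low-dimensional code of authentic samples cannot represent. Those features survive in the reconstruction residual, which is what the threshold tests.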
12
Geographical origin authentication of southern Brazilian red wines by means of EEM-pH four-way data modelling coupled with one class classification approach. Food Chem 2021; 362:130087. [PMID: 34139571 DOI: 10.1016/j.foodchem.2021.130087] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/12/2021] [Revised: 04/28/2021] [Accepted: 05/08/2021] [Indexed: 11/20/2022]
Abstract
Excitation-emission matrix (EEM) fluorescence data recorded at different pH values were analysed by MCR-ALS to obtain qualitative information about Brazilian red wines. In addition, the geographical traceability of wines produced in the Serra Gaúcha (Rio Grande do Sul) was assessed by DD-SIMCA using 53 samples from the target class and 20 from other producing regions. For each sample, the fluorescence signal consists of 9 EEMs recorded at different pH values (3-11), which generates four-way data. MCR-ALS decomposition retrieved eight factors, which were related to typical chemical compounds found in red wine. The EEM-pH data were also used to build a one-class classification model based on the MCR scores. All samples of the target class were properly recognised as belonging to the target class, giving the maximal sensitivity of 1. Samples of the non-target class were also adequately rejected by the model, and the specificity was found to be 0.97.
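The bilinear MCR-ALS step, D ≈ C S^T with non-negative concentration profiles C and spectral profiles S, can be sketched as alternating least squares with non-negativity imposed by clipping. This simplified sketch makes several assumptions. The four-way EEM-pH data are taken to be already unfolded into a matrix D. Initial spectral estimates S0 are supplied by the caller. Clipping is a crude stand-in for a proper non-negative least-squares solver. The function name is hypothetical, and the lack of fit is reported in percent.

```python
import numpy as np


def mcr_als(D, S0, n_iter=200):
    """Bilinear MCR-ALS sketch: D (samples x variables) ~ C @ S.T with C >= 0 and S >= 0.

    S0 holds initial spectral profiles (variables x components). Non-negativity is
    imposed by clipping the unconstrained least-squares solutions at zero.
    """
    D = np.asarray(D, dtype=float)
    S = np.asarray(S0, dtype=float)
    for _ in range(n_iter):
        # C-step: solve S @ C.T ~ D.T for C, then clip negatives
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
        # S-step: solve C @ S.T ~ D for S, then clip negatives
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0.0, None)
    lack_of_fit = 100.0 * np.linalg.norm(D - C @ S.T) / np.linalg.norm(D)
    return C, S, lack_of_fit
```

The rows of C play the role of the MCR scores that, in the study, feed the one-class model. Each sample is summarized by the contribution of each resolved component.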
13
Target specific mining of COVID-19 scholarly articles using one-class approach. CHAOS, SOLITONS, AND FRACTALS 2020; 140:110155. [PMID: 32834643 PMCID: PMC7392081 DOI: 10.1016/j.chaos.2020.110155] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/27/2020] [Accepted: 07/23/2020] [Indexed: 05/21/2023]
Abstract
The novel coronavirus disease 2019 (COVID-19) began as an outbreak from its epicentre in Wuhan, People's Republic of China, in late December 2019, and by June 27, 2020 it had caused 9,904,906 infections and 496,866 deaths worldwide. The World Health Organization (WHO) has declared this disease a pandemic. Researchers from various domains are working to curb the spread of the coronavirus through medical treatment and data analytics. In recent years, several research articles have been published on coronavirus-caused diseases such as severe acute respiratory syndrome (SARS), Middle East respiratory syndrome (MERS) and COVID-19. Given the large number of research articles, extracting the best-suited ones manually is time-consuming and impractical. The objective of this paper is to extract the activity and trends of coronavirus-related research articles using machine learning approaches, to help the research community in future exploration of COVID-19 prevention and treatment techniques. The COVID-19 Open Research Dataset (CORD-19) is used for the experiments. Several target tasks, along with explanations, are defined for classification based on domain knowledge. Clustering techniques are used to group the available articles into clusters, and the task assignment is then performed using parallel one-class support vector machines (OCSVMs). These defined tasks describe the behavior of the clusters to accomplish target-class guided mining. Experiments with original and reduced features validate the performance of the approach. The results show that the k-means clustering algorithm, followed by parallel OCSVMs, outperforms the other methods in both the original and the reduced feature space.
14
Investigating the impact of supervoxel segmentation for unsupervised abnormal brain asymmetry detection. Comput Med Imaging Graph 2020; 85:101770. [PMID: 32854021 DOI: 10.1016/j.compmedimag.2020.101770] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2019] [Revised: 07/27/2020] [Accepted: 07/31/2020] [Indexed: 11/26/2022]
Abstract
Several brain disorders are associated with abnormal brain asymmetries (asymmetric anomalies), and various computer-based methods aim to detect such anomalies automatically. Recent advances in this area use automatic unsupervised techniques that extract pairs of symmetric supervoxels in the hemispheres, model normal brain asymmetries for each pair from healthy subjects, and treat outliers as anomalies. Yet there is no deep understanding of the impact of supervoxel segmentation quality on abnormal asymmetry detection, especially for small anomalies. Nor is there a clear picture of the added value of using a specialized model for each supervoxel pair instead of a single global appearance model. We aim to answer these questions through a detailed evaluation of different scenarios for supervoxel segmentation and classification for detecting abnormal brain asymmetries. Experimental results on 3D MR-T1 brain images of stroke patients confirm the importance of high-quality supervoxels that fit the anomalies and of using a specific classifier for each supervoxel. Next, we present a refinement of the detection method that reduces the number of false-positive supervoxels, thereby making the detection method easier to use for visual inspection and analysis of the detected anomalies.
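The per-pair versus global comparison above can be illustrated with a toy asymmetry score per supervoxel pair. The sketch assumes one scalar asymmetry value per pair and a z-score outlier rule with threshold k. The studied methods use one-class classifiers on richer supervoxel features. So this only shows why a model per pair can catch anomalies that a single global model misses. When one pair is naturally more asymmetric than another, a pooled mean and standard deviation are inflated and hide a small deviation in the other pair. The function names are hypothetical.

```python
from statistics import mean, stdev


def per_pair_flags(healthy, subject, k=3.0):
    """One normal model per supervoxel pair.

    healthy: one list per healthy subject, one asymmetry value per pair.
    subject: asymmetry values of the subject under test, in the same pair order.
    """
    flags = []
    for p, value in enumerate(subject):
        ref = [row[p] for row in healthy]
        z = (value - mean(ref)) / stdev(ref)
        flags.append(abs(z) > k)
    return flags


def global_flags(healthy, subject, k=3.0):
    """A single normal model pooled over all supervoxel pairs."""
    ref = [v for row in healthy for v in row]
    mu, sd = mean(ref), stdev(ref)
    return [abs((value - mu) / sd) > k for value in subject]
```

With healthy values around 5 for pair 0 and around 0 for pair 1, a subject with a value of 1.0 in pair 1 is flagged by the per-pair model but not by the pooled one.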
15
Automated active fault detection in fouled dissolved oxygen sensors. WATER RESEARCH 2019; 166:115029. [PMID: 31541793 DOI: 10.1016/j.watres.2019.115029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2019] [Revised: 08/22/2019] [Accepted: 08/28/2019] [Indexed: 06/10/2023]
Abstract
Biofilm formation causes bias in dissolved oxygen (DO) sensors, which hampers their use for automatic control and thereby for balancing energy and treatment efficiency. We analysed whether a dataset generated with deliberate perturbations can be interpreted automatically to detect bias caused by biofilm formation. We used a challenging set-up with the realistic conditions required for a full-scale application. This included automated training (adapting to changing normal conditions) and automated tuning (setting an alarm threshold) to ensure that the fault detection (FD) methods are accessible to the operators. The results showed that automatic use of FD methods is difficult, especially the automatic tuning of alarm thresholds when small training datasets represent only the normal conditions, i.e. clean sensors. Despite the challenging set-up, two FD methods successfully improved the detection limit to a bias of 0.5 mg DO/L caused by biofilm formation. We showed that the studied dataset could be interpreted equally well by simpler FD methods as by advanced machine learning algorithms. This in turn indicates that the information contained in the actively generated data was more vital than its interpretation by advanced algorithms.
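Automated training and automated tuning, as described above, can be sketched as a moving window of normal-condition residuals. At every step an alarm threshold (mean + k standard deviations) is recomputed from that window. Several details are assumptions for illustration: the residual definition, the window length, k = 3, and the rule that only non-alarm points update the window. The FD methods compared in the study are not reproduced, and the function names are hypothetical.

```python
from statistics import mean, stdev


def tune_threshold(training_residuals, k=3.0):
    """Alarm threshold from normal-condition (clean-sensor) residuals: mean + k * std."""
    return mean(training_residuals) + k * stdev(training_residuals)


def monitor(residuals, window=20, k=3.0):
    """Retrain on the latest `window` non-alarm residuals and flag each new point."""
    history = list(residuals[:window])
    alarms = []
    for r in residuals[window:]:
        threshold = tune_threshold(history[-window:], k)
        alarm = r > threshold
        alarms.append(alarm)
        if not alarm:
            history.append(r)   # only normal points update the model of normal conditions
    return alarms
```

Feeding only non-alarm points back into the window keeps a detected fault out of the model of normal conditions. A bias that grows slowly enough can still be absorbed, however, which illustrates why automatic tuning from small clean-sensor datasets is hard.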
16
Predicting combinative drug pairs via multiple classifier system with positive samples only. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019; 168:1-10. [PMID: 30527128 DOI: 10.1016/j.cmpb.2018.11.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/01/2018] [Revised: 10/24/2018] [Accepted: 11/12/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND AND OBJECTIVE: Owing to the synergistic effects of drugs, drug combination is one of the effective approaches for treating complex diseases. However, identifying drug combinations by dose-response methods is still costly, so supervised learning-based approaches that predict potential drug combinations on a large scale are promising. Nevertheless, these approaches make inadequate use of heterogeneous features, which causes a loss of information useful for classification. Moreover, they have an intrinsic bias, because they treat unknown drug pairs as non-combinations, although some of them could be real drug combinations in practice.
METHODS: To address these issues, this work first designs a two-layer multiple classifier system (TLMCS) to effectively integrate heterogeneous features involving anatomical therapeutic chemical codes of drugs, drug-drug interactions, drug-target interactions, gene ontology of drug targets, and side effects. To avoid the bias caused by labelling unknown samples as negative, it then uses one-class support vector machines as the member classifiers in TLMCS; these require no negative instances and label only approved drug combinations as positive instances. Finally, both 10-fold cross-validation (10-CV) and a novel prediction are performed to validate the performance of TLMCS.
RESULTS: In a comparison with three state-of-the-art approaches under 10-CV, TLMCS performs best, achieving an area under the receiver operating characteristic curve of 0.824 and an area under the precision-recall curve of 0.372. Moreover, in the novel prediction, 9 of the top 20 predicted combinative drug pairs are validated by checking the published literature. Furthermore, for each of the newly validated drug combinations, this work analyses the combining mode of the member drugs and investigates their relationship in terms of drug targeting pathways.
CONCLUSIONS: The proposed TLMCS provides an effective framework to integrate these heterogeneous features and is trained on positive samples only, so that the bias of taking unknown drug pairs as negative samples is avoided. Furthermore, its results in the novel prediction reveal five types of drug combinations and three types of drug relationships in terms of pathways.
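The two summary figures reported above can be computed from ranked prediction scores. The sketch below computes the ROC AUC as the probability that a random positive outranks a random negative, with ties counting one half. It computes the area under the precision-recall curve as average precision, one common estimator that is assumed here. Ties in the precision-recall computation are broken by input order, and the function names are illustrative.

```python
def roc_auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (highest score first)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / sum(labels)
```

A random ranking gives a precision-recall area equal to the fraction of positives. The area can therefore sit well below the ROC AUC when known combinations are rare among all candidate pairs.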
17
Detection of adulterants in grape nectars by attenuated total reflectance Fourier-transform mid-infrared spectroscopy and multivariate classification strategies. Food Chem 2018; 266:254-261. [PMID: 30381184 DOI: 10.1016/j.foodchem.2018.06.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 03/23/2018] [Revised: 05/28/2018] [Accepted: 06/03/2018] [Indexed: 10/14/2022]
Abstract
There is no doubt about the importance of food fraud control, as it has implications for food safety and consumer health. In fruit beverages, some types of adulteration, such as substitution with less expensive fruits, are detected more frequently than others. A methodology based on attenuated total reflectance Fourier-transform mid-infrared spectroscopy (ATR-FTIR) and multivariate classification was applied to detect whether grape nectars were adulterated by substitution with apple juice or cashew juice. A total of 126 samples were obtained and analyzed. Two strategies were proposed: one-class and multiclass approaches. Soft independent modeling of class analogy (SIMCA), partial least squares discriminant analysis (PLS-DA) and partial least squares density modeling (PLS-DM) were used to build the models. Among them, PLS-DA performed best, with a sensitivity and specificity of nearly 100%. The multiclass strategy is preferred when the adulterants under study are known, because it provides additional information.
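Sensitivity and specificity in this setting are usually read with the modelled class as the target: sensitivity is the fraction of target samples accepted, specificity the fraction of non-target samples rejected. The sketch below assumes genuine nectar (`"pure"`) is the target class; the labels and predictions are toy values, not data from the study.

```python
def class_model_metrics(y_true, y_pred, target="pure"):
    """Sensitivity, specificity and accuracy with `target` as the modelled class.

    Any prediction other than `target` counts as a rejection, so the same
    function handles one-class (accept/reject) and multiclass outputs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    tn = sum(t != target and p != target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both target and non-target samples")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Toy example (hypothetical labels, not results from the study).
y_true = ["pure", "pure", "pure", "pure", "apple", "apple", "cashew", "cashew"]
y_pred = ["pure", "pure", "pure", "apple", "apple", "apple", "cashew", "pure"]
print(class_model_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

A multiclass model additionally says *which* adulterant was found (apple or cashew), which is the extra information the abstract refers to; this function deliberately collapses that into accept/reject.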
18
Class-modelling in food analytical chemistry: Development, sampling, optimisation and validation issues - A tutorial. Anal Chim Acta 2017; 982:9-19. [PMID: 28734370 DOI: 10.1016/j.aca.2017.05.013] [Citation(s) in RCA: 105] [Impact Index Per Article: 15.0] [Received: 11/13/2016] [Revised: 05/11/2017] [Accepted: 05/16/2017] [Indexed: 11/22/2022]
Abstract
Qualitative data modelling is a fundamental branch of pattern recognition, with many applications in analytical chemistry, and embraces two main families: discriminant and class-modelling methods. The first strategy is appropriate when at least two classes are meaningfully defined in the problem under study, while the second is the right choice when the focus is on a single class. For this reason, class-modelling methods are also referred to as one-class classifiers. Although most problems in food analysis would be properly addressed by class-modelling strategies, such techniques are used rather rarely and, in many cases, discriminant methods are forced onto one-class problems, introducing a bias in the outcomes. Key aspects related to the development, optimisation and validation of suitable class models for the characterisation of food products are critically analysed and discussed.
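The practical difference between the two families can be shown with a toy example: a discriminant rule always assigns a sample to one of the known classes, whereas a class model can reject a sample that belongs to none of them. The sketch assumes a nearest-centroid discriminant and a distance-threshold class model (radius = `margin` times the largest training distance); neither is a specific method from the tutorial, such as SIMCA or UNEQ, and the points are made up.

```python
import math


def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]


def nearest_centroid(x, classes):
    """Discriminant rule: always returns one of the known class labels."""
    return min(classes, key=lambda label: math.dist(x, centroid(classes[label])))


def fit_class_model(rows, margin=1.0):
    """Class-modelling rule for ONE class: accept x only if it lies within
    `margin` times the largest training distance from the class centroid."""
    c = centroid(rows)
    radius = margin * max(math.dist(r, c) for r in rows)
    return lambda x: math.dist(x, c) <= radius


classes = {
    "A": [[0, 0], [1, 0], [0, 1], [1, 1]],
    "B": [[5, 5], [6, 5], [5, 6], [6, 6]],
}
accept_a = fit_class_model(classes["A"])

outsider = [0.5, 3.0]  # belongs to neither class
print(nearest_centroid(outsider, classes))  # A (forced assignment)
print(accept_a(outsider))                   # False (rejected by the class model)
print(accept_a([0.6, 0.4]))                 # True
```

The forced assignment of `outsider` to class A is the kind of bias the abstract attributes to using discriminant methods on one-class problems. `math.dist` requires Python 3.8 or later.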
19
Review of fall detection techniques: A data availability perspective. Med Eng Phys 2016; 39:12-22. [PMID: 27889391 DOI: 10.1016/j.medengphy.2016.10.014] [Citation(s) in RCA: 136] [Impact Index Per Article: 17.0] [Received: 05/28/2016] [Revised: 09/16/2016] [Accepted: 10/30/2016] [Indexed: 11/26/2022]
Abstract
A fall is an abnormal activity that occurs rarely; however, failing to identify falls can have serious health and safety implications for an individual. Because falls are rare, there may be insufficient or no training data available for them. Therefore, standard supervised machine learning methods may not be directly applicable to this problem. In this paper, we present a taxonomy for the study of fall detection from the perspective of the availability of fall data. The proposed taxonomy is independent of the type of sensors used and of specific feature extraction/selection methods. It identifies different categories of classification methods for fall detection based on the availability of fall data during classifier training. We then present a comprehensive literature review within those categories and identify treating a fall as an abnormal activity as a plausible research direction. We conclude by discussing several open research problems in the field and pointers for future research.
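Treating a fall as an abnormal activity means the detector can be trained on normal activities alone. A minimal sketch, assuming a single hypothetical feature (peak acceleration magnitude per window, in g) and a k-sigma threshold; the review does not prescribe this feature, threshold or the toy numbers below.

```python
import math
from statistics import mean, stdev


def peak_magnitude(window):
    """Largest acceleration magnitude (in g) in a window of (ax, ay, az) samples."""
    return max(math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in window)


def fit_normal_model(normal_peaks):
    """Fit on normal activities only; no fall examples are needed."""
    return mean(normal_peaks), stdev(normal_peaks)


def is_abnormal(peak, model, k=3.0):
    """Flag a window whose peak deviates by more than k standard deviations."""
    mu, sigma = model
    return abs(peak - mu) > k * sigma


# Hypothetical peaks from walking/sitting windows (toy numbers).
normal_peaks = [1.1, 1.3, 1.2, 1.4, 1.0, 1.2, 1.3, 1.1]
model = fit_normal_model(normal_peaks)

fall_window = [(0.0, 0.0, 1.0), (0.5, 0.3, 3.4), (0.1, 0.0, 0.9)]
print(is_abnormal(peak_magnitude(fall_window), model))  # True
print(is_abnormal(1.35, model))                         # False
```

Any sufficiently unusual activity would also be flagged, which is one reason the review lists false alarms among the open problems of this approach.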
20
The impact of feature selection on one and two-class classification performance for plant microRNAs. PeerJ 2016; 4:e2135. [PMID: 27366641 PMCID: PMC4924126 DOI: 10.7717/peerj.2135] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/08/2016] [Accepted: 05/25/2016] [Indexed: 11/23/2022]
Abstract
MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure, which is recognized by a complex enzyme machinery. This ultimately leads to the incorporation of 18–24 nt long mature miRNAs into RISC, where they act as recognition keys to aid in the regulation of target mRNAs. Determining miRNAs experimentally is laborious and, therefore, machine learning is used to complement such endeavors. The success of machine learning depends mostly on proper input data and on appropriate features for parameterizing the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC: the best feature selection method achieved an average accuracy of 95.6%, ∼29 percentage points higher than the worst method, which achieved 66.9%. Although OCC performance is comparable to TCC, which performs up to 3% better, TCC is much less affected by feature selection; its largest performance gap is ∼13%, which occurs for only two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that OCC can perform on par with TCC given the proper set of features.
21
Multi-scale segmentation of neurons based on one-class classification. J Neurosci Methods 2016; 266:94-106. [PMID: 27038663 DOI: 10.1016/j.jneumeth.2016.03.019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/22/2015] [Revised: 03/12/2016] [Accepted: 03/29/2016] [Indexed: 12/17/2022]
Abstract
BACKGROUND High-resolution multiphoton and confocal microscopy has made it possible to acquire large amounts of data for neuroscientists to analyze. However, manual processing of these images has become infeasible. Thus, there is a need for automatic methods for the morphological reconstruction of 3D neuronal image stacks. NEW METHOD An algorithm to extract the 3D morphology of a neuron is presented. The main contribution of the paper is the segmentation of the neuron from the background. Our segmentation method is based on one-class classification, where the 3D image stack is analyzed at different scales. First, a multi-scale approach is proposed to compute the Laplacian of the 3D image stack. The Laplacian is used to select a training set consisting of background points. For each scale, a decision function is learned from the training set to determine how similar an unlabeled point is to the points in the background class. Foreground points (dendrites and axons) are those points rejected as background. Finally, the morphological reconstruction of the neuron is extracted by applying a state-of-the-art centerline tracing algorithm to the segmentation. RESULTS Quantitative and qualitative results on several datasets demonstrate the ability of our algorithm to segment and trace neurons accurately and robustly. COMPARISON WITH EXISTING METHOD(S) Our method was compared to state-of-the-art neuron tracing algorithms. CONCLUSIONS Our approach allows segmentation of thin and low-contrast dendrites that are usually difficult to segment. Compared to our previous approach, this algorithm is more accurate and much faster.
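The segmentation logic (Laplacian-selected background training set, one decision function per scale, foreground = points rejected as background) can be sketched on a 1D intensity profile. Several details are assumptions, not the paper's method: the scale is the stencil step `h`, background training points are those with a non-negative Laplacian, the decision function is a one-sided k-sigma test on the Laplacian response, and a point is foreground if any scale rejects it.

```python
from statistics import mean, pstdev


def laplacian(signal, h):
    """Discrete Laplacian f[i-h] - 2 f[i] + f[i+h] at step h (edges clamped)."""
    n = len(signal)
    return [
        signal[max(i - h, 0)] - 2 * signal[i] + signal[min(i + h, n - 1)]
        for i in range(n)
    ]


def segment(signal, steps=(1, 2), k=3.0):
    """Return indices labelled foreground (rejected as background at some scale)."""
    foreground = set()
    for h in steps:
        lap = laplacian(signal, h)
        # Assumed training-set rule: a non-negative Laplacian means "not on a
        # bright ridge", so those points are taken as background examples.
        background = [v for v in lap if v >= 0]
        if not background:
            continue
        mu, sigma = mean(background), pstdev(background)
        # One-sided decision function: a bright structure produces a Laplacian
        # far more negative than anything seen in the background class.
        for i, v in enumerate(lap):
            if v < mu - k * sigma:
                foreground.add(i)
    return sorted(foreground)


thin = [1] * 6 + [9] + [1] * 6          # one-sample-wide bright structure
wide = [1] * 5 + [9, 9, 9] + [1] * 5    # three-sample-wide bright structure
print(segment(thin))                    # [6]
print(segment(wide, steps=(1,)))        # [5, 7] (centre missed at the finest scale)
print(segment(wide))                    # [5, 6, 7]
```

The `wide` profile shows why several scales help: the finest stencil only responds at the edges of a broad structure, while the coarser one recovers its centre.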