1
Radiomics analysis for distinctive identification of COVID-19 pulmonary nodules from other benign and malignant counterparts. Sci Rep 2024; 14:7079. [PMID: 38528100] [DOI: 10.1038/s41598-024-57899-x] [Received: 12/15/2023] [Accepted: 03/22/2024]
Abstract
This observational study investigated the potential of radiomics as a non-invasive adjunct to CT in distinguishing COVID-19 lung nodules from other benign and malignant lung nodules. The radiomics workflow comprised lesion segmentation, feature extraction, and machine learning algorithms, including decision tree, support vector machine, random forest, feed-forward neural network, and discriminant analysis. Key features such as Idmn, skewness, and long-run low grey level emphasis proved crucial for differentiation. The model achieved an accuracy of 83% in distinguishing COVID-19 nodules from other benign nodules and 88% from malignant nodules. The study concludes that radiomics, through machine learning, is a valuable tool for non-invasive discrimination between COVID-19 and other benign and malignant lung nodules, and may play a complementary role in patients with COVID-19 pneumonia who exhibit lung nodules and suspected concurrent lung pathologies. Clinically, radiomics-based feature extraction and classification may enhance the differentiation of lung nodules, particularly in the context of COVID-19.
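Skewness, one of the key features named above, is a standard first-order radiomics statistic; a minimal sketch of its computation over voxel intensities from a segmented lesion (the intensity values are illustrative, not taken from the study):

```python
def skewness(values):
    """Fisher skewness of a 1-D intensity sample, a standard
    first-order radiomics feature."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    sd = var ** 0.5
    return sum(((v - mu) / sd) ** 3 for v in values) / n

# Hypothetical voxel intensities inside a segmented nodule;
# the single bright voxel produces a right-skewed distribution.
lesion = [10.0, 12.0, 11.0, 30.0, 13.0, 12.0]
skew = skewness(lesion)
```

In practice such features are computed by radiomics toolkits over the full segmented volume and then fed to the classifiers listed in the abstract.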
2
Analysis of genetic diversity in patients with major psychiatric disorders versus healthy controls: A molecular-genetic study of 1698 subjects genotyped for 100 candidate genes (549 SNPs). Psychiatry Res 2024; 333:115720. [PMID: 38224633] [DOI: 10.1016/j.psychres.2024.115720] [Received: 09/15/2023] [Revised: 12/11/2023] [Accepted: 01/03/2024]
Abstract
BACKGROUND This study analyzed the extent to which irregularities in genetic diversity separate psychiatric patients from healthy controls. METHODS Genetic diversity was quantified through multidimensional "gene vectors" assembled from 4 to 8 polymorphic SNPs located within each of 100 candidate genes. The number of different genotypic patterns observed per gene was called the gene's "diversity index". RESULTS The diversity indices were only weakly correlated with their constituent number of SNPs (20.5 % explained variance), suggesting that genetic diversity is an intrinsic gene property shaped over the course of evolution. Significant deviations from "normal" diversity values were found for (1) major depression; (2) Alzheimer's disease; and (3) schizoaffective disorders. Almost one third of the genes were correlated with each other, with correlations ranging from 0.0303 to 0.7245. The central finding of this study was the discovery of "singular genes" characterized by distinctive genotypic patterns that appeared exclusively in patients but never in healthy controls. Neural networks yielded nonlinear classifiers that correctly identified up to 90 % of patients. Overlaps between diagnostic subgroups at the genotype level suggested that (1) cross-diagnostic vulnerabilities are likely involved in the pathogenesis of major psychiatric disorders; and (2) clinically defined diagnoses may not constitute etiological entities. CONCLUSION Detailed analyses of the variation of genotypic patterns within genes, along with the correlations between genes, lead to nonlinear classifiers that enable very robust separation between psychiatric patients and healthy controls at the genotype level.
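The "diversity index" described above reduces to counting distinct multi-SNP genotype patterns per gene; a minimal sketch, with hypothetical genotype vectors:

```python
def diversity_index(genotype_vectors):
    """Number of distinct multi-SNP genotype patterns observed for one
    gene across all subjects (the paper's 'diversity index')."""
    return len({tuple(v) for v in genotype_vectors})

# Hypothetical 4-SNP genotype vectors (0/1/2 = minor-allele counts)
# for six subjects; repeated patterns count once.
subjects = [
    (0, 1, 2, 0),
    (0, 1, 2, 0),
    (1, 1, 0, 2),
    (0, 0, 0, 0),
    (1, 1, 0, 2),
    (2, 2, 1, 1),
]
index = diversity_index(subjects)
```

A "singular" pattern in the paper's sense would be one of these tuples observed only in the patient group.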
3
ALPACA: A machine Learning Platform for Affinity and selectivity profiling of CAnnabinoids receptors modulators. Comput Biol Med 2023; 164:107314. [PMID: 37572442] [DOI: 10.1016/j.compbiomed.2023.107314] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 08/07/2023]
Abstract
The development of small molecules that selectively target the cannabinoid receptor subtype 2 (CB2R) is emerging as an intriguing therapeutic strategy to treat neurodegeneration, as well as to counter the onset and progression of cancer. In this context, in-silico tools able to predict CB2R affinity and selectivity with respect to subtype 1 (CB1R), whose modulation is responsible for undesired psychotropic effects, are highly desirable. In this work, we developed a series of machine learning classifiers trained on high-quality bioactivity data of small molecules acting on CB2R and/or CB1R extracted from ChEMBL v30. Our classifiers showed strong predictive power in accurately determining CB2R affinity, CB1R affinity, and CB2R/CB1R selectivity. Among the built models, those obtained using random forest as the algorithm proved to be the top-performing ones (AUC in validation ≥0.96) and were made freely accessible through ALPACA (https://www.ba.ic.cnr.it/softwareic/alpaca/), a user-friendly web platform developed ad hoc. Due to its user-friendly interface and robust predictive power, ALPACA can be a valuable tool for saving both the time and the resources involved in the design of selective CB2R modulators.
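A selectivity call of the kind ALPACA makes can be sketched as a simple affinity comparison; the one-log-unit margin below is an illustrative assumption, not the criterion used by ALPACA:

```python
def selectivity_label(pki_cb2, pki_cb1, margin=1.0):
    """Classify a compound's CB2R/CB1R selectivity from its binding
    affinities (pKi). The one-log-unit margin is an illustrative
    threshold, not the criterion used by ALPACA."""
    if pki_cb2 - pki_cb1 >= margin:
        return "CB2R-selective"
    if pki_cb1 - pki_cb2 >= margin:
        return "CB1R-selective"
    return "non-selective"

# Hypothetical compound: strong CB2R binder, weak CB1R binder.
label = selectivity_label(8.2, 6.5)
```

ALPACA's classifiers learn such labels from molecular descriptors rather than requiring both affinities to be measured.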
4
STIF: Intuitionistic fuzzy Gaussian membership function with statistical transformation weight of evidence and information value for private information preservation. Distributed and Parallel Databases 2023; 41:1-34. [PMID: 37359982] [PMCID: PMC10121075] [DOI: 10.1007/s10619-023-07423-3] [Accepted: 04/10/2023]
Abstract
Sharing data with multiple organizations is essential for analysis in many situations, but shared data often contains individuals' private and sensitive information, creating the risk of a privacy breach. To overcome these privacy challenges, privacy preserving data mining (PPDM) has emerged as a solution. This work addresses the problem of PPDM by proposing a statistical transformation with intuitionistic fuzzy (STIF) algorithm for data perturbation. The STIF algorithm combines the statistical methods weight of evidence and information value with an intuitionistic fuzzy Gaussian membership function. The STIF algorithm is applied to three benchmark datasets: adult income, bank marketing, and lung cancer. Decision tree, random forest, extreme gradient boosting, and support vector machine classifiers are used for accuracy and performance analysis. The results show that the STIF algorithm achieves 99% accuracy on the adult income dataset and 100% accuracy on both the bank marketing and lung cancer datasets. Further, the results highlight that the STIF algorithm outperforms state-of-the-art algorithms in data perturbation capacity and privacy preserving capacity, without information loss on either numerical or categorical data.
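The weight-of-evidence and information-value statistics used in STIF follow standard definitions; a minimal sketch over hypothetical bin counts of two classes:

```python
import math

def woe_iv(goods, bads):
    """Weight of evidence per bin and total information value for a
    binned attribute, given counts of the two classes in each bin."""
    total_g, total_b = sum(goods), sum(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        pg, pb = g / total_g, b / total_b
        w = math.log(pg / pb)       # weight of evidence for this bin
        woe.append(w)
        iv += (pg - pb) * w         # information value accumulates per bin
    return woe, iv

# Hypothetical 3-bin split of one attribute (counts per class).
woe, iv = woe_iv(goods=[100, 60, 40], bads=[30, 30, 40])
```

STIF then feeds such statistics into the intuitionistic fuzzy Gaussian membership transformation; that step is not reproduced here.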
5
Quantitative Structure Activity/Toxicity Relationship through Neural Networks for Drug Discovery or Regulatory Use. Curr Top Med Chem 2023; 23:2792-2804. [PMID: 37867278] [DOI: 10.2174/0115680266251327231017053718] [Received: 02/28/2023] [Revised: 08/19/2023] [Accepted: 08/28/2023]
Abstract
Quantitative structure-activity relationship (QSAR) modelling is widely used in medicinal chemistry and regulatory decision making. The large amounts of data collected in recent years in materials and life sciences projects provide a solid foundation for data-driven modelling approaches that have fostered the development of machine learning and artificial intelligence tools. An overview and discussion of the principles of QSAR modelling focus on the assembly and curation of data, computation of molecular descriptors, optimization, validation, and definition of the scope of the developed QSAR models. In this review, examples of QSAR models based on artificial neural networks demonstrate the effectiveness of nonlinear methods for extracting information from large data sets to classify new chemicals and predict their biological properties.
6
Advanced deep learning approaches to predict supply chain risks under COVID-19 restrictions. Expert Systems with Applications 2023; 211:118604. [PMID: 35999828] [PMCID: PMC9389854] [DOI: 10.1016/j.eswa.2022.118604] [Received: 05/07/2022] [Revised: 08/04/2022] [Accepted: 08/14/2022]
Abstract
The ongoing COVID-19 pandemic has created an unprecedented predicament for global supply chains (SCs). Shipments of essential and life-saving products, ranging from pharmaceuticals, agriculture, and healthcare to manufacturing, have been significantly impacted or delayed, leaving global SCs vulnerable. A better understanding of shipment risks can substantially reduce that uncertainty. Therefore, this paper proposes several Deep Learning (DL) approaches to mitigate shipment risks by predicting whether a shipment can be exported from one source to another, despite the restrictions imposed by the COVID-19 pandemic. The proposed DL methodologies have four main stages: data capturing, de-noising or pre-processing, feature extraction, and classification. The feature extraction stage depends on two main variants of DL models. The first variant involves three recurrent neural network (RNN) structures (i.e., long short-term memory (LSTM), bidirectional long short-term memory (BiLSTM), and gated recurrent unit (GRU)), and the second variant is the temporal convolutional network (TCN). In the classification stage, six different classifiers are applied to test the entire methodology: SoftMax, random trees (RT), random forest (RF), k-nearest neighbour (KNN), artificial neural network (ANN), and support vector machine (SVM). The performance of the proposed DL models is evaluated on an online dataset (taken as a case study). The numerical results show that one of the proposed models (the TCN) is close to 100% accurate in predicting the risk of shipment to a particular destination under COVID-19 restrictions. The outcomes of this work can help decision-makers predict supply chain risks proactively and so increase the resiliency of SCs.
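The TCN variant mentioned above is built from dilated causal convolutions; a minimal pure-Python sketch of that building block (real TCNs stack many such layers with residual connections and learned kernels):

```python
def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal 1-D convolution, the building block of a temporal
    convolutional network (TCN): the output at time t depends only on
    inputs at times <= t (earlier inputs are zero-padded)."""
    k, pad = len(kernel), (len(kernel) - 1) * dilation
    xp = [0.0] * pad + list(x)
    return [sum(kernel[j] * xp[t + pad - j * dilation] for j in range(k))
            for t in range(len(x))]

# Two-tap moving average as an illustrative kernel.
y = causal_conv1d([1.0, 2.0, 3.0, 4.0], kernel=[0.5, 0.5])
```

Causality is what makes the architecture suitable for sequence data such as shipment histories: no output position "sees" the future.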
7
Vietnamese children with and without DLD: Classifier use and grammaticality over time. Journal of Communication Disorders 2023; 101:106297. [PMID: 36587459] [PMCID: PMC10162499] [DOI: 10.1016/j.jcomdis.2022.106297] [Received: 02/04/2022] [Revised: 12/23/2022] [Accepted: 12/24/2022]
Abstract
INTRODUCTION One way to identify Developmental Language Disorder (DLD) is to establish clinical markers in a language to serve as reliable indicators of the disorder. This study embarks on the search for clinical markers for Vietnamese using longitudinal data from children with and without DLD. METHODS We matched ten children previously classified with DLD to ten with typical development (TD) by age and gender. Participants completed a story generation task at three time points: kindergarten, first, and second grade. Overall grammatical development was measured using mean length of utterance, MLU, and proportion of grammatical utterances, PGU. We examined a language-specific feature, classifiers, in terms of accuracy (omission errors), diversity (number of different classifiers), and productivity, or the use of classifiers in constructions of two-to-three elements (classifier+noun, numeral+classifier+noun). Longitudinal change and group differences were examined using linear mixed modeling, supplemented by linguistic analysis. RESULTS Both groups increased in MLU and PGU over time. The DLD group performed lower in kindergarten and continued to show lower performance over time on these measures. Classifier omission errors decreased over time with no group differences. Classifier diversity increased across groups, with lower performance by the DLD group in kindergarten and over time. For classifier productivity, TD children used classifiers in multiple constructions in kindergarten and maintained the same level over time. In contrast, children with DLD had minimal use of three-element constructions in kindergarten but increased in productivity over time. CONCLUSIONS Children with DLD produce shorter utterances with relatively more grammatical errors compared to their TD peers in the early school years. 
Though no longer committing classifier omission errors, children with DLD showed more restricted use of classifiers in terms of the number of different classifiers and constructions produced. Findings inform the search for Vietnamese clinical markers of DLD.
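MLU and PGU, the two global measures above, can be sketched as follows; the utterances are hypothetical, and word counts stand in for the morpheme counts typically used:

```python
def mlu(utterances):
    """Mean length of utterance; words stand in here for the morpheme
    counts typically used in clinical practice."""
    return sum(len(u.split()) for u in utterances) / len(utterances)

def pgu(grammatical_flags):
    """Proportion of grammatical utterances (1 = grammatical)."""
    return sum(grammatical_flags) / len(grammatical_flags)

# Hypothetical transcript of three child utterances from a
# story generation task, with grammaticality judgements.
sample = ["mommy go store", "doggy sleep", "me want cookie now"]
mean_length = mlu(sample)
proportion = pgu([1, 0, 1])
```

In the study these per-child measures were then modelled over the three time points with linear mixed models.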
8
Diabetes disease detection and classification on Indian demographic and health survey data using machine learning methods. Diabetes Metab Syndr 2023; 17:102690. [PMID: 36527769] [DOI: 10.1016/j.dsx.2022.102690] [Received: 12/18/2021] [Revised: 11/30/2022] [Accepted: 12/02/2022]
Abstract
BACKGROUND & AIM Diabetes mellitus has become one of the major health concerns in developing countries such as India, and the need for leveraging technology is felt in diabetes management. The main objective of this work is to deploy machine learning methods with clinical relevance for the detection and classification of diabetes. METHODS The Indian demographic and health survey-2016 dataset is considered, and risk factors are determined for continuous and categorical data. Kernel entropy component analysis is used for dimensionality reduction of the feature set. Predictive machine learning methods, namely logistic regression, Gaussian naive Bayes, linear discriminant analysis, support vector classifier, k-nearest neighbour, decision tree, extreme gradient boosting, kernel entropy component analysis, and random forest, are deployed. The deployed methodology has three phases: feature extraction, classification, and prediction. RESULTS Random forest gave the maximum classification accuracy of 99.84% and 96.75% for the imbalanced and the kernel entropy component analysis-induced balanced datasets (using the synthetic minority oversampling technique), respectively. The maximum precision of 99.64% is obtained using a support vector classifier on the balanced dataset. An area under the curve of 99% is observed for kernel entropy component analysis-induced random forest on the balanced dataset. All other models performed moderately when applied to the kernel entropy component analysis-trained dataset. CONCLUSIONS The random forest model performed better than the other models. The overall performance of the machine learning models can be improved by training the diabetes dataset using kernel entropy component analysis.
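The synthetic minority oversampling technique (SMOTE) used to balance the dataset generates new minority-class samples by interpolation; a minimal sketch of that core step:

```python
import random

def smote_like(x, neighbor, rng=random.Random(0)):
    """Generate one synthetic minority-class sample by interpolating
    between a minority point and one of its nearest neighbours: the
    core step of SMOTE."""
    r = rng.random()  # random position along the segment x -> neighbor
    return [xi + r * (ni - xi) for xi, ni in zip(x, neighbor)]

# Hypothetical 2-D minority point and one of its neighbours.
synthetic = smote_like([1.0, 2.0], [3.0, 6.0])
```

The full algorithm repeats this for each minority point against its k nearest minority neighbours until the classes are balanced.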
9
Artificial intelligence assisted tools for the detection of anxiety and depression leading to suicidal ideation in adolescents: a review. Cogn Neurodyn 2022:1-22. [PMID: 36467993] [PMCID: PMC9684805] [DOI: 10.1007/s11571-022-09904-0] [Received: 05/09/2022] [Revised: 09/26/2022] [Accepted: 10/17/2022]
Abstract
Epidemiological studies report high levels of anxiety and depression amongst adolescents. These psychiatric conditions, arising from complex interplays of biological, social and environmental factors, are important risk factors for suicidal behaviours and suicide, which peak in late adolescence and early adulthood. Although deaths by suicide have fallen globally in recent years, suicide deaths are increasing in some countries, such as the US. Suicide prevention is a challenging global public health problem: there are currently no validated clinical biomarkers for diagnosing suicidality, and traditional methods exhibit limitations. Artificial intelligence (AI) is advancing rapidly in many fields, including the diagnosis of medical conditions. This review summarizes recent studies (from the past 8 years) that employed AI tools for the automated detection of depression and/or anxiety disorder and discusses the limitations and effects of some modalities. The studies assert that AI tools produce promising results and could overcome the limitations of traditional diagnostic methods. Although using AI tools for detecting suicidal ideation has limitations, these are outweighed by the advantages. For future work, this review proposes extracting a fusion of features, such as facial images, speech signals, and visual and clinical history features, from deep models for the automated detection of depression and/or anxiety disorder. This may pave the way for the identification of individuals with suicidal thoughts.
10
Classifying the difficulty levels of working memory tasks by using pupillary response. PeerJ 2022; 10:e12864. [PMID: 35368339] [PMCID: PMC8973468] [DOI: 10.7717/peerj.12864] [Received: 09/17/2021] [Accepted: 01/10/2022]
Abstract
Knowing the difficulty of a given task is crucial for improving learning outcomes. This paper studies the classification of difficulty levels of memorization tasks from pupillary response data. Developing a difficulty level classifier from pupil size features is challenging because of the inter-subject variability of pupil responses. The eye-tracking data used in this study were collected while students solved memorization tasks divided into low, medium, and high difficulty levels. Statistical analysis shows that values of pupillometric features (such as peak dilation and pupil diameter change) differ significantly across difficulty levels. We used a wrapper method to select the pupillometric features that work best for the most common classifiers: Support Vector Machine (SVM), Decision Tree (DT), Linear Discriminant Analysis (LDA), and Random Forest (RF). Beyond the statistical differences, experiments showed that a random forest classifier trained with five features obtained the best F1-score (82%). This result is notable because it describes a method to evaluate the cognitive load of a subject performing a task using only pupil size features.
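The wrapper method mentioned above can be sketched as greedy forward selection; the additive scoring function here is a toy stand-in for the cross-validated classifier performance a real wrapper would use:

```python
def forward_select(features, score, k):
    """Greedy wrapper-style forward selection: repeatedly add the
    feature that most improves the score of the current subset."""
    chosen = []
    while len(chosen) < k:
        best = max((f for f in features if f not in chosen),
                   key=lambda f: score(chosen + [f]))
        chosen.append(best)
    return chosen

# Toy additive utility standing in for a classifier's F1 on a subset;
# feature names echo the pupillometric features in the abstract.
utility = {"peak_dilation": 3, "diameter_change": 2, "blink_rate": 1}
picked = forward_select(list(utility),
                        lambda subset: sum(utility[f] for f in subset),
                        k=2)
```

With a real wrapper, `score` would train and evaluate the target classifier (e.g. RF) on each candidate subset, which is what makes the method expensive but classifier-aware.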
11
Benchmarking metagenomics classifiers on ancient viral DNA: a simulation study. PeerJ 2022; 10:e12784. [PMID: 35356467] [PMCID: PMC8958974] [DOI: 10.7717/peerj.12784] [Received: 04/12/2021] [Accepted: 12/21/2021]
Abstract
Owing to technological advances in ancient DNA, it is now possible to sequence viruses from the past to track down their origin and evolution. However, ancient DNA data are considerably more degraded and contaminated than modern data, making the identification of ancient viral genomes particularly challenging. Several methods to characterise the modern microbiome (and, within this, the virome) have been developed; in particular, tools that assign sequenced reads to specific taxa in order to characterise the organisms present in a sample of interest. While these existing tools are routinely used on modern data, their performance when applied to ancient microbiome data to screen for ancient viruses remains unknown. In this work, we conducted an extensive simulation study using public viral sequences to establish which tool is the most suitable to screen ancient samples for human DNA viruses. We compared the performance of four widely used classifiers, namely Centrifuge, Kraken2, DIAMOND and MetaPhlAn2, in correctly assigning sequencing reads to the corresponding viruses. To do so, we simulated reads by adding noise typical of ancient DNA to a set of publicly available human DNA viral sequences and to the human genome: we fragmented the DNA into different lengths, added sequencing error, and introduced C to T and G to A deamination substitutions at the read termini. We then measured the resulting sensitivity and precision for all classifiers. Across most simulations, more than 228 out of the 233 simulated viruses were recovered by Centrifuge, Kraken2 and DIAMOND, in contrast to MetaPhlAn2, which recovered only around one third. Overall, Centrifuge and Kraken2 had the best performance, with the highest values of sensitivity and precision. We found that deamination damage had little impact on the performance of the classifiers, less than sequencing error and read length.
Since Centrifuge can handle short reads (in contrast to DIAMOND and Kraken2 with default settings) and achieves the highest sensitivity and precision at the species level across all the simulations performed, it is our recommended tool. Regardless of the tool used, our simulations indicate that, for ancient human studies, users should apply strict filters to remove all reads of potential human origin. Finally, we recommend that users verify which species are present in the database used, as default databases may lack sequences for viruses of interest.
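The C-to-T deamination damage simulated at the read termini can be sketched as follows; the damage probability and the size of the terminal window are illustrative assumptions, not the study's parameters:

```python
import random

def deaminate(read, p_damage=0.3, n_terminal=3, rng=random.Random(42)):
    """Simulate ancient-DNA-style damage: C->T substitutions near the
    5' end of a read (G->A at the 3' end would be handled
    symmetrically). Probability and window size are illustrative."""
    out = []
    for i, base in enumerate(read):
        if base == "C" and i < n_terminal and rng.random() < p_damage:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)

damaged = deaminate("CCGATCGA")
```

Fragmentation into short reads and per-base sequencing error would be added as separate steps in the same simulation pipeline.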
12
Ongoing neural oscillations predict the post-stimulus outcome of closed loop auditory stimulation during slow-wave sleep. Neuroimage 2022; 253:119055. [PMID: 35276365] [DOI: 10.1016/j.neuroimage.2022.119055] [Received: 12/11/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022]
Abstract
Large slow oscillations (SO, 0.5-2 Hz) characterise slow-wave sleep and are crucial to memory consolidation and other physiological functions. Manipulating slow oscillations may enhance sleep and memory, as well as benefitting the immune system. Closed-loop auditory stimulation (CLAS) has been demonstrated to increase the SO amplitude and to boost fast sleep spindle activity (11-16 Hz). Nevertheless, not all such stimuli are effective in evoking SOs, even when they are precisely phase locked. Here, we studied what factors of the ongoing activity patterns may help to determine what oscillations to stimulate to effectively enhance SOs or SO-locked spindle activity. Hence, we trained classifiers using the morphological characteristics of the ongoing SO, as measured by electroencephalography (EEG), to predict whether stimulation would lead to a benefit in terms of the resulting SO and spindle amplitude. Separate classifiers were trained using trials from spontaneous control and stimulated datasets, and we evaluated their performance by applying them to held-out data both within and across conditions. We were able to predict both when large SOs occurred spontaneously, and whether a phase-locked auditory click effectively enlarged them with good accuracy for predicting the SO trough (∼70%) and SO peak values (∼80%). Also, we were able to predict when stimulation would elicit spindle activity with an accuracy of ∼60%. Finally, we evaluate the importance of the various SO features used to make these predictions. Our results offer new insight into SO and spindle dynamics and may suggest techniques for developing future methods for online optimization of stimulation.
13
Feeding the machine: Challenges to reproducible predictive modeling in resting-state connectomics. Netw Neurosci 2022; 6:29-48. [PMID: 35350584] [PMCID: PMC8942606] [DOI: 10.1162/netn_a_00212] [Received: 02/12/2021] [Accepted: 10/08/2021]
Abstract
In this critical review, we examine the application of predictive models (for example, classifiers) trained using machine learning (ML) to assist in the interpretation of functional neuroimaging data. Our primary goal is to summarize how ML is being applied and to critically assess common practices. Our review covers 250 studies published using ML and resting-state functional MRI (fMRI) to infer various dimensions of the human functional connectome. Performance on holdout ("lockbox") data was, on average, ∼13% less accurate than performance measured through cross-validation alone, highlighting the importance of lockbox data, which was included in only 16% of the studies. There was also a concerning lack of transparency across the key steps in training and evaluating predictive models. This summary of the literature underscores the importance of using a lockbox and highlights several methodological pitfalls that can be addressed by the imaging community. We argue that, ideally, studies should be motivated both by the reproducibility and generalizability of findings and by the potential clinical significance of the insights. We offer recommendations for the principled integration of machine learning into the clinical neurosciences, with the goal of advancing imaging biomarkers of brain disorders, understanding causative determinants of health risks, and parsing heterogeneous patient outcomes.
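A lockbox evaluation of the kind recommended above starts from a three-way split in which the held-out set is touched exactly once, after all model selection is finished; a minimal sketch:

```python
import random

def three_way_split(items, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle and split data into train / validation / lockbox sets.
    The lockbox is evaluated exactly once, after all model selection
    and hyperparameter tuning on train/validation are finished."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * frac_train)
    n_val = int(len(shuffled) * frac_val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train, val, lockbox = three_way_split(range(100))
```

The ∼13% accuracy gap reported above is precisely the optimism that leaks in when the lockbox step is skipped and cross-validation results are reported as final.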
14
Diagnosis and grading of vesicoureteral reflux on voiding cystourethrography images in children using a deep hybrid model. Computer Methods and Programs in Biomedicine 2021; 210:106369. [PMID: 34474195] [DOI: 10.1016/j.cmpb.2021.106369] [Received: 05/31/2021] [Accepted: 08/17/2021]
Abstract
BACKGROUND AND OBJECTIVE Vesicoureteral reflux is the leakage of urine from the bladder into the ureter; as a result, urinary tract infections and kidney scarring can occur in children. Voiding cystourethrography (VCUG) is the primary radiological imaging method used to diagnose vesicoureteral reflux in children with a history of recurrent urinary tract infection, and reflux is also graded with it. In this study, we aimed to diagnose and grade vesicoureteral reflux in voiding cystourethrography images using a hybrid CNN-based deep learning method. METHODS Images of pediatric patients diagnosed with VUR between 2016 and 2021 in our hospital (Firat University Hospital) were graded according to the international vesicoureteral reflux radiographic grading system. VCUG images of 236 normal and 992 vesicoureteral reflux pediatric patients were available; a total of 6 classes were created: normal and grades 1-5. RESULTS A hybrid model combining CNNs (convolutional neural networks) with mRMR (minimum redundancy maximum relevance) feature selection is developed for the diagnosis and grading of vesicoureteral reflux on voiding cystourethrography images. GoogLeNet, MobileNetV2, and DenseNet201 models are used as parts of the hybrid architecture. The features obtained from these architectures are concatenated, selected with the mRMR method, and then classified with machine learning classifiers. Among the models used in the study, the highest accuracy, 96.9%, was obtained by the proposed model. CONCLUSIONS The findings of our study show that the developed hybrid model can be used for the diagnosis and grading of vesicoureteral reflux in voiding cystourethrography images.
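The mRMR step used in the hybrid model balances relevance against redundancy; a minimal sketch using absolute Pearson correlation as a stand-in for the mutual-information scores mRMR normally uses (the toy feature table is illustrative, not CNN output):

```python
def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def mrmr_rank(features, label, k):
    """Greedy mRMR-style selection: relevance is |corr with the label|,
    redundancy is the mean |corr with already-chosen features|."""
    names = list(features)
    rel = {f: abs(pearson(features[f], label)) for f in names}
    chosen = [max(names, key=rel.get)]
    while len(chosen) < k:
        def score(f):
            red = sum(abs(pearson(features[f], features[c]))
                      for c in chosen) / len(chosen)
            return rel[f] - red
        chosen.append(max((f for f in names if f not in chosen), key=score))
    return chosen

# Toy features: f1 tracks the label, f2 is a redundant copy of f1,
# f3 is uncorrelated with both.
label = [0, 0, 1, 1]
ranked = mrmr_rank({"f1": [0.0, 1.0, 2.0, 3.0],
                    "f2": [0.0, 2.0, 4.0, 6.0],
                    "f3": [1.0, -1.0, -1.0, 1.0]}, label, k=2)
```

The redundancy penalty is what makes the greedy pass skip the duplicate feature in favour of the independent one.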
15
Prediction of antischistosomal small molecules using machine learning in the era of big data. Mol Divers 2021; 26:1597-1607. [PMID: 34351547] [DOI: 10.1007/s11030-021-10288-2] [Received: 03/24/2021] [Accepted: 07/24/2021]
Abstract
Schistosomiasis is a neglected tropical disease caused by helminths of the Schistosoma genus. Despite its high morbidity and socio-economic burden, only a handful of therapeutics exist, with praziquantel as the main drug. Praziquantel is an old drug, registered for human use in 1982, that has since been administered en masse for chemotherapy, risking the development of resistance; hence the need for new drugs with different mechanisms of action. This review examines the use of machine learning (ML) in this era of big data to aid the prediction of novel antischistosomal molecules. It first discusses the challenges of drug discovery in schistosomiasis. It then explains big data and its characteristics, and examines open databases from which large biochemical data on schistosomiasis can be obtained for ML model development. The concepts of artificial intelligence, ML, and deep learning, and their applications to schistosomiasis drug discovery, are explored. The use of binary classification to predict antischistosomal compounds is discussed, along with algorithms that have been applied, including random forest and naive Bayes. Deep learning algorithms (deep neural networks) are proposed as novel algorithms for predicting antischistosomal molecules via binary classification. Databases specifically designed for housing bioactivity data on antischistosomal molecules, enriched with functional genomic datasets and ontologies, are urgently needed for developing predictive ML models.
16
The accuracy of quantitative EEG biomarker algorithms depends upon seizure onset dynamics. Epilepsy Res 2021; 176:106702. [PMID: 34229226] [DOI: 10.1016/j.eplepsyres.2021.106702] [Received: 03/19/2021] [Revised: 06/05/2021] [Accepted: 06/22/2021]
Abstract
OBJECTIVE To compare the performance of different ictal quantitative biomarkers of the seizure onset zone (SOZ) across many seizures in a cohort of consecutive patients with a variety of seizure onset patterns. METHODS The Epileptogenicity Index (EI, a measure of fast activity) and Slow Polarizing Shift index (SPS, a measure of infraslow activity) were calculated for 212 seizures (22 patients). After stratification by onset pattern, median index values inside and outside the SOZ were compared in aggregate and for each of the onset patterns. Receiver Operating Characteristic (ROC) curves were constructed to compare the performance of each index. RESULTS Median values of EI (0.056 vs 0.0087), SPS (0.27 vs 0.19), and CI (0.21 vs 0.12) were significantly higher for contacts inside the SOZ, all p < 0.0001. Analysis of AUC showed variable performance of these indices across seizure types, although AUC for EI and SPS was generally greatest for seizures with fast activity at onset. CONCLUSIONS All indices were significantly higher for contacts inside the SOZ; however, the performance of these indices varied depending on the pattern of seizure onset. SIGNIFICANCE These findings suggest that future studies of quantitative biomarkers of the SOZ should account for seizure onset pattern.
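The ROC analysis above reduces, for a single index, to the Mann-Whitney form of the AUC; a minimal sketch with hypothetical index values for contacts inside and outside the SOZ (the lists below are illustrative, not the study's data):

```python
def auc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly chosen positive (in-SOZ) contact
    scores higher than a randomly chosen negative (out-of-SOZ)
    contact, with ties counted as half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical Epileptogenicity Index values per contact.
inside_soz = [0.056, 0.08, 0.21, 0.15]
outside_soz = [0.0087, 0.03, 0.06, 0.01]
a = auc(inside_soz, outside_soz)
```

Computing this separately per seizure-onset pattern is what reveals the performance differences the abstract reports.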
|
17
|
Maintaining proper health records improves machine learning predictions for novel 2019-nCoV. BMC Med Inform Decis Mak 2021; 21:172. [PMID: 34044839 PMCID: PMC8159067 DOI: 10.1186/s12911-021-01537-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2020] [Accepted: 05/23/2021] [Indexed: 11/19/2022] Open
Abstract
Background An ongoing outbreak of a novel coronavirus (2019-nCoV) pneumonia continues to affect the whole world, including major countries such as China, the USA, Italy, France and the United Kingdom. We present outcome (‘recovered’, ‘isolated’ or ‘death’) risk estimates of 2019-nCoV over ‘early’ datasets. A major consideration is the likelihood of death for patients with 2019-nCoV. Method Accounting for the impact of variations in the reporting rate of 2019-nCoV, we used machine learning techniques (AdaBoost, bagging, extra-trees, decision tree and k-nearest neighbour classifiers) on two 2019-nCoV datasets obtained from Kaggle on March 30, 2020. We used ‘country’, ‘age’ and ‘gender’ as features to predict outcome for both datasets, and included the patient’s ‘disease’ history (only present in the second dataset) to predict the outcome for the second dataset. Results The use of a patient’s ‘disease’ history improves the prediction of ‘death’ by more than sevenfold. The models ignoring a patient’s ‘disease’ history performed poorly in test predictions. Conclusion Our findings indicate the potential of using a patient’s ‘disease’ history as part of the feature set in machine learning techniques to improve 2019-nCoV predictions. This development can have a positive effect on predictive patient treatment and can help ease currently overburdened healthcare systems worldwide, especially with the increasing prevalence of second- and third-wave re-infections in some countries.
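The paper's central comparison, prediction with and without a 'disease' history feature, can be sketched with one of the listed classifiers (AdaBoost) on synthetic records. The feature names follow the abstract, but the data and the size of the effect are invented for illustration:

```python
# Sketch: effect of adding a 'disease' history feature to the feature set.
# Records are synthetic; the hypothetical outcome is constructed so that it
# depends on disease history and age, which the base feature set cannot see.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 300
country = rng.integers(0, 5, n)
age = rng.integers(20, 90, n)
gender = rng.integers(0, 2, n)
disease = rng.integers(0, 2, n)          # prior disease history (0/1)
y = ((disease == 1) & (age > 60)).astype(int)   # invented 'death' outcome

X_base = np.column_stack([country, age, gender])
X_full = np.column_stack([country, age, gender, disease])

acc_base = cross_val_score(AdaBoostClassifier(random_state=0), X_base, y, cv=5).mean()
acc_full = cross_val_score(AdaBoostClassifier(random_state=0), X_full, y, cv=5).mean()
print(round(acc_base, 3), round(acc_full, 3))
```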
|
18
|
Discrimination between healthy and patients with Parkinson's disease from hand resting activity using inertial measurement unit. Biomed Eng Online 2021; 20:50. [PMID: 34022895 PMCID: PMC8141164 DOI: 10.1186/s12938-021-00888-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Accepted: 05/11/2021] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Parkinson's disease (PD) is a neurological disease that affects the motor system. The associated motor symptoms are muscle rigidity or stiffness, bradykinesia, tremors, and gait disturbances. Correct diagnosis, especially in the initial stages, is fundamental to the quality of life of the individual with PD. However, the methods used for diagnosis of PD are still based on subjective criteria. Consequently, the objective of this study is to propose a method for discriminating individuals with PD (in the initial stages of the disease) from healthy individuals, based on inertial sensor recordings. METHODS A total of 27 participants were selected: 15 individuals previously diagnosed with PD and 12 healthy individuals. Data collection was performed using inertial sensors positioned on the back of the hand and on the back of the forearm. Different numbers of features were used to compare the sensitivity, specificity, precision, and accuracy of the classifiers. For group classification, 4 classifiers were used and compared, namely Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Naive Bayes (NB). RESULTS When all individuals with PD were analyzed, the best performance for sensitivity and accuracy (0.875 and 0.800, respectively) was found with the SVM classifier, fed with 20% and 10% of the features, respectively, while the best performance for specificity and precision (0.933 and 0.917, respectively) was associated with the RF classifier fed with 20% of all the features. When only individuals with PD and score 1 on the Hoehn and Yahr scale (HY) were analyzed, the best performances for sensitivity, precision and accuracy (0.933, 0.778 and 0.848, respectively) were from the SVM classifier, fed with 40% of all features, and the best result for specificity (0.800) was obtained with the NB classifier, fed with 20% of all features.
CONCLUSION Considering all individuals with PD in this study, the best classifier for detecting PD (sensitivity) was the SVM fed with 20% of the features, and the best classifier for ruling out PD (specificity) was the RF classifier fed with 20% of the features. When analyzing individuals with PD and score HY = 1, the SVM classifier was superior in sensitivity, precision, and accuracy, and the NB classifier was superior in specificity. These results indicate that objective methods can be applied to help in the evaluation of PD.
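The four reported metrics all derive from a confusion matrix as follows; the counts below are hypothetical, not the study's data:

```python
# Sketch: sensitivity, specificity, precision and accuracy from a confusion
# matrix. The counts (tp, fp, tn, fn) are made-up illustrative values.
def metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)            # ability to detect PD
    specificity = tn / (tn + fp)            # ability to rule out PD
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, precision, accuracy

sens, spec, prec, acc = metrics(tp=14, fp=1, tn=11, fn=1)
print(round(sens, 3), round(spec, 3), round(prec, 3), round(acc, 3))
```

Note how sensitivity and specificity answer different clinical questions (detecting versus ruling out disease), which is why the abstract reports the best classifier separately for each.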
|
19
|
Computed tomography-based radiomic model at node level for the prediction of normal-sized lymph node metastasis in cervical cancer. Transl Oncol 2021; 14:101113. [PMID: 33975178 PMCID: PMC8131712 DOI: 10.1016/j.tranon.2021.101113] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2021] [Revised: 04/22/2021] [Accepted: 04/22/2021] [Indexed: 12/14/2022] Open
Abstract
The metastatic status of lymph nodes in cervical cancer patients can be predicted. A computed tomography-based radiomic model can identify the status of individual normal-sized lymph nodes. The model may help doctors with staging and clinical decision-making, enabling individualized treatment.
Purpose Radiomic models have been demonstrated to have acceptable discrimination capability for detecting lymph node metastasis (LNM). We aimed to develop a computed tomography–based radiomic model and validate its usefulness in the prediction of normal-sized LNM at node level in cervical cancer. Methods A total of 273 LNs of 219 patients from 10 centers were evaluated in this study. We randomly divided the LNs from the 2 centers with the largest number of LNs into the training and internal validation cohorts, and the rest as the external validation cohort. Radiomic features were extracted from the arterial and venous phase images. We trained an artificial neural network (ANN) to develop two single-phase models. A radiomic model reflecting the features of two-phase images was also built for directly predicting LNM in cervical cancer. Moreover, four state-of-the-art methods were used for comparison. The performance of all models was assessed using the area under the receiver operating characteristic curve (AUC). Results Among the models we built, the models combining the features of two phases surpassed the single-phase models, and the models generated by ANN had better performance than the others. We found that the radiomic model achieved the highest AUCs of 0.912 and 0.859 in the training and internal validation cohorts, respectively. In the external validation cohort, the AUC of the radiomic model was 0.800. Conclusion We constructed a radiomic model that exhibited great ability in the prediction of LNM. The application of the model could optimize clinical staging and decision-making.
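The two-phase fusion idea can be sketched by concatenating arterial- and venous-phase features and training a small feed-forward network (standing in for the paper's ANN); the radiomic features below are simulated:

```python
# Sketch: fusing two contrast phases by feature concatenation, then training
# a small feed-forward network. Features and labels are simulated; a real
# model would use radiomic features extracted from segmented lymph nodes.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 273                                   # number of lymph nodes, as above
arterial = rng.normal(size=(n, 10))
venous = rng.normal(size=(n, 10))
y = (arterial[:, 0] + venous[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X = np.hstack([arterial, venous])         # two-phase fusion
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
ann.fit(Xtr, ytr)
auc = roc_auc_score(yte, ann.predict_proba(Xte)[:, 1])
print(round(auc, 3))
```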
|
20
|
Automated interpretation of biopsy images for the detection of celiac disease using a machine learning approach. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 203:106010. [PMID: 33831693 DOI: 10.1016/j.cmpb.2021.106010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/12/2021] [Accepted: 02/15/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND AND OBJECTIVES Celiac disease is an autoimmune disease occurring in about 1 in 100 people worldwide. Early diagnosis and efficient treatment are crucial in mitigating the complications associated with untreated celiac disease, such as intestinal lymphoma and malignancy, and the subsequent high morbidity. The current diagnostic methods using small intestinal biopsy histopathology, endoscopy, and video capsule endoscopy (VCE) involve manual interpretation of photomicrographs or images, which can be time-consuming and difficult, with inter-observer variability. In this paper, a machine learning technique was developed for the automation of biopsy image analysis to detect and classify villous atrophy based on modified Marsh scores. This is one of the first studies to employ conventional machine learning to automate the use of biopsy images for celiac disease detection and classification. METHODS The Steerable Pyramid Transform (SPT) method was used to obtain sub-bands, from which various types of entropy and nonlinear features were computed. All extracted features were automatically classified into two-class and multi-class categories using six classifiers. RESULTS An accuracy of 88.89% was achieved for the classification of two-class villous abnormalities based on analysis of Hematoxylin and Eosin (H&E) stained biopsy images. Similarly, an accuracy of 82.92% was achieved for the two-class classification of red-green-blue (RGB) biopsy images, and an accuracy of 72% was achieved in the classification of multi-class biopsy images. CONCLUSION The results obtained are promising and demonstrate the possibility of automating biopsy image interpretation using machine learning. This can assist pathologists in accelerating the diagnostic process without bias, resulting in greater accuracy and, ultimately, earlier access to treatment.
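One of the feature families used above, entropy computed on decomposition sub-bands, can be sketched as below; the sub-band here is random data standing in for an actual steerable pyramid output, since the pyramid transform itself is omitted:

```python
# Sketch: Shannon entropy of sub-band coefficients, one feature type that
# could feed the classifiers above. The "sub-band" is synthetic noise
# standing in for a real SPT sub-band of a biopsy image.
import numpy as np

def shannon_entropy(coeffs, bins=32):
    hist, _ = np.histogram(coeffs, bins=bins)   # empirical distribution
    p = hist / hist.sum()
    p = p[p > 0]                                # drop empty bins (0 log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(8)
subband = rng.normal(size=(64, 64))             # stand-in for an SPT sub-band
ent = shannon_entropy(subband.ravel())
print(round(ent, 3))
```

With 32 bins the entropy is bounded by log2(32) = 5 bits; flatter coefficient histograms (more texture variation) push the value toward that bound.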
|
21
|
Evaluation of supervised machine-learning methods for predicting appearance traits from DNA. Forensic Sci Int Genet 2021; 53:102507. [PMID: 33831816 DOI: 10.1016/j.fsigen.2021.102507] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2020] [Revised: 02/26/2021] [Accepted: 03/17/2021] [Indexed: 11/20/2022]
Abstract
The prediction of human externally visible characteristics (EVCs) based solely on DNA information has become an established approach in forensic and anthropological genetics in recent years. While for a large set of EVCs, predictive models have already been established using multinomial logistic regression (MLR), the prediction performances of other possible classification methods have not been thoroughly investigated thus far. Motivated by the question to identify a potential classifier that outperforms these specific trait models, we conducted a systematic comparison between the widely used MLR and three popular machine learning (ML) classifiers, namely support vector machines (SVM), random forest (RF) and artificial neural networks (ANN), that have shown good performance outside EVC prediction. As examples, we used eye, hair and skin color categories as phenotypes and genotypes based on the previously established IrisPlex, HIrisPlex, and HIrisPlex-S DNA markers. We compared and assessed the performances of each of the four methods, complemented by detailed hyperparameter tuning that was applied to some of the methods in order to maximize their performance. Overall, we observed that all four classification methods showed rather similar performance, with no method being substantially superior to the others for any of the traits, although performances varied slightly across the different traits and more so across the trait categories. Hence, based on our findings, none of the ML methods applied here provide any advantage on appearance prediction, at least when it comes to the categorical pigmentation traits and the selected DNA markers used here.
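The head-to-head comparison of MLR against the three ML classifiers can be sketched as follows on simulated genotypes (0/1/2 allele counts); real work would use the IrisPlex-family markers and phenotype categories:

```python
# Sketch: comparing multinomial logistic regression (MLR) with SVM, RF and
# ANN on a categorical trait. Genotypes (0/1/2 allele counts) and the trait
# are simulated; they are not the HIrisPlex-S data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n, m = 300, 6                        # samples, SNP markers
X = rng.integers(0, 3, size=(n, m)).astype(float)
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.8, size=n) > 2).astype(int)

models = {
    "MLR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
}
results = {name: cross_val_score(clf, X, y, cv=5).mean()
           for name, clf in models.items()}
for name, acc in results.items():
    print(name, round(acc, 3))
```

As the abstract's conclusion anticipates, with only a handful of informative markers all four methods tend to land close to the same accuracy ceiling.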
|
22
|
Automated detection of conduct disorder and attention deficit hyperactivity disorder using decomposition and nonlinear techniques with EEG signals. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 200:105941. [PMID: 33486340 DOI: 10.1016/j.cmpb.2021.105941] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/24/2020] [Accepted: 01/10/2021] [Indexed: 05/22/2023]
Abstract
BACKGROUND AND OBJECTIVES Attention deficit hyperactivity disorder (ADHD) often presents with conduct disorder (CD). There is currently no objective laboratory test or diagnostic method to discern between ADHD and CD. Diagnosis is made more difficult because ADHD is a common neuro-developmental disorder that often presents with other co-morbid difficulties, in particular conduct disorder, which has a high degree of associated behavioural challenges. A novel automated system (AS) is proposed as a convenient supplementary tool to support clinicians in their diagnostic decisions. To the best of our knowledge, we are the first group to develop an automated classification system to classify ADHD, CD and ADHD+CD classes using brain signals. METHODS The empirical mode decomposition (EMD) and discrete wavelet transform (DWT) methods were employed to decompose the electroencephalogram (EEG) signals. Autoregressive modelling coefficients and relative wavelet energy were then computed on the signals, and various nonlinear features were extracted from the decomposed coefficients. Adaptive synthetic sampling (ADASYN) was employed to balance the dataset. Significant features were selected using the sequential forward selection method, and the highly discriminatory features were subsequently fed to an array of classifiers. RESULTS The highest accuracy of 97.88% was achieved with the K-Nearest Neighbour (KNN) classifier. The proposed system was developed using a ten-fold validation strategy on EEG data from 123 children. To the best of our knowledge, this is the first study to develop an AS for the classification of ADHD, CD and ADHD+CD classes using EEG signals. POTENTIAL APPLICATION Our AS could potentially be used as a web-based application with a cloud system to aid the clinical diagnosis of ADHD and/or CD, supporting faster and more accurate treatment for children.
It is important to note that testing with larger datasets is required before the AS can be employed for clinical applications.
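The final stages of such a pipeline, sequential forward feature selection followed by a KNN classifier, might look like the sketch below on synthetic feature vectors (the EMD/DWT decomposition and ADASYN balancing are omitted, and for brevity the selector is fitted on the full data rather than inside each fold):

```python
# Sketch: sequential forward selection feeding a KNN classifier, on
# synthetic vectors standing in for the nonlinear EEG features above.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n, d = 150, 12
X = rng.normal(size=(n, d))
y = (X[:, 0] - X[:, 3] + rng.normal(scale=0.4, size=n) > 0).astype(int)

knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn, n_features_to_select=4, direction="forward")
sfs.fit(X, y)
# ten-fold validation on the selected features, as in the study
acc = cross_val_score(knn, sfs.transform(X), y, cv=10).mean()
print(sorted(np.flatnonzero(sfs.get_support()).tolist()), round(acc, 3))
```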
|
23
|
Identification of marker genes in Alzheimer's disease using a machine-learning model. Bioinformation 2021; 17:348-355. [PMID: 34234395 PMCID: PMC8225597 DOI: 10.6026/97320630017348] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2020] [Revised: 02/24/2021] [Accepted: 02/27/2021] [Indexed: 11/23/2022] Open
Abstract
Alzheimer's Disease (AD) is one of the most common causes of dementia, mostly affecting the elderly population. Currently, there is no proper diagnostic tool or method available for the detection of AD. The present study used two distinct data sets of AD genes, which could be potential biomarkers in the diagnosis. The differentially expressed genes (DEGs) curated from both datasets were used for machine learning classification, tissue expression annotation and co-expression analysis. Further, CNPY3, GPR84, HIST1H2AB, HIST1H2AE, IFNAR1, LMO3, MYO18A, N4BP2L1, PML, SLC4A4, ST8SIA4 and TLE1 were identified as highly significant DEGs that exhibited co-expression with other query genes. Moreover, a tissue expression study found that these genes are also expressed in brain tissue. In addition to earlier studies on marker gene identification, we considered a different set of machine learning classifiers to improve the accuracy of the analysis. Among the six classification algorithms, J48 emerged as the best classifier for differentiating healthy and diseased samples, followed by SMO/SVM and LogitBoost in classification accuracy.
|
24
|
Towards an anxiety and stress recognition system for academic environments based on physiological features. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 190:105408. [PMID: 32139112 DOI: 10.1016/j.cmpb.2020.105408] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/06/2019] [Revised: 11/17/2019] [Accepted: 02/18/2020] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVE Traditional methods to determine stress and anxiety in academic environments consist of the application of questionnaires, whose main disadvantage is that the results depend on the students' self-perception. Being able to detect anxiety-related stress levels in a simple and objective way contributes greatly to dealing with low performance and school drop-out by students. METHODS The main contribution of this study is to identify the physiological features that could be used as predictors of stressful activities and states of anxiety in academic environments using an Arduino board and low-cost sensors. A test with 21 students was conducted, a stress-inducing protocol was proposed, and 21 physiological features of five signals were analyzed. In addition, the State-Trait Anxiety Inventory (STAI) was used to assess the level of anxiety of each student. Four classifiers were compared to find the physiological feature subset that provides the best accuracy for identifying states of stress and anxiety. RESULTS The stress due to activities performed by students can be identified with an accuracy greater than 90% (Kappa = 0.84) using the k-Nearest Neighbors classifier with data from heart rate, skin temperature and oximetry signals and four physiological features. Meanwhile, anxiety was identified with an accuracy greater than 95% (Kappa = 0.90) using the SVM classifier with data from the galvanic skin response (GSR) signal and three physiological features. CONCLUSIONS These results suggest that anxiety detection in academic environments could be performed by analyzing physiological signals instead of relying on STAI test scores.
They also suggest that physiological features could be used to develop stress recognition systems that help teachers identify stressful tasks in an academic environment, or anxiety recognition systems that help students control their level of anxiety when performing academic tasks or exams.
|
25
|
Spatial attention affects the early processing of neutral versus fearful faces when they are task-irrelevant: a classifier study of the EEG C1 component. COGNITIVE AFFECTIVE & BEHAVIORAL NEUROSCIENCE 2020; 19:123-137. [PMID: 30341623 DOI: 10.3758/s13415-018-00650-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
EEG studies suggest that the emotional content of visual stimuli is processed rapidly. In particular, the C1 component, which occurs up to 100 ms after stimulus onset and likely reflects activity in primary visual cortex V1, has been reported to be sensitive to emotional faces. However, difficulties replicating these results have been reported. We hypothesized that the nature of the task and attentional condition are key to reconcile the conflicting findings. We report three experiments of EEG activity during the C1 time range elicited by peripherally presented neutral and fearful faces under various attentional conditions: the faces were spatially attended or unattended and were either task-relevant or not. Using traditional event-related potential analysis, we found that the early activity changed depending on facial expression, attentional condition, and task. In addition, we trained classifiers to discriminate the different conditions from the EEG signals. Although the classifiers were not able to discriminate between facial expressions in any condition, they uncovered differences between spatially attended and unattended faces but solely when these were task-irrelevant. In addition, this effect was only present for neutral faces. Our study provides further indication that attention and task are key parameters when measuring early differences between emotional and neutral visual stimuli.
|
26
|
Robust automated computational approach for classifying frontotemporal neurodegeneration: Multimodal/multicenter neuroimaging. ALZHEIMER'S & DEMENTIA (AMSTERDAM, NETHERLANDS) 2019; 11:588-598. [PMID: 31497638 PMCID: PMC6719282 DOI: 10.1016/j.dadm.2019.06.002] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
INTRODUCTION Timely diagnosis of behavioral variant frontotemporal dementia (bvFTD) remains challenging because it depends on clinical expertise and potentially ambiguous diagnostic guidelines. Recent recommendations highlight the role of multimodal neuroimaging and machine learning methods as complementary tools to address this problem. METHODS We developed an automatic, cross-center, multimodal computational approach for robust classification of patients with bvFTD and healthy controls. We analyzed structural magnetic resonance imaging and resting-state functional connectivity from 44 patients with bvFTD and 60 healthy controls (across three imaging centers with different acquisition protocols) using a fully automated processing pipeline, including site normalization, native space feature extraction, and a random forest classifier. RESULTS Our method successfully combined multimodal imaging information with high accuracy (91%), sensitivity (83.7%), and specificity (96.6%). DISCUSSION This multimodal approach enhanced the system's performance and provided a clinically informative method for neuroimaging analysis. This underscores the relevance of combining multimodal imaging and machine learning as a gold standard for dementia diagnosis.
|
27
|
Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection. BMC Bioinformatics 2019; 20:480. [PMID: 31533612 PMCID: PMC6751684 DOI: 10.1186/s12859-019-3050-8] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2019] [Accepted: 08/22/2019] [Indexed: 12/16/2022] Open
Abstract
Background MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, a few of which circulate in the human body. Finding a tumor-associated miRNA signature, that is, the minimum set of miRNA entities to be measured for discriminating both different types of cancer and normal tissues, is of utmost importance. Feature selection techniques applied in machine learning can help; however, they often provide naive or biased results. Results An ensemble feature selection strategy for miRNA signatures is proposed. miRNAs are chosen based on consensus on feature relevance from high-accuracy classifiers of different typologies. This methodology aims to identify signatures that are considerably more robust and reliable when used in clinically relevant prediction tasks. Using the proposed method, a 100-miRNA signature is identified in a dataset of 8023 samples extracted from TCGA. When eight state-of-the-art classifiers were run with the 100-miRNA signature and with the original 1046 features, global accuracy differed by only 1.4%. Importantly, this 100-miRNA signature is sufficient to distinguish between tumor and normal tissues. The approach is then compared against other feature selection methods, such as UFS, RFE, EN, LASSO, Genetic Algorithms, and EFS-CLA. The proposed approach provides better accuracy when tested with 10-fold cross-validation with different classifiers, and it is applied to several GEO datasets across different platforms, with some classifiers showing more than 90% classification accuracy, which demonstrates its cross-platform applicability. Conclusions The 100-miRNA signature is sufficiently stable to provide almost the same classification accuracy as the complete TCGA dataset, and it is further validated on several GEO datasets, across different types of cancer and platforms.
Furthermore, a bibliographic analysis confirms that 77 out of the 100 miRNAs in the signature appear in lists of circulating miRNAs used in cancer studies, in stem-loop or mature-sequence form. The remaining 23 miRNAs offer potentially promising avenues for future research. Electronic supplementary material The online version of this article (10.1186/s12859-019-3050-8) contains supplementary material, which is available to authorized users.
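A toy version of the consensus idea, merging feature rankings from classifiers of different typologies by summed rank, can be sketched as follows on synthetic data (two rankings instead of the paper's full classifier panel):

```python
# Sketch: consensus feature ranking across heterogeneous classifiers,
# merging random-forest importances and absolute logistic-regression
# coefficients by summed rank. Data are synthetic; features 2 and 7 carry
# the signal by construction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n, d = 300, 20
X = rng.normal(size=(n, d))
y = (X[:, 2] + X[:, 7] + rng.normal(scale=0.5, size=n) > 0).astype(int)

rf = RandomForestClassifier(random_state=0).fit(X, y)
lr = LogisticRegression(max_iter=1000).fit(X, y)

# rank 0 = most relevant for each classifier
rank_rf = np.argsort(np.argsort(-rf.feature_importances_))
rank_lr = np.argsort(np.argsort(-np.abs(lr.coef_[0])))
consensus = np.argsort(rank_rf + rank_lr)     # lowest summed rank first
signature = consensus[:5]                     # keep the top-5 "signature"
print(sorted(signature.tolist()))
```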
|
28
|
Comparing radiomic classifiers and classifier ensembles for detection of peripheral zone prostate tumors on T2-weighted MRI: a multi-site study. BMC Med Imaging 2019; 19:22. [PMID: 30819131 PMCID: PMC6396464 DOI: 10.1186/s12880-019-0308-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2018] [Accepted: 01/10/2019] [Indexed: 11/10/2022] Open
Abstract
Background For most computer-aided diagnosis (CAD) problems involving prostate cancer detection via medical imaging data, the choice of classifier has been largely ad hoc, or been motivated by classifier comparison studies that have involved large synthetic datasets. More significantly, it is currently unknown how classifier choices and trends generalize across multiple institutions, due to heterogeneous acquisition and intensity characteristics (especially when considering MR imaging data). In this work, we empirically evaluate and compare a number of different classifiers and classifier ensembles in a multi-site setting, for voxel-wise detection of prostate cancer (PCa) using radiomic texture features derived from high-resolution in vivo T2-weighted (T2w) MRI. Methods Twelve different supervised classifier schemes, namely Quadratic Discriminant Analysis (QDA), Support Vector Machines (SVMs), naïve Bayes, and Decision Trees (DTs), together with their ensemble variants (bagging, boosting), were compared in terms of classification accuracy as well as execution time. Our study utilized 85 prostate cancer T2w MRI datasets acquired from across 3 different institutions (1 for discovery, 2 for independent validation), from patients who later underwent radical prostatectomy. Surrogate ground truth for disease extent on MRI was established by expert annotation of pre-operative MRI through spatial correlation with corresponding ex vivo whole-mount histology sections. Classifier accuracy in detecting PCa extent on MRI on a per-voxel basis was evaluated via area under the ROC curve. Results The boosted DT classifier yielded the highest cross-validated AUC (= 0.744) for detecting PCa in the discovery cohort. However, in independent validation, the boosted QDA classifier was identified as the most accurate and robust for voxel-wise detection of PCa extent (AUCs of 0.735, 0.683, 0.768 across the 3 sites).
The next most accurate and robust classifier was the single QDA classifier, which also enjoyed the advantage of significantly lower computation times compared to any of the other methods. Conclusions Our results therefore suggest that simpler classifiers (such as QDA and its ensemble variants) may be more robust, accurate, and efficient for prostate cancer CAD problems, especially in the context of multi-site validation.
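The comparison between a single QDA classifier and a bagged QDA ensemble can be sketched as below, on synthetic "voxel-wise" features with a quadratic class boundary (which is the regime QDA is designed for); the data are not radiomic features:

```python
# Sketch: single QDA vs a bagged QDA ensemble, evaluated by cross-validated
# AUC. Features are synthetic, with a quadratic decision boundary so that
# QDA's Gaussian class models are appropriate.
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n, d = 400, 8
X = rng.normal(size=(n, d))
# label depends quadratically on the first two features
y = (np.sum(X[:, :2] ** 2, axis=1) + rng.normal(scale=0.5, size=n) > 2).astype(int)

qda = QuadraticDiscriminantAnalysis()
bagged = BaggingClassifier(QuadraticDiscriminantAnalysis(),
                           n_estimators=25, random_state=0)

auc_single = cross_val_score(qda, X, y, cv=5, scoring="roc_auc").mean()
auc_bag = cross_val_score(bagged, X, y, cv=5, scoring="roc_auc").mean()
print(round(auc_single, 3), round(auc_bag, 3))
```

Bagging a low-variance model like QDA often changes little on clean data; its benefit, as the study's multi-site results suggest, tends to show up under noisier, heterogeneous inputs.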
|
29
|
Proposal of a Machine Learning Approach to Differentiate Mild and Alzheimer's Condition in MR Images Using Shape Changes in Corpus Callosum. Stud Health Technol Inform 2019; 258:243-244. [PMID: 30942758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
The brain ventricles are surrounded by periventricular structures that are affected by dementia, which results in neurodegenerative disorders such as Alzheimer's Disease (AD). The change in morphology of these structures affects the shape and volume of the Corpus Callosum (CC). These alterations in CC morphology are considered a significant image biomarker for the early diagnosis of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD). Shape descriptors provide useful information about changes in the morphology of various brain structures during disease progression. In this work, a Lattice Boltzmann criterion based hybrid level set method (LSM) is used to segment the CC. Geometric and pseudo-Zernike moment measures are extracted from the segmented CC area and statistically analyzed using the Statistical Package for the Social Sciences (SPSS). The performance of the significant moments is validated using machine learning algorithms. Results demonstrate that the hybrid level set is able to delineate the CC, and the segmented images correlate highly with ground-truth images. A high accuracy of 85.0% was achieved using a Multilayer Perceptron (MLP) classifier for Healthy Control (HC) versus AD subjects. Thus, the moments are able to classify MCI from HC and AD subjects with high accuracy, and the results are found to be clinically significant.
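Geometric moments of a segmented region, one of the two moment families used above, reduce to simple sums over the binary mask; the disc below is a synthetic stand-in for a CC segmentation, and the pseudo-Zernike moments (an orthogonal variant) are omitted:

```python
# Sketch: low-order geometric moments of a binary segmentation mask.
# The disc is a synthetic stand-in for a segmented corpus callosum.
import numpy as np

def geometric_moment(mask, p, q):
    ys, xs = np.nonzero(mask)                       # pixel coordinates
    return float(np.sum((xs ** p) * (ys ** q)))

yy, xx = np.mgrid[0:64, 0:64]
disc = ((xx - 32) ** 2 + (yy - 32) ** 2 <= 15 ** 2)  # binary "CC" mask

m00 = geometric_moment(disc, 0, 0)                   # zeroth moment = area
cx = geometric_moment(disc, 1, 0) / m00              # centroid x
cy = geometric_moment(disc, 0, 1) / m00              # centroid y
print(int(m00), round(cx, 1), round(cy, 1))
```

Higher-order central moments built the same way capture elongation and asymmetry, which is what makes them usable as shape descriptors for disease progression.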
|
30
|
LS-GSNO and CWSNO Enhancement Processes Using PCA Algorithm with LOOCV of R-SM Technique for Effective Face Recognition Approach. J Med Syst 2018; 43:12. [PMID: 30535633 DOI: 10.1007/s10916-018-1128-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2018] [Accepted: 11/22/2018] [Indexed: 10/27/2022]
Abstract
The quality of an image under test is assessed with different Face Recognition (FR) methods, which can fail due to rapid changes in pixel intensity. Identifying similar faces with high inter-class similarity is very difficult in imaging. FR systems also struggle with mounting intra-class variability caused by head pose, illumination conditions, expressions, facial accessories, aging effects and cartoon faces. In an earlier approach, gradients with Zernike moments were used to recognize faces, but performance was low; a new approach is therefore introduced to overcome this. Many FR features are affected by these factors, and low performance is observed in methods applicable only to smaller data sets. The proposed approach can overcome the above limitations. This paper describes a novel LS enhancement technique using GSNO and CWSNO, extracting PCA features in three ways (mean, median and mode), which are then classified with an MD classifier using LOOCV of the R-SM technique to recognize faces. Performance metrics of the proposed and existing approaches are computed and compared. The suggested method is thus useful for increasing the reliability of face recognition and overcoming pose, similarity and illumination problems, providing a more accurate investigation of the required recognition procedures.
|
31
|
Analysis and prediction of animal toxins by various Chou's pseudo components and reduced amino acid compositions. J Theor Biol 2018; 462:221-229. [PMID: 30452961 DOI: 10.1016/j.jtbi.2018.11.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2018] [Revised: 11/06/2018] [Accepted: 11/15/2018] [Indexed: 01/19/2023]
Abstract
Animal toxin proteins are disulfide-rich small peptides found in venomous species. They are used as pharmacological tools and therapeutic agents in medicine because of the high specificity of their targets. Successful analysis and prediction of toxin proteins may therefore be of significance for pharmacological and therapeutic research on toxins. In this study, significant differences were found between toxins and non-toxins in amino acid compositions and several important biological properties. A random forest was proposed to predict animal toxin proteins, selecting 400 pseudo amino acid compositions and the dipeptide compositions of a reduced amino acid alphabet as input parameters. Based on the dipeptide composition of a reduced amino acid alphabet with 13 reduced amino acids, the best overall accuracy of 85.71% was obtained. These results indicate that our algorithm is an efficient tool for animal toxin prediction.
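The feature encoding named above, dipeptide composition over a reduced amino acid alphabet, can be sketched as follows. The five-group reduction used here is purely illustrative, not the 13-letter alphabet the study selected.

```python
from itertools import product

# Illustrative 5-group reduction (NOT the paper's 13-letter alphabet).
REDUCTION = {
    "AGILPV": "h",  # small / hydrophobic
    "FWY":    "r",  # aromatic
    "DENQ":   "p",  # polar / acidic amides
    "KRH":    "b",  # basic
    "CMST":   "s",  # remaining residues
}
AA_TO_GROUP = {aa: g for aas, g in REDUCTION.items() for aa in aas}

def dipeptide_composition(seq):
    """Frequency of each reduced-alphabet dipeptide in a protein sequence."""
    groups = sorted(set(AA_TO_GROUP.values()))
    counts = {a + b: 0 for a, b in product(groups, repeat=2)}
    reduced = [AA_TO_GROUP[aa] for aa in seq if aa in AA_TO_GROUP]
    for a, b in zip(reduced, reduced[1:]):
        counts[a + b] += 1
    total = max(len(reduced) - 1, 1)
    return {dp: c / total for dp, c in counts.items()}
```

The resulting fixed-length frequency vector (here 25 entries; 169 for a 13-letter alphabet) is the kind of input a random forest classifier would consume.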
Collapse
|
32
|
New perspectives on multilocus ancestry informativeness. Math Biosci 2018; 306:60-81. [PMID: 30385120 DOI: 10.1016/j.mbs.2018.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2018] [Revised: 10/24/2018] [Accepted: 10/25/2018] [Indexed: 10/28/2022]
Abstract
We present an axiomatic approach for multilocus informativeness measures for determining the amount of information that a set of polymorphic genetic markers provides about individual ancestry. We then reveal several surprising properties of a decision-theoretic based measure that is consistent with the set of proposed criteria for multilocus informativeness. In particular, these properties highlight the interplay between information originating from population priors and the information extractable from the population genetic variants. This analysis then reveals a certain deficiency of mutual information based multilocus informativeness measures when such population priors are incorporated. Finally, we analyse and quantify the inevitable inherent decrease in informativeness due to learning from finite population samples.
Collapse
|
33
|
Quantitative or qualitative transcriptional diagnostic signatures? A case study for colorectal cancer. BMC Genomics 2018; 19:99. [PMID: 29378509 PMCID: PMC5789529 DOI: 10.1186/s12864-018-4446-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2017] [Accepted: 01/11/2018] [Indexed: 12/20/2022] Open
Abstract
Background Due to experimental batch effects, applying a quantitative transcriptional signature for disease diagnosis commonly requires inter-sample data normalization, which is hardly applicable under common clinical settings. Many cancers may differ qualitatively from non-cancer states in their gene expression patterns. It is therefore reasonable to explore the power of qualitative diagnostic signatures, which are robust against experimental batch effects and other random factors. Results First, using data on technical replicate samples from the MicroArray Quality Control (MAQC) project, we demonstrated that large measurement variations in gene expression also exist for low-throughput PCR-based technologies, even when the samples are measured at the same test site. We then demonstrated the critical limitation of low stability for classifiers based on quantitative transcriptional signatures applied to individual samples, through a case study using a support vector machine and a naïve Bayesian classifier to discriminate colorectal cancer tissues from normal tissues. To address this problem, we identified a signature consisting of three gene pairs for discriminating colorectal cancer tissues from non-cancer (normal and inflammatory bowel disease) tissues, based on the within-sample relative expression orderings (REOs) of these gene pairs. The signature was well verified using 22 independent datasets measured on different microarray and RNA-seq platforms, obviating the need for inter-sample data normalization. Conclusions Subtle quantitative information in gene expression measurements tends to be unstable under current technical conditions, which introduces uncertainty into clinical applications of quantitative transcriptional diagnostic signatures. For diagnosis of disease states with qualitative transcriptional characteristics, qualitative REO-based signatures can be robustly applied to individual samples measured on different platforms. Electronic supplementary material The online version of this article (10.1186/s12864-018-4446-y) contains supplementary material, which is available to authorized users.
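The within-sample relative expression ordering (REO) idea reduces to a very small decision rule. This Python sketch uses hypothetical gene names and a simple majority vote over pairs; it is not the actual three-pair signature reported in the paper.

```python
def reo_classify(expr, pairs):
    """
    Majority vote over gene pairs: pair (a, b) votes 'cancer' when
    expr[a] > expr[b] within the same sample. Gene names are hypothetical.
    """
    votes = sum(expr[a] > expr[b] for a, b in pairs)
    return "cancer" if votes * 2 > len(pairs) else "non-cancer"
```

Because every comparison is made within a single sample, the rule is invariant to any monotonic rescaling of that sample's measurements, which is exactly why REO signatures need no inter-sample normalization.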
Collapse
|
34
|
Abstract
Molecular modeling frequently constructs classification models for the prediction of two‐class entities, such as compound bio(in)activity, chemical property (non)existence, protein (non)interaction, and so forth. The models are evaluated using well known metrics such as accuracy or true positive rates. However, these frequently used metrics applied to retrospective and/or artificially generated prediction datasets can potentially overestimate true performance in actual prospective experiments. Here, we systematically consider metric value surface generation as a consequence of data balance, and propose the computation of an inverse cumulative distribution function taken over a metric surface. The proposed distribution analysis can aid in the selection of metrics when formulating study design. In addition to theoretical analyses, a practical example in chemogenomic virtual screening highlights the care required in metric selection and interpretation.
Collapse
|
35
|
Comparative Analysis of Algorithmic Approaches for Auto-Coding with ICD-10-AM and ACHI. Stud Health Technol Inform 2018; 252:73-79. [PMID: 30040686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Clinical coding in acute and sub-acute hospitals in Australia is done using ICD-10-AM (International Classification of Diseases, version 10, Australian Modification) and ACHI (Australian Classification of Health Interventions) for funding, insurance claims processing and research. Assigning a code to an episode of care is a manual process, which poses challenges due to the growing set of codes, the complexity of care episodes, and the large training and recruitment costs of clinical coders. The use of Natural Language Processing (NLP) and Machine Learning (ML) techniques is considered a solution to this problem. This paper carries out a comparative analysis of a selected set of NLP and ML techniques to identify the most efficient algorithm for clinical coding, based on a set of standard metrics: precision, recall, F-score, accuracy, Hamming loss and Jaccard similarity.
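Two of the metrics listed above, Hamming loss and sample-averaged Jaccard similarity, are less familiar than precision and recall. For a multi-label task such as assigning several ICD codes per episode, they can be computed as in this sketch (label names are placeholders):

```python
def hamming_loss(y_true, y_pred, labels):
    """Fraction of (sample, label) assignments that are wrong."""
    errors = sum(
        (l in t) != (l in p)
        for t, p in zip(y_true, y_pred)
        for l in labels
    )
    return errors / (len(y_true) * len(labels))

def jaccard_similarity(y_true, y_pred):
    """Mean per-sample |intersection| / |union| of true and predicted label sets."""
    scores = [
        len(t & p) / len(t | p) if t | p else 1.0
        for t, p in zip(y_true, y_pred)
    ]
    return sum(scores) / len(scores)
```

Hamming loss penalises every individual wrong label, while Jaccard similarity rewards overlap between the predicted and true code sets per episode; reporting both gives a fuller picture for multi-label coders.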
Collapse
|
36
|
Abstract
This article reports two ERP studies that exploited the classifier system of Mandarin Chinese to investigate semantic prediction. In Mandarin, in certain contexts, a noun has to be preceded by a classifier, which has to match the noun in semantically-defined features. In both experiments, an N400 effect was elicited in response to a classifier that mismatched an up-coming predictable noun, relative to a matching classifier. Among the mismatching classifiers, the N400 effect was graded, being smaller for classifiers that were semantically related to the predicted word, relative to classifiers that were semantically unrelated to the predicted word. Given that the classifier occurred before the predicted word, this result shows that fine-grained semantic features of nouns can be pre-activated in advance of bottom-up input. The studies thus extend previous findings based on a more restricted range of highly grammaticalized features such as gender or animacy in Indo-European languages (Szewczyk & Schriefers, 2013; Van Berkum, Brown, Zwitserlood, Kooijman, & Hagoort, 2005; Wicha, Bates, Moreno, & Kutas, 2003).
Collapse
|
37
|
Detection of Cardiac Abnormalities from Multilead ECG using Multiscale Phase Alternation Features. J Med Syst 2016; 40:143. [PMID: 27118009 DOI: 10.1007/s10916-016-0505-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2015] [Accepted: 04/19/2016] [Indexed: 11/30/2022]
Abstract
Cardiac activities such as the depolarization and relaxation of the atria and ventricles are observed in the electrocardiogram (ECG). Changes in the morphological features of the ECG are symptoms of particular heart pathologies. It is a cumbersome task for medical experts to visually identify subtle changes in these features over 24 hours of ECG recording; automated analysis of the ECG signal is therefore needed for accurate detection of cardiac abnormalities. In this paper, a novel method for automated detection of cardiac abnormalities from multilead ECG is proposed. The method uses multiscale phase alternation (PA) features of multilead ECG and two classifiers, k-nearest neighbor (KNN) and fuzzy KNN, for classification of bundle branch block (BBB), myocardial infarction (MI), heart muscle defect (HMD) and healthy control (HC). The dual tree complex wavelet transform (DTCWT) is used to decompose the ECG signal of each lead into complex wavelet coefficients at different scales. The phase of the complex wavelet coefficients is computed, and the PA values at each wavelet scale are used as features for detection and classification of cardiac abnormalities. A publicly available multilead ECG database (the PTB database) is used for testing of the proposed method. The experimental results show that the proposed multiscale PA features and the fuzzy KNN classifier perform better for detection of cardiac abnormalities, with sensitivity values of 78.12 %, 80.90 % and 94.31 % for the BBB, HMD and MI classes. The sensitivity value of the proposed method for the MI class is compared with state-of-the-art techniques from multilead ECG.
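The fuzzy KNN classifier mentioned above differs from plain KNN in returning soft class memberships rather than a hard majority vote. A minimal sketch of one common formulation (inverse-distance weighting with fuzzifier m, as in Keller-style fuzzy KNN; the exact variant used by the paper is not specified here):

```python
def fuzzy_knn(x, Xtr, ytr, k=3, m=2.0):
    """Fuzzy KNN: soft class memberships from the k nearest neighbours,
    weighted by 1 / distance ** (2 / (m - 1)); m > 1 is the fuzzifier."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, xi)) ** 0.5, yi)
        for xi, yi in zip(Xtr, ytr)
    )[:k]
    weights = {}
    for d, yi in nearest:
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + 1e-12)  # epsilon guards d == 0
        weights[yi] = weights.get(yi, 0.0) + w
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}
```

The membership vector sums to one, so a hard decision is just the argmax, while the magnitudes convey how confident the classifier is near class boundaries.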
Collapse
|
38
|
Aid decision algorithms to estimate the risk in congenital heart surgery. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2016; 126:118-127. [PMID: 26774238 DOI: 10.1016/j.cmpb.2015.12.021] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/04/2015] [Revised: 12/01/2015] [Accepted: 12/16/2015] [Indexed: 06/05/2023]
Abstract
BACKGROUND AND OBJECTIVE In this paper, we test the suitability of different artificial intelligence-based algorithms for decision support when classifying the risk of congenital heart surgery. Classifying surgical risk provides enormous benefits, such as the a priori estimation of surgical outcomes depending on the type of disease, the type of repair, and other elements that influence the final result. This preventive estimation may help to avoid future complications, or even death. METHODS We evaluated four machine learning algorithms: multilayer perceptron, self-organizing map, radial basis function networks and decision trees. The implemented architectures aim to classify among three types of surgical risk: low complexity, medium complexity and high complexity. RESULTS Accuracy outcomes range between 80% and 99%, with the multilayer perceptron offering the highest hit ratio. CONCLUSIONS According to the results, it is feasible to develop a clinical decision support system using the evaluated algorithms. Such a system would help cardiology specialists, paediatricians and surgeons to forecast the level of risk associated with congenital heart disease surgery.
Collapse
|
39
|
Iteratively refining breast cancer intrinsic subtypes in the METABRIC dataset. BioData Min 2016; 9:2. [PMID: 26770261 PMCID: PMC4712506 DOI: 10.1186/s13040-015-0078-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2015] [Accepted: 12/25/2015] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND Multi-gene lists and single sample predictor models are currently used to reduce the multidimensional complexity of breast cancers and to identify intrinsic subtypes. The perceived inability of some models to deal with the challenges of processing high-dimensional data, however, limits the accurate characterisation of these subtypes. Towards the development of robust strategies, we designed an iterative approach to consistently discriminate intrinsic subtypes and improve class prediction in the METABRIC dataset. FINDINGS In this study, we employed the CM1 score to identify the most discriminative probes for each group, and an ensemble learning technique to assess the ability of these probes to assign subtype labels using 24 different classifiers. Our analysis comprises an iterative computation of these methods and statistical measures performed on a set of over 2000 samples. The refined labels assigned using this iterative approach proved to be more consistent and in better agreement with clinicopathological markers and patients' overall survival than those originally provided by the PAM50 method. CONCLUSIONS The assignment of intrinsic subtypes has a significant impact on translational research for both understanding and managing breast cancer. The refined labelling therefore provides more accurate and reliable information by improving the source of fundamental science prior to clinical applications in medicine.
Collapse
|
40
|
A strategy focused on MAPT, APP, NCSTN and BACE1 to build blood classifiers for Alzheimer's disease. J Theor Biol 2015; 376:32-8. [PMID: 25863267 DOI: 10.1016/j.jtbi.2015.03.039] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2014] [Revised: 02/21/2015] [Accepted: 03/31/2015] [Indexed: 11/23/2022]
Abstract
BACKGROUND Although Alzheimer's disease (AD) is a brain disorder, a number of peripheral alterations have been found in these patients, including differences in leukocyte gene expression; however, the key genes involved in plaque and tangle formation have shown relatively small potential as diagnostic markers. We focused on MAPT, APP, NCSTN and BACE1 as the basis for building and comparing blood classifiers for AD. METHODS We used a combined model to build disease classifiers, using measures of blood pressure and serum glucose, cholesterol and triglyceride levels, as well as RT-PCR expression levels of APP, NCSTN and BACE1 in peripheral blood mononuclear cells (PBMCs), from an independent cohort of 36 individuals comprising cognitively normal controls and patients with AD or other neuropathologies. In addition, a set of genes was carefully selected through molecular interactions with MAPT, APP, NCSTN and BACE1 to test an expression-based classifier on a public microarray dataset of 40 samples (AD and controls). A series of discriminant analyses and classification and regression trees (C&RTs) were used to perform the classification tasks. RESULTS Using C&RTs, the combined model showed potential to differentially diagnose AD with up to 94.4% accuracy and 100% specificity in our independent sample. Furthermore, a subset of 16 genes showed the best diagnostic potential using a minimum number of expression variables, correctly classifying up to 100% of samples in the public dataset. CONCLUSIONS Our method of variable selection shows that even elements with no significant differences between controls and AD, but that have been linked to AD or AD-related elements, still hold potential for use in its diagnosis. The sample size and inherent methodological limitations of this study need to be kept in mind, and our classifiers require careful further testing in larger cohorts. Nonetheless, we believe these results provide evidence for the utility of our method, which contributes a different approach to generating promising diagnostic tools for neuropsychiatric disorders.
Collapse
|
41
|
Watching language grow in the manual modality: nominals, predicates, and handshapes. Cognition 2015; 136:381-95. [PMID: 25546342 PMCID: PMC4308574 DOI: 10.1016/j.cognition.2014.11.029] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2013] [Revised: 11/09/2014] [Accepted: 11/17/2014] [Indexed: 11/18/2022]
Abstract
All languages, both spoken and signed, make a formal distinction between two types of terms in a proposition--terms that identify what is to be talked about (nominals) and terms that say something about this topic (predicates). Here we explore conditions that could lead to this property by charting its development in a newly emerging language--Nicaraguan Sign Language (NSL). We examine how handshape is used in nominals vs. predicates in three Nicaraguan groups: (1) homesigners who are not part of the Deaf community and use their own gestures, called homesigns, to communicate; (2) NSL cohort 1 signers who fashioned the first stage of NSL; (3) NSL cohort 2 signers who learned NSL from cohort 1. We compare these three groups to a fourth: (4) native signers of American Sign Language (ASL), an established sign language. We focus on handshape in predicates that are part of a productive classifier system in ASL; handshape in these predicates varies systematically across agent vs. no-agent contexts, unlike handshape in the nominals we study, which does not vary across these contexts. We found that all four groups, including homesigners, used handshape differently in nominals vs. predicates--they displayed variability in handshape form across agent vs. no-agent contexts in predicates, but not in nominals. Variability thus differed in predicates and nominals: (1) In predicates, the variability across grammatical contexts (agent vs. no-agent) was systematic in all four groups, suggesting that handshape functioned as a productive morphological marker on predicate signs, even in homesign. This grammatical use of handshape can thus appear in the earliest stages of an emerging language. (2) In nominals, there was no variability across grammatical contexts (agent vs. no-agent), but there was variability within- and across-individuals in the handshape used in the nominal for a particular object. 
This variability was striking in homesigners (an individual homesigner did not necessarily use the same handshape in every nominal he produced for a particular object), but decreased in the first cohort of NSL and remained relatively constant in the second cohort. Stability in the lexical use of handshape in nominals thus does not seem to emerge unless there is pressure from a peer linguistic community. Taken together, our findings argue that a community of users is essential to arrive at a stable nominal lexicon, but not to establish a productive morphological marker in predicates. Examining the steps a manual communication system takes as it moves toward becoming a fully-fledged language offers a unique window onto factors that have made human language what it is.
Collapse
|
42
|
Signers and co-speech gesturers adopt similar strategies for portraying viewpoint in narratives. Top Cogn Sci 2014; 7:12-35. [PMID: 25348839 DOI: 10.1111/tops.12120] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2013] [Revised: 03/04/2014] [Accepted: 05/02/2014] [Indexed: 11/29/2022]
Abstract
Gestural viewpoint research suggests that several dimensions determine which perspective a narrator takes, including properties of the event described. Events can evoke gestures from the point of view of a character (CVPT), an observer (OVPT), or both perspectives. CVPT and OVPT gestures have been compared to constructed action (CA) and classifiers (CL) in signed languages. We ask how CA and CL, as represented in ASL productions, compare to previous results for CVPT and OVPT from English-speaking co-speech gesturers. Ten ASL signers described cartoon stimuli from Parrill (2010). Events shown by Parrill to elicit a particular gestural strategy (CVPT, OVPT, both) were coded for signers' instances of CA and CL. CA was divided into three categories: CA-torso, CA-affect, and CA-handling. Signers used CA-handling the most when gesturers used CVPT exclusively. Additionally, signers used CL the most when gesturers used OVPT exclusively and CL the least when gesturers used CVPT exclusively.
Collapse
|
43
|
From gesture to sign language: conventionalization of classifier constructions by adult hearing learners of British Sign Language. Top Cogn Sci 2014; 7:61-80. [PMID: 25329326 DOI: 10.1111/tops.12118] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2013] [Revised: 10/28/2014] [Accepted: 02/17/2014] [Indexed: 11/30/2022]
Abstract
There has long been interest in why languages are shaped the way they are, and in the relationship between sign language and gesture. In sign languages, entity classifiers are handshapes that encode how objects move, how they are located relative to one another, and how multiple objects of the same type are distributed in space. Previous studies have shown that hearing adults who are asked to use only manual gestures to describe how objects move in space will use gestures that bear some similarities to classifiers. We investigated how accurately hearing adults, who had been learning British Sign Language (BSL) for 1-3 years, produce and comprehend classifiers in (static) locative and distributive constructions. In a production task, learners of BSL knew that they could use their hands to represent objects, but they had difficulty choosing the same, conventionalized, handshapes as native signers. They were, however, highly accurate at encoding location and orientation information. Learners therefore show the same pattern found in sign-naïve gesturers. In contrast, handshape, orientation, and location were comprehended with equal (high) accuracy, and testing a group of sign-naïve adults showed that they too were able to understand classifiers with higher than chance accuracy. We conclude that adult learners of BSL bring their visuo-spatial knowledge and gestural abilities to the tasks of understanding and producing constructions that contain entity classifiers. We speculate that investigating the time course of adult sign language acquisition might shed light on how gesture became (and, indeed, becomes) conventionalized during the genesis of sign languages.
Collapse
|
44
|
Processing classifier-noun agreement in a long distance: an ERP study on Mandarin Chinese. BRAIN AND LANGUAGE 2014; 137:14-28. [PMID: 25151544 DOI: 10.1016/j.bandl.2014.07.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2013] [Revised: 06/05/2014] [Accepted: 07/15/2014] [Indexed: 06/03/2023]
Abstract
The classifier system categorizes nouns on a semantic basis. By inserting an object-gap relative clause (RC) between a classifier and its associate noun, we examined how temporary classifier-noun semantic incongruity and long-distance classifier-noun dependency are processed. Instead of a typical N400 effect, a midline anterior negativity was elicited by the temporary semantic incongruity, suggesting that the anticipation of coming words influences semantic processing and that metacognitive processes are involved in resolving the conflict. The lack of reduced P600 effects at the RC marker suggests that classifier-noun mismatch may not be effective in RC prediction. The N400 observed at the head noun suggests that the parser retains the temporary incongruity in the memory and computes the classifier-noun semantic agreement over a long distance. In addition, both successful and unsuccessful long-distance integration elicited P600 effects, supporting the view that P600 indexes more than just syntactic processing. Detailed discussion and implications are provided.
Collapse
|
45
|
Overfitting in prediction models - is it a problem only in high dimensions? Contemp Clin Trials 2013; 36:636-41. [PMID: 23811117 DOI: 10.1016/j.cct.2013.06.011] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2013] [Revised: 05/29/2013] [Accepted: 06/17/2013] [Indexed: 02/07/2023]
Abstract
The growing recognition that human diseases are molecularly heterogeneous has stimulated interest in the development of prognostic and predictive classifiers for patient selection and stratification. In the process of classifier development, it has been repeatedly emphasized that when the number of candidate predictor variables is much larger than the number of observations, the apparent (training-set, resubstitution) accuracy of a classifier can be highly optimistically biased; hence, classification accuracy should be reported based on evaluation of the classifier on a separate test set or using complete cross-validation. Such evaluation methods have, however, not been the norm for low-dimensional, p < n data that arise, for example, in clinical trials when a classifier is developed on a combination of clinico-pathological variables and a small number of genetic biomarkers selected from an understanding of the biology of the disease. We undertook simulation studies to investigate the existence and extent of the problem of overfitting with low-dimensional data. The results indicate that overfitting can be a serious problem even for low-dimensional data, especially if the relationship between outcome and the set of predictor variables is not strong. We therefore encourage the adoption of either a separate test set or complete cross-validation to evaluate classifier accuracy, even when the number of candidate predictor variables is substantially smaller than the number of cases.
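The simulation argument above is easy to reproduce in miniature: when labels are generated independently of a handful of predictors, a 1-nearest-neighbour classifier attains perfect apparent (resubstitution) accuracy yet only near-chance leave-one-out accuracy. A self-contained sketch:

```python
import random

def nn1_predict(x, Xtr, ytr):
    """1-nearest-neighbour prediction (squared Euclidean distance)."""
    i = min(range(len(Xtr)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, Xtr[k])))
    return ytr[i]

def resubstitution_acc(X, y):
    """Apparent accuracy: each point is (trivially) its own nearest neighbour."""
    return sum(nn1_predict(X[i], X, y) == y[i] for i in range(len(X))) / len(X)

def loocv_acc(X, y):
    """Leave-one-out accuracy: hold each sample out before predicting it."""
    hits = 0
    for i in range(len(X)):
        hits += nn1_predict(X[i], X[:i] + X[i + 1:], y[:i] + y[i + 1:]) == y[i]
    return hits / len(X)

# Low-dimensional data (p = 3, n = 40) with labels unrelated to the features.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [rng.randint(0, 1) for _ in range(40)]
```

The gap between `resubstitution_acc(X, y)` (exactly 1.0 here) and `loocv_acc(X, y)` (near 0.5) is precisely the optimistic bias the authors warn about, even though p is far smaller than n.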
Collapse
|
46
|
Learning that classifiers count: Mandarin-speaking children's acquisition of sortal and mensural classifiers. JOURNAL OF EAST ASIAN LINGUISTICS 2010; 19:207-230. [PMID: 23532340 PMCID: PMC3606901 DOI: 10.1007/s10831-010-9060-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/02/2023]
Abstract
Two experiments explored two- to five-year-old Mandarin-speaking children's acquisition of classifiers, mandatory morphemes for expressing quantities in many Asian languages. Classifiers are similar to measure words in English (e.g., a piece of apple; a cup of apples), with the main difference being that classifiers are also required when counting sortals (e.g., yi ge pinguo or "one unit apple" in Mandarin means "one apple"). The current study extended prior studies (e.g., Chien et al., J East Asian Linguist 12:91-120, 2003) to examine Mandarin-speaking children's understanding of classifiers as indicating units of quantification. Children were also tested on their knowledge of numerals to assess the relationship between children's acquisition of numerals and classifiers. The findings suggest that children first notice that sortal classifiers specify properties such as shape. Only after learning some numerals do they begin to work out how classifiers indicate units of quantification. By age four, children scored above chance on most classifiers tested.
Collapse
|