202
Fedačko J, Pella D, Gavurová B, Koróny S. Influence of Demographic Determinants on the Number of Deaths Caused by Circulatory System Diseases in Comparison to the Number of Deaths Caused by Neoplasms in Slovak Regions from 1996-2014. Cent Eur J Public Health 2018. [PMID: 29524373 DOI: 10.21101/cejph.a5053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
OBJECTIVES The objective of our study was to evaluate the influence of available demographic determinants on the number of deaths caused by circulatory system diseases as compared to deaths caused by neoplasms in Slovakia in 1996-2014. METHODS Mortality data were kindly provided by the National Health Information Centre in Slovakia. The first method was trend curve fitting of death ratios caused by circulatory system diseases (Chapter IX) and of deaths caused by neoplasms (Chapter II) as a function of age for both sexes. The second method comprised a decision tree for classification between deaths caused by Chapter IX and Chapter II diseases. Input variables were the available demographic indicators: age, sex, marital status, region, and calendar year of death. Statistical data analyses were performed with IBM SPSS version 19 statistical software. RESULTS We found that the odds ratios of deaths caused by circulatory system diseases (Chapter IX) in comparison with deaths caused by neoplasms (Chapter II) were non-decreasing. At first, the odds ratios were constant until they reached a critical sex-dependent value, after which they increased steadily. In the case of men the odds ratio was greater than in the 60 years age-group where the odds ratio value increased slowly (from 1.14 at age 60 to 7.25 at age 90 years). The relative increase was 6.36 (7.25/1.14). The odds ratio in the women group was smaller but increased more rapidly (from 0.81 at age 60 to 12.27 at age 90 years). The relative increase was 15.15 in women (12.27/0.81). Hence, the odds ratio of death caused by Chapter IX diseases vs. Chapter II diseases was greater in the older women group (i.e. at higher age values). Using the decision tree model, we found that the most significant demographic determinant of death counts in both ICD Chapters was the age of the deceased, followed by marital status and finally gender. The last two predictors (year and region) were relatively negligible though formally significant. CONCLUSIONS The proposed method could be useful for prognostic classification of patients and primarily beneficial for hospitals in human or financial resources planning.
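The relative increases quoted in this abstract follow directly from the reported odds ratios. The short Python sketch below re-derives them; the function name `relative_increase` is illustrative and does not come from the paper.

```python
def relative_increase(or_start: float, or_end: float) -> float:
    """Ratio of the odds ratio at the older age to the odds ratio at the younger age."""
    return or_end / or_start


# Odds ratios (Chapter IX vs. Chapter II deaths) as reported in the abstract.
men = relative_increase(1.14, 7.25)     # age 60 -> age 90
women = relative_increase(0.81, 12.27)  # age 60 -> age 90

print(f"men: {men:.2f}, women: {women:.2f}")  # prints "men: 6.36, women: 15.15"
```

Both values agree with the figures given in the abstract after rounding to two decimals.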
Affiliation(s)
- Ján Fedačko
  - 1st Department of Internal Medicine, Louis Pasteur University Hospital, Pavol Jozef Šafárik University in Košice, Košice, Slovak Republic
- Daniel Pella
  - 1st Department of Internal Medicine, Louis Pasteur University Hospital, Pavol Jozef Šafárik University in Košice, Košice, Slovak Republic
- Beáta Gavurová
  - Faculty of Economics, Technical University of Košice, Košice, Slovak Republic
- Samuel Koróny
  - Research and Innovation Centre, Faculty of Economics, Matej Bel University, Banská Bystrica, Slovak Republic
203
A novel effective diagnosis model based on optimized least squares support machine for gene microarray. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.02.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/21/2022]
205
Das AK, Sengupta S, Bhattacharyya S. A group incremental feature selection for classification using rough set theory based genetic algorithm. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.01.040] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Indexed: 10/18/2022]
207
Lai CM. Multi-objective simplified swarm optimization with weighting scheme for gene selection. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.12.049] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/17/2022]
208
Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models. ANN I STAT MATH 2018. [DOI: 10.1007/s10463-018-0655-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 11/25/2022]
209
Li P, Zhou W, Huang X, Zhu X, Liu H, Ma T, Guo D, Yao D, Xu P. Improved Graph Embedding for Robust Recognition with outliers. Sci Rep 2018. [PMID: 29523793 PMCID: PMC5844917 DOI: 10.1038/s41598-018-22207-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
Artifacts in biomedical signal recordings, such as gene expression, sonar image and electroencephalogram, have a great influence on the related research because artifacts with large values usually break the neighbor structure in the datasets. However, the conventional graph embedding (GE) used for dimension reduction, such as linear discriminant analysis, principal component analysis and locality preserving projections, is essentially defined in the L2 norm space and is sensitive to the presence of artifacts, resulting in biased sub-structural features. In this work, we defined graph embedding in the L1 norm space and used the maximization strategy to solve this model with the aim of restricting the influence of outliers on the dimension reduction of signals. The quantitative evaluation with different outlier conditions demonstrates that an L1 norm-based GE structure can estimate hyperplanes that are more stable than those of conventional GE-based methods. The applications to a variety of datasets also show that the proposed L1 GE is more robust to outlier influence and achieves higher classification accuracy. The proposed L1 GE may be helpful for capturing reliable mapping information from datasets that have been contaminated with outliers.
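The robustness argument can be illustrated with a one-dimensional toy case, which is not the authors' graph-embedding model: the value that minimizes the sum of squared (L2) deviations is the mean, whereas the value that minimizes the sum of absolute (L1) deviations is the median. The data below are made up for illustration only.

```python
from statistics import mean, median

clean = [1.0, 1.1, 0.9, 1.05, 0.95]
contaminated = clean + [50.0]  # one large-amplitude artifact (hypothetical value)

# How far does each estimate move when the artifact is added?
l2_shift = abs(mean(contaminated) - mean(clean))      # L2 minimizer: the mean
l1_shift = abs(median(contaminated) - median(clean))  # L1 minimizer: the median

print(f"shift of L2 estimate: {l2_shift:.3f}")
print(f"shift of L1 estimate: {l1_shift:.3f}")
```

The L2 estimate is dragged toward the artifact while the L1 estimate barely moves, which is the intuition behind defining the embedding in the L1 norm space.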
Affiliation(s)
- Peiyang Li
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Weiwei Zhou
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Xiaoye Huang
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Xuyang Zhu
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Huan Liu
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Teng Ma
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Daqing Guo
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Dezhong Yao
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
- Peng Xu
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
  - School of Life Science and Technology, Center for Information in Medicine, University of Electronic Science and Technology of China, Chengdu, China
210
Spurrier J, Shukla AK, McLinden K, Johnson K, Giniger E. Altered expression of the Cdk5 activator-like protein, Cdk5α, causes neurodegeneration, in part by accelerating the rate of aging. Dis Model Mech 2018; 11:dmm031161. [PMID: 29469033 PMCID: PMC5897722 DOI: 10.1242/dmm.031161] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/23/2017] [Accepted: 02/02/2018] [Indexed: 12/11/2022]
Abstract
Aging is the greatest risk factor for neurodegeneration, but the connection between the two processes remains opaque. This is in part for want of a rigorous way to define physiological age, as opposed to chronological age. Here, we develop a comprehensive metric for physiological age in Drosophila, based on genome-wide expression profiling. We applied this metric to a model of adult-onset neurodegeneration, increased or decreased expression of the activating subunit of the Cdk5 protein kinase, encoded by the gene Cdk5α, the ortholog of mammalian p35. Cdk5α-mediated degeneration was associated with a 27-150% acceleration of the intrinsic rate of aging, depending on the tissue and genetic manipulation. Gene ontology analysis and direct experimental tests revealed that affected age-associated processes included numerous core phenotypes of neurodegeneration, including enhanced oxidative stress and impaired proteostasis. Taken together, our results suggest that Cdk5α-mediated neurodegeneration results from accelerated aging, in combination with cell-autonomous neuronal insults. These data fundamentally recast our picture of the relationship between neurodegeneration and its most prominent risk factor, natural aging.
Affiliation(s)
- Joshua Spurrier
  - National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 02892, USA
  - The Johns Hopkins University/National Institutes of Health Graduate Partnership Program, National Institutes of Health, Bethesda, MD 02892, USA
- Arvind Kumar Shukla
  - National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 02892, USA
- Kristina McLinden
  - National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 02892, USA
- Kory Johnson
  - National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 02892, USA
- Edward Giniger
  - National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 02892, USA
211
Patrick E, Schramm SJ, Ormerod JT, Scolyer RA, Mann GJ, Mueller S, Yang JYH. A multi-step classifier addressing cohort heterogeneity improves performance of prognostic biomarkers in three cancer types. Oncotarget 2018; 8:2807-2815. [PMID: 27833072 PMCID: PMC5356843 DOI: 10.18632/oncotarget.13203] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/12/2016] [Accepted: 09/26/2016] [Indexed: 02/06/2023]
Abstract
Cancer research continues to highlight the extensive genetic diversity that exists both between and within tumors. This intrinsic heterogeneity poses one of the central challenges to predicting patient clinical outcome and the personalization of treatments. Despite progress in some individual tumor types, it is not yet possible to prospectively, accurately classify patients by expected survival. One hypothesis proposed to explain this is that the prognostic classifiers developed to date are insufficiently sensitive and specific; however it is also possible that patients are not equally easy to classify by any given biomarker. We demonstrate in a cohort of 45 AJCC stage III melanoma patients that clinico-pathologic biomarkers can identify those patients that are most likely to be misclassified by a molecular biomarker. The process of modelling the classifiability of patients was then replicated in a cohort of 49 stage II breast cancer patients and 53 stage III colon cancer patients. A multi-step procedure incorporating this information not only improved classification accuracy but also indicated the specific clinical attributes that had made classification problematic in each cohort. These findings show that, even when cohorts are of moderate size, including features that explain the patient-specific performance of a prognostic biomarker in a classification framework can improve the modelling and estimation of survival.
Affiliation(s)
- Ellis Patrick
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
  - Program in Translational NeuroPsychiatric Genomics, Institute for the Neurosciences, Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
  - Program in Translational NeuroPsychiatric Genomics, Institute for the Neurosciences, Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA
- Sarah-Jane Schramm
  - The Westmead Millennium Institute for Medical Research, The University of Sydney, Sydney, Australia
  - Melanoma Institute Australia, The University of Sydney, Sydney, Australia
- John T Ormerod
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
  - ARC Centre of Excellence for Mathematical & Statistical Frontiers
- Richard A Scolyer
  - Tissue Pathology and Diagnostic Oncology, Royal Prince Alfred Hospital, Sydney, Australia
  - Discipline Pathology, Sydney Medical School, The University of Sydney, Sydney, Australia
- Graham J Mann
  - The Westmead Millennium Institute for Medical Research, The University of Sydney, Sydney, Australia
  - Melanoma Institute Australia, The University of Sydney, Sydney, Australia
- Samuel Mueller
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
- Jean Y H Yang
  - School of Mathematics and Statistics, The University of Sydney, Sydney, Australia
  - Melanoma Institute Australia, The University of Sydney, Sydney, Australia
212
Blagus R, Goeman JJ. What (not) to expect when classifying rare events. Brief Bioinform 2018; 19:341-349. [PMID: 27881432 DOI: 10.1093/bib/bbw107] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/19/2016] [Indexed: 01/03/2025]
Abstract
When building classifiers, it is natural to require that the classifier correctly estimates the event probability (Constraint 1), that it has equal sensitivity and specificity (Constraint 2) or that it has equal positive and negative predictive values (Constraint 3). We prove that in the balanced case, where there is an equal proportion of events and non-events, any classifier that satisfies one of these constraints will always satisfy all of them. Such unbiasedness of events and non-events is much more difficult to achieve in the case of rare events, i.e. the situation in which the proportion of events is (much) smaller than 0.5. Here, we prove that it is impossible to meet all three constraints unless the classifier achieves perfect predictions. Any non-perfect classifier can satisfy at most one constraint, and satisfying one constraint implies violating the other two constraints in a specific direction. Our results have implications for classifiers optimized using g-means or F1-measure, which tend to satisfy Constraints 2 and 1, respectively. Our results are derived from basic probability theory and illustrated with simulations based on some frequently used classifiers.
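A small numerical check can make the three constraints concrete. The sketch below is not taken from the paper; it assumes that Constraint 1 means the proportion of predicted events equals the true event proportion, and it fixes equal sensitivity and specificity (Constraint 2) at an arbitrary value of 0.8. The function name `metrics` and the prevalence values are illustrative.

```python
from math import isclose


def metrics(prevalence: float, sensitivity: float, specificity: float) -> dict:
    """Population-level rates implied by a classifier's sensitivity and specificity."""
    tp = prevalence * sensitivity
    fn = prevalence * (1 - sensitivity)
    tn = (1 - prevalence) * specificity
    fp = (1 - prevalence) * (1 - specificity)
    return {
        "predicted_event_rate": tp + fp,  # compared with prevalence (Constraint 1)
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
    }


# Constraint 2 holds in both settings: sensitivity == specificity == 0.8.
balanced = metrics(prevalence=0.5, sensitivity=0.8, specificity=0.8)
rare = metrics(prevalence=0.1, sensitivity=0.8, specificity=0.8)

print("balanced:", balanced)
print("rare:    ", rare)
print("balanced PPV == NPV:", isclose(balanced["ppv"], balanced["npv"]))
print("rare PPV == NPV:    ", isclose(rare["ppv"], rare["npv"]))
```

With a prevalence of 0.5 the predicted event rate matches the prevalence and the two predictive values coincide; with a prevalence of 0.1 the same classifier overestimates the event rate and its positive predictive value falls well below its negative predictive value.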
Affiliation(s)
- Rok Blagus
  - Univerza v Ljubljani Medicinska Fakulteta, Institute for Biostatistics and Medical Informatics, Ljubljana, Slovenia
- Jelle J Goeman
  - Leiden University Medical Center, Department of Medical Statistics and Bioinformatics, Leiden, The Netherlands
213
Crow M, Paul A, Ballouz S, Huang ZJ, Gillis J. Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor. Nat Commun 2018; 9:884. [PMID: 29491377 PMCID: PMC5830442 DOI: 10.1038/s41467-018-03282-0] [Citation(s) in RCA: 179] [Impact Index Per Article: 25.6] [Received: 11/29/2017] [Accepted: 02/02/2018] [Indexed: 12/19/2022]
Abstract
Single-cell RNA-sequencing (scRNA-seq) technology provides a new avenue to discover and characterize cell types; however, the experiment-specific technical biases and analytic variability inherent to current pipelines may undermine its replicability. Meta-analysis is further hampered by the use of ad hoc naming conventions. Here we demonstrate our replication framework, MetaNeighbor, that quantifies the degree to which cell types replicate across datasets, and enables rapid identification of clusters with high similarity. We first measure the replicability of neuronal identity, comparing results across eight technically and biologically diverse datasets to define best practices for more complex assessments. We then apply this to novel interneuron subtypes, finding that 24/45 subtypes have evidence of replication, which enables the identification of robust candidate marker genes. Across tasks we find that large sets of variably expressed genes can identify replicable cell types with high accuracy, suggesting a general route forward for large-scale evaluation of scRNA-seq data.
Affiliation(s)
- Megan Crow
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Anirban Paul
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Sara Ballouz
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Z Josh Huang
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Jesse Gillis
  - Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY, 11724, USA
214
Ma Y, Ding Y, Zheng T. Feature subspace learning based on local point processes patterns. Stat Anal Data Min 2018. [DOI: 10.1002/sam.11368] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Affiliation(s)
- Yuting Ma
  - Department of Statistics, Columbia University, New York, New York
- Yuejing Ding
  - Department of Statistics, Columbia University, New York, New York
- Tian Zheng
  - Department of Statistics, Columbia University, New York, New York
215
Jung Y, Zhang H, Hu J. Transformed low-rank ANOVA models for high-dimensional variable selection. Stat Methods Med Res 2018; 28:1230-1246. [DOI: 10.1177/0962280217753726] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
Abstract
High-dimensional data are often encountered in biomedical, environmental, and other studies. For example, in biomedical studies that involve high-throughput omic data, an important problem is to search for genetic variables that are predictive of a particular phenotype. A conventional solution is to characterize such relationships through regression models in which a phenotype is treated as the response variable and the variables are treated as covariates; this approach becomes particularly challenging when the number of variables exceeds the number of samples. We propose a general framework for expressing the transformed mean of high-dimensional variables in an exponential distribution family via ANOVA models in which a low-rank interaction space captures the association between the phenotype and the variables. This alternative method transforms the variable selection problem into a well-posed problem with the number of observations larger than the number of variables. In addition, we propose a model selection criterion for the new model framework with a diverging number of parameters, and establish the consistency of the selection criterion. We demonstrate the appealing performance of the proposed method in terms of prediction and detection accuracy through simulations and real data analyses.
Affiliation(s)
- Yoonsuh Jung
  - Department of Statistics, Korea University, Seoul, South Korea
- Hong Zhang
  - Institute of Biostatistics, Fudan University, Shanghai, People’s Republic of China
- Jianhua Hu
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
216
Suhail Z, Denton ERE, Zwiggelaar R. Classification of micro-calcification in mammograms using scalable linear Fisher discriminant analysis. Med Biol Eng Comput 2018; 56:1475-1485. [PMID: 29368264 PMCID: PMC6061516 DOI: 10.1007/s11517-017-1774-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/08/2017] [Accepted: 12/13/2017] [Indexed: 11/28/2022]
Abstract
Breast cancer is one of the major causes of death in women. Computer Aided Diagnosis (CAD) systems are being developed to assist radiologists in early diagnosis. Micro-calcifications can be an early symptom of breast cancer. Besides detection, classification of micro-calcification as benign or malignant is essential in a complete CAD system. We have developed a novel method for the classification of benign and malignant micro-calcification using an improved Fisher Linear Discriminant Analysis (LDA) approach for the linear transformation of segmented micro-calcification data in combination with a Support Vector Machine (SVM) variant to classify between the two classes. The results indicate an average accuracy of 96%, which is comparable to state-of-the-art methods in the literature.
217
Tadesse DG, Carpenter M. A method for selecting the relevant dimensions for high-dimensional classification in singular vector spaces. ADV DATA ANAL CLASSI 2018. [DOI: 10.1007/s11634-018-0311-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
218
Abstract
Summary. Background: Multi-class molecular cancer classification has great potential clinical implications. Such applications require statistical methods to accurately classify cancer types with a small subset of genes from thousands of genes in the data. Objectives: This paper presents a new functional gradient descent boosting algorithm that directly extends the HingeBoost algorithm from the binary case to the multi-class case without reducing the original problem to multiple binary problems. Methods: Minimizing a multi-class hinge loss with boosting technique, the proposed HingeBoost has good theoretical properties by implementing the Bayes decision rule and providing a unifying framework with either equal or unequal misclassification costs. Furthermore, we propose Twin HingeBoost which has better feature selection behavior than HingeBoost by reducing the number of ineffective covariates. Simulated data, benchmark data and two cancer gene expression data sets are utilized to evaluate the performance of the proposed approach. Results: Simulations and the benchmark data showed that the multi-class HingeBoost generated accurate predictions when compared with the alternative methods, especially with high-dimensional covariates. The multi-class HingeBoost also produced more accurate prediction or comparable prediction in two cancer classification problems using gene expression data. Conclusions: This work has shown that the HingeBoost provides a powerful tool for multi-classification problems. In many applications, the classification accuracy and feature selection behavior can be further improved when using Twin HingeBoost.
219
Sauer S, Buettner R, Heidenreich T, Lemke J, Berg C, Kurz C. Mindful Machine Learning. EUROPEAN JOURNAL OF PSYCHOLOGICAL ASSESSMENT 2018. [DOI: 10.1027/1015-5759/a000312] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 12/31/2022]
Abstract
Mindfulness refers to a stance of nonjudgmental awareness of present-moment experiences. A growing body of research suggests that mindfulness may increase cognitive resources, thereby buffering stress. However, existing models have not achieved a consensus on how mindfulness should be operationalized. As the sound measurement of mindfulness is the foundation needed before substantial hypotheses can be supported, we propose a novel way of gauging the psychometric quality of a mindfulness measurement instrument (the Freiburg Mindfulness Inventory; FMI). Specifically, we employed 10 predictive algorithms to scrutinize the measurement quality of the FMI. Our criterion of measurement quality was the degree to which an algorithm separated mindfulness practitioners from nonpractitioners in a sample of N = 276. A high predictive accuracy of class membership can be taken as an indicator of the psychometric quality of the instrument. In sum, two findings are of interest. First, over and above some items of the FMI were able to reliably predict class membership. However, some items appeared to be uninformative. Second, from an applied methodological point of view, it appears that machine learning algorithms can outperform traditional predictive methods such as logistic regression. This finding may generalize to other branches of research.
Affiliation(s)
- Sebastian Sauer
  - Institute of Business Psychology/ Institute of Management & Information Systems, FOM University of Applied Sciences, Munich, Germany
  - Brain, Mind, and Healing Program, Samueli Institute, Alexandria, VA, USA
- Ricardo Buettner
  - Institute of Business Psychology/ Institute of Management & Information Systems, FOM University of Applied Sciences, Munich, Germany
- Thomas Heidenreich
  - Faculty of Social Work, Health, and Nursing, Esslingen University of Applied Sciences, Germany
- Jana Lemke
  - Institute of Transcultural Health Sciences, Viadrina University, Frankfurt/Oder, Germany
- Christoph Berg
  - Institute of Business Psychology/ Institute of Management & Information Systems, FOM University of Applied Sciences, Munich, Germany
- Christoph Kurz
  - Institute of Health Economics and Health Care Management, Helmholtz Zentrum München, Germany
220
Abstract
Most drugs produce their phenotypic effects by interacting with target proteins, and understanding the molecular features that underpin drug-target interactions is crucial when designing a novel drug. In this chapter, we introduce the protocols that have driven recent advances in sparse modeling methods for analyzing drug-target interaction networks within a chemogenomic framework. In this approach, the chemical structures of candidate drug compounds are correlated with the genomic sequences of the candidate target proteins. We demonstrate the use of sparse canonical correspondence analysis and sparsity-induced binary classifiers to extract the underlying molecular features that are most strongly involved in drug-target interactions. We focus on drug chemical substructures and protein domains. Workflows for applying these methods are presented, and an application is described in detail. We consider the characteristics of each method and suggest possible directions for future research.
221
Wang C, Jiang B. On the dimension effect of regularized linear discriminant analysis. Electron J Stat 2018. [DOI: 10.1214/18-ejs1469] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]
222
Malegori C, Grassi S, Ohm JB, Anderson J, Marti A. GlutoPeak profile analysis for wheat classification: Skipping the refinement process. J Cereal Sci 2018. [DOI: 10.1016/j.jcs.2017.09.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 10/18/2022]
223
Zhou Y, Wang J, Zhao Y, Tong T. Discriminant Analysis and Normalization Methods for Next-Generation Sequencing Data. NEW FRONTIERS OF BIOSTATISTICS AND BIOINFORMATICS 2018. [DOI: 10.1007/978-3-319-99389-8_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/03/2022]
225
Jiang H, Ching WK, Cheung WS, Hou W, Yin H. Hadamard Kernel SVM with applications for breast cancer outcome predictions. BMC SYSTEMS BIOLOGY 2017; 11:138. [PMID: 29322919 PMCID: PMC5763304 DOI: 10.1186/s12918-017-0514-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
BACKGROUND Breast cancer is one of the leading causes of deaths for women. It is of great necessity to develop effective methods for breast cancer detection and diagnosis. Recent studies have focused on gene-based signatures for outcome predictions. Kernel SVM for its discriminative power in dealing with small sample pattern recognition problems has attracted a lot attention. But how to select or construct an appropriate kernel for a specified problem still needs further investigation. RESULTS Here we propose a novel kernel (Hadamard Kernel) in conjunction with Support Vector Machines (SVMs) to address the problem of breast cancer outcome prediction using gene expression data. Hadamard Kernel outperform the classical kernels and correlation kernel in terms of Area under the ROC Curve (AUC) values where a number of real-world data sets are adopted to test the performance of different methods. CONCLUSIONS Hadamard Kernel SVM is effective for breast cancer predictions, either in terms of prognosis or diagnosis. It may benefit patients by guiding therapeutic options. Apart from that, it would be a valuable addition to the current SVM kernel families. We hope it will contribute to the wider biology and related communities.
Affiliation(s)
- Hao Jiang
- Department of Mathematics, School of Information, Renmin University of China, No.59 Zhong Guan Cun Avenue, Hai Dian District, Beijing, 100872, China
| | - Wai-Ki Ching
- Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
| | - Wai-Shun Cheung
- Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
| | - Wenpin Hou
- Department of Mathematics, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong
| | - Hong Yin
- Department of Mathematics, School of Information, Renmin University of China, No.59 Zhong Guan Cun Avenue, Hai Dian District, Beijing, 100872, China.
| |
|
226
|
Collaborative representation-based classification of microarray gene expression data. PLoS One 2017; 12:e0189533. [PMID: 29236759 PMCID: PMC5728509 DOI: 10.1371/journal.pone.0189533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/23/2017] [Accepted: 11/27/2017] [Indexed: 11/19/2022] Open
Abstract
Microarray technology is important for simultaneously measuring the expression of multiple genes over a number of time points. Multiple classifier models, such as sparse representation (SR)-based methods, have been developed to classify microarray gene expression data. These methods allocate the gene data points to different clusters. In this paper, we propose a novel collaborative representation (CR)-based classification with regularized least squares to classify gene data. First, the CR codes a testing sample as a sparse linear combination of all training samples and then classifies the testing sample by evaluating which class leads to the minimum representation error. This CR-based classification approach is remarkably less complex than traditional classification methods but leads to very competitive classification results. In addition, a compressive sensing approach is adopted to project the high-dimensional gene expression dataset onto a lower-dimensional space that retains nearly all of the information. This compression without loss helps reduce the computational load. Experiments to detect subtypes of diseases, such as leukemia and autism spectrum disorders, are performed by analyzing the gene expression. The results show that the proposed CR-based algorithm exhibits significantly higher stability and accuracy than traditional classifiers, such as the support vector machine algorithm.
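The coding-then-residual rule described in this abstract can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' implementation: the names `solve` and `crc_predict`, the ridge parameter `lam`, and the toy data are all hypothetical, and the compressive-sensing projection step is left out.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def crc_predict(train, labels, sample, lam=0.01):
    """Code `sample` over all training rows with regularized least squares,
    then return the label whose rows reconstruct it with the smallest error.
    `lam` is an assumed ridge penalty, not a value taken from the paper."""
    n = len(train)
    # Normal equations in the sample space: (X X^T + lam I) coef = X y
    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [dot(row, sample) for row in train]
    coef = solve(gram, rhs)

    best_label, best_err = None, float("inf")
    for label in sorted(set(labels)):
        recon = [0.0] * len(sample)
        for c, row, lab in zip(coef, train, labels):
            if lab == label:
                recon = [r + c * v for r, v in zip(recon, row)]
        err = sum((s - r) ** 2 for s, r in zip(sample, recon)) ** 0.5
        if err < best_err:
            best_label, best_err = label, err
    return best_label


if __name__ == "__main__":
    # Toy expression profiles (hypothetical): two samples per class.
    train = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
    labels = ["A", "A", "B", "B"]
    print(crc_predict(train, labels, [1.0, 0.05, 0.0]))
```

Solving the system in the n-by-n sample space rather than the gene space keeps the cost tied to the number of training samples, which is the small dimension in typical microarray data.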
|
227
|
Lu Q, Qiao X. Sparse Fisher's linear discriminant analysis for partially labeled data. Stat Anal Data Min 2017. [DOI: 10.1002/sam.11367] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Affiliation(s)
- Qiyi Lu
- Department of Mathematical Sciences Binghamton University, State University of New York Binghamton New York 13902‐6000
| | - Xingye Qiao
- Department of Mathematical Sciences Binghamton University, State University of New York Binghamton New York 13902‐6000
| |
|
228
|
Bogaert M, Ballings M, Hosten M, Van den Poel D. Identifying Soccer Players on Facebook Through Predictive Analytics. DECISION ANALYSIS 2017. [DOI: 10.1287/deca.2017.0354] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Affiliation(s)
| | - Michel Ballings
- Department of Business Analytics and Statistics, University of Tennessee, Knoxville, Tennessee 37996
| | - Martijn Hosten
- Department of Marketing, Ghent University, 9000 Ghent, Belgium
| | | |
|
229
|
Zhou Y, Zhang B, Li G, Tong T, Wan X. GD-RDA: A New Regularized Discriminant Analysis for High-Dimensional Data. J Comput Biol 2017; 24:1099-1111. [DOI: 10.1089/cmb.2017.0029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 02/01/2023] Open
Affiliation(s)
- Yan Zhou
- College of Mathematics and Statistics, Institute of Statistical Sciences, Shenzhen University, ShenZhen, China
| | - Baoxue Zhang
- School of Statistics, Capital University of Economics and Business, Beijing, China
| | - Gaorong Li
- Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, China
| | - Tiejun Tong
- Department of Mathematics, Hong Kong Baptist University, Hong Kong, China
| | - Xiang Wan
- Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
| |
|
230
|
A simultaneous testing of the mean vector and the covariance matrix among two populations for high-dimensional data. TEST-SPAIN 2017. [DOI: 10.1007/s11749-017-0567-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
231
|
Zararsiz G, Goksuluk D, Klaus B, Korkmaz S, Eldem V, Karabulut E, Ozturk A. voomDDA: discovery of diagnostic biomarkers and classification of RNA-seq data. PeerJ 2017; 5:e3890. [PMID: 29018623 PMCID: PMC5633036 DOI: 10.7717/peerj.3890] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/29/2017] [Accepted: 09/14/2017] [Indexed: 12/03/2022] Open
Abstract
RNA-Seq is a recent and efficient technique that uses the capabilities of next-generation sequencing technology for characterizing and quantifying transcriptomes. One important task using gene-expression data is to identify a small subset of genes that can be used to build diagnostic classifiers, particularly for cancer. Microarray-based classifiers are not directly applicable to RNA-Seq data due to its discrete nature. Overdispersion is another problem that requires careful modeling of the mean-variance relationship of the RNA-Seq data. In this study, we present voomDDA classifiers: variance modeling at the observational level (voom) extensions of the nearest shrunken centroids (NSC) and the diagonal discriminant classifiers. voomNSC is one of these classifiers and brings the voom and NSC approaches together for the purpose of gene-expression based classification. For this purpose, we propose weighted statistics and put these weighted statistics into the NSC algorithm. voomNSC is a sparse classifier that models the mean-variance relationship using the voom method and incorporates voom's precision weights into the NSC classifier via weighted statistics. A comprehensive simulation study was designed and four real datasets were used for performance assessment. The overall results indicate that voomNSC performs as the sparsest classifier. It also provides the most accurate results together with power-transformed Poisson linear discriminant analysis, rlog transformed support vector machines and random forests algorithms. In addition to prediction purposes, the voomNSC classifier can be used to identify the potential diagnostic biomarkers for a condition of interest. Through this work, statistical learning methods proposed for microarrays can be reused for RNA-Seq data. An interactive web application is freely available at http://www.biosoft.hacettepe.edu.tr/voomDDA/.
Affiliation(s)
- Gokmen Zararsiz
- Department of Biostatistics, Erciyes University, Kayseri, Turkey.,Turcosa Analytics Solutions Ltd Co, Erciyes Teknopark 5, Kayseri, Turkey
| | - Dincer Goksuluk
- Turcosa Analytics Solutions Ltd Co, Erciyes Teknopark 5, Kayseri, Turkey.,Department of Biostatistics, Hacettepe University, Ankara, Turkey
| | - Bernd Klaus
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Selcuk Korkmaz
- Turcosa Analytics Solutions Ltd Co, Erciyes Teknopark 5, Kayseri, Turkey.,Department of Biostatistics, Trakya University, Edirne, Turkey
| | - Vahap Eldem
- Department of Biology, Istanbul University, Istanbul, Turkey
| | - Erdem Karabulut
- Department of Biostatistics, Hacettepe University, Ankara, Turkey
| | - Ahmet Ozturk
- Department of Biostatistics, Erciyes University, Kayseri, Turkey.,Turcosa Analytics Solutions Ltd Co, Erciyes Teknopark 5, Kayseri, Turkey
| |
|
232
|
Peng Y, Li W, Liu Y. A Hybrid Approach for Biomarker Discovery from Microarray Gene Expression Data for Cancer Classification. Cancer Inform 2017. [DOI: 10.1177/117693510600200024] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 11/15/2022] Open
Abstract
Microarrays allow researchers to monitor the gene expression patterns for tens of thousands of genes across a wide range of cellular responses, phenotypes and conditions. Selecting a small subset of discriminative genes from thousands of genes is important for accurate classification of diseases and phenotypes. Many methods have been proposed to find subsets of genes with maximum relevance and minimum redundancy, which can distinguish accurately between samples with different labels. Finding the minimum subset of relevant genes is often referred to as biomarker discovery. Two main approaches, filter and wrapper techniques, have been applied to biomarker discovery. In this paper, we conducted a comparative study of different biomarker discovery methods, including six filter methods and three wrapper methods. We then proposed a hybrid approach, FR-Wrapper, for biomarker discovery. The aim of this approach is to find an optimum balance between the precision of the biomarker discovery and the computation cost, by taking advantage of both the filter method's efficiency and the wrapper method's high accuracy. Our hybrid approach applies Fisher's ratio, a simple method that is easy to understand and implement, to filter out most of the irrelevant genes; a wrapper method is then employed to reduce the redundancy. The performance of the FR-Wrapper approach is evaluated over four widely used microarray datasets. Analysis of experimental results reveals that the hybrid approach can achieve the goal of maximum relevance with minimum redundancy.
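The filter stage mentioned in this abstract can be illustrated with a short sketch. It assumes one common two-class form of Fisher's ratio, (mean_0 - mean_1)^2 / (var_0 + var_1); the abstract does not state the exact formula used, and the function names, gene names, and data below are hypothetical. The wrapper stage is not shown.

```python
from statistics import mean, pvariance


def fisher_ratio(values, labels):
    """Two-class Fisher's ratio for one gene: squared difference of the class
    means divided by the sum of the class variances (population variances
    are an assumption made for this sketch)."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    spread = pvariance(a) + pvariance(b)
    gap = (mean(a) - mean(b)) ** 2
    if spread == 0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def top_genes(expression, labels, k):
    """Rank genes by Fisher's ratio and keep the k highest-scoring ones.
    `expression` maps a gene name to its per-sample expression values."""
    scores = {gene: fisher_ratio(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    expression = {
        "g1": [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],  # well separated, low variance
        "g2": [1.0, 5.0, 3.0, 2.0, 4.0, 3.0],  # identical class means
        "g3": [2.0, 2.2, 1.8, 3.0, 3.2, 2.8],  # moderately separated
    }
    print(top_genes(expression, labels, 2))
```

Because each gene is scored independently, this step is cheap even for tens of thousands of genes, which is why a filter of this kind is a natural first pass before a costlier wrapper search.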
Affiliation(s)
- Yanxiong Peng
- Laboratory for Bioinformatics and Medical Informatics, University of Texas at Dallas, Richardson, TX 75083-0688, U.S.A
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688, U.S.A
| | - Wenyuan Li
- Laboratory for Bioinformatics and Medical Informatics, University of Texas at Dallas, Richardson, TX 75083-0688, U.S.A
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688, U.S.A
| | - Ying Liu
- Laboratory for Bioinformatics and Medical Informatics, University of Texas at Dallas, Richardson, TX 75083-0688, U.S.A
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688, U.S.A
- Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX 75083-0688, U.S.A
| |
|
233
|
Simon R, Lam A, Li MC, Ngan M, Menenzes S, Zhao Y. Analysis of Gene Expression Data Using BRB-Array Tools. Cancer Inform 2017. [DOI: 10.1177/117693510700300022] [Citation(s) in RCA: 452] [Impact Index Per Article: 56.5] [Indexed: 11/15/2022] Open
Abstract
BRB-ArrayTools is an integrated software system for the comprehensive analysis of DNA microarray experiments. It was developed by professional biostatisticians experienced in the design and analysis of DNA microarray studies and incorporates methods developed by leading statistical laboratories. The software is designed for use by biomedical scientists who wish to have access to state-of-the-art statistical methods for the analysis of gene expression data and to receive training in the statistical analysis of high dimensional data. The software provides the most extensive set of tools available for predictive classifier development and complete cross-validation. It offers extensive links to genomic websites for gene annotation and analysis tools for pathway analysis. An archive of over 100 datasets of published microarray data with associated clinical data is provided and BRB-ArrayTools automatically imports data from the Gene Expression Omnibus public archive at the National Center for Biotechnology Information.
Affiliation(s)
- Richard Simon
- Biometric Research Branch, National Cancer Institute, 9000 Rockville Pike, Bethesda MD
| | - Amy Lam
- Emmes Corporation, Rockville MD
| | | | | | | | - Yingdong Zhao
- Biometric Research Branch, National Cancer Institute, 9000 Rockville Pike, Bethesda MD
| |
|
234
|
Kurian SM, Ferreri K, Wang CH, Todorov I, Al-Abdullah IH, Rawson J, Mullen Y, Salomon DR, Kandeel F. Gene expression signature predicts human islet integrity and transplant functionality in diabetic mice. PLoS One 2017; 12:e0185331. [PMID: 28968432 PMCID: PMC5624587 DOI: 10.1371/journal.pone.0185331] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/20/2017] [Accepted: 09/11/2017] [Indexed: 11/18/2022] Open
Abstract
There is growing evidence that transplantation of cadaveric human islets is an effective therapy for type 1 diabetes. However, gauging the suitability of islet samples for clinical use remains a challenge. We hypothesized that islet quality is reflected in the expression of specific genes. Therefore, gene expression in 59 human islet preparations was analyzed and correlated with diabetes reversal after transplantation in diabetic mice. Analysis yielded 262 differentially expressed probesets, which together predict islet quality with 83% accuracy. Pathway analysis revealed that failing islet preparations activated inflammatory pathways, while functional islets showed increased regeneration pathway gene expression. Gene expression associated with apoptosis and oxygen consumption showed little overlap with each other or with the 262 probeset classifier, indicating that the three tests are measuring different aspects of islet cell biology. A subset of 36 probesets surpassed the predictive accuracy of the entire set for reversal of diabetes, and was further reduced by logistic regression to sets of 14 and 5 without losing accuracy. These genes were further validated with an independent cohort of 16 samples. We believe this limited number of gene classifiers in combination with other tests may provide complementary verification of islet quality prior to their clinical use.
Affiliation(s)
- Sunil M. Kurian
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
| | - Kevin Ferreri
- Department of Translational Research and Cellular Therapeutics, Diabetes, and Metabolism Research Institute, City of Hope National Medical Center, Duarte, California, United States of America
| | - Chia-Hao Wang
- Department of Translational Research and Cellular Therapeutics, Diabetes, and Metabolism Research Institute, City of Hope National Medical Center, Duarte, California, United States of America
| | - Ivan Todorov
- Department of Translational Research and Cellular Therapeutics, Diabetes, and Metabolism Research Institute, City of Hope National Medical Center, Duarte, California, United States of America
| | - Ismail H. Al-Abdullah
- Department of Translational Research and Cellular Therapeutics, Diabetes, and Metabolism Research Institute, City of Hope National Medical Center, Duarte, California, United States of America
| | - Jeffrey Rawson
- Department of Translational Research and Cellular Therapeutics, Diabetes, and Metabolism Research Institute, City of Hope National Medical Center, Duarte, California, United States of America
| | - Yoko Mullen
- Department of Translational Research and Cellular Therapeutics, Diabetes, and Metabolism Research Institute, City of Hope National Medical Center, Duarte, California, United States of America
| | - Daniel R. Salomon
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
| | - Fouad Kandeel
- Department of Translational Research and Cellular Therapeutics, Diabetes, and Metabolism Research Institute, City of Hope National Medical Center, Duarte, California, United States of America
| |
|
235
|
High dimensional covariance matrix estimation by penalizing the matrix-logarithm transformed likelihood. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2017.04.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
|
236
|
Ma T, Song C, Tseng GC. Discussant paper on ‘Statistical contributions to bioinformatics: Design, modelling, structure learning and integration’. STAT MODEL 2017. [DOI: 10.1177/1471082x17705992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Affiliation(s)
- Tianzhou Ma
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh Pittsburgh, PA, USA
| | - Chi Song
- Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH, USA
| | - George C. Tseng
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh Pittsburgh, PA, USA
| |
|
237
|
Hu Z, Dong K, Dai W, Tong T. A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix. Int J Biostat 2017; 13. [PMID: 28953454 DOI: 10.1515/ijb-2017-0013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/07/2017] [Accepted: 08/16/2017] [Indexed: 11/15/2022]
Abstract
The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges of high dimensionality, little work has been done in the literature on estimating the determinant of a high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrices. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of a high-dimensional covariance matrix. Finally, from the perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.
|
238
|
Omae Y, Takahashi H. Feature Selection Algorithm Considering Trial and Individual Differences for Machine Learning of Human Activity Recognition. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2017. [DOI: 10.20965/jaciii.2017.p0813] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
In recent years, many studies have been performed on the automatic classification of human body motions based on inertial sensor data using a combination of inertial sensors and machine learning. Such approaches require training data in which sensor data and human body motions correspond to one another. It can be difficult to conduct experiments involving a large number of subjects over an extended time period, because of concern for the fatigue or injury of subjects. Many studies, therefore, have a small number of subjects repeatedly perform the body motions to be classified, in order to acquire the data from which training data are built. Any classifiers constructed using such training data will have some problems associated with generalization errors caused by individual and trial differences. In order to suppress such generalization errors, feature spaces must be obtained that are less likely to generate generalization errors due to individual and trial differences. To obtain such feature spaces, we require indices to evaluate the likelihood of the feature spaces generating generalization errors due to individual and trial errors. This paper, therefore, aims to devise such evaluation indices from these perspectives. The evaluation indices we propose in this paper can be obtained by first constructing probability distributions of the acquired data that represent individual and trial differences, and then using such probability distributions to calculate any risks of generating generalization errors. We have verified the effectiveness of the proposed evaluation method by applying it to sensor data for butterfly and breaststroke swimming. For the purpose of comparison, we have also applied a few existing evaluation methods. We have constructed classifiers for butterfly and breaststroke swimming by applying a support vector machine to the feature spaces obtained by the proposed and existing methods.
Based on the accuracy verification we conducted with test data, we found that the proposed method produced a significantly higher F-measure than the existing methods. This proves that the use of the proposed evaluation indices enables us to obtain a feature space that is less likely to generate generalization errors due to individual and trial differences.
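For readers unfamiliar with the F-measure used in this comparison, the sketch below computes its usual balanced form, the harmonic mean of precision and recall. The abstract does not state which variant or averaging scheme was used, so this is an assumption, and the function name and labels are hypothetical.

```python
def f_measure(y_true, y_pred, positive=1):
    """Balanced F-measure (F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 = butterfly, 0 = breaststroke.
    truth = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 1, 0]
    print(f_measure(truth, predicted))
```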
|
239
|
Qaqish BF, O’Brien JJ, Hibbard JC, Clowers KJ. Accelerating high-dimensional clustering with lossless data reduction. Bioinformatics 2017; 33:2867-2872. [PMID: 28520900 PMCID: PMC5870568 DOI: 10.1093/bioinformatics/btx328] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/27/2017] [Revised: 01/27/2017] [Accepted: 05/16/2017] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION For cluster analysis, high-dimensional data are associated with instability, decreased classification accuracy and high computational burden. The latter challenge can be eliminated as a serious concern. For applications where dimension reduction techniques are not implemented, we propose a temporary transformation which accelerates computations with no loss of information. The algorithm can be applied to any statistical procedure that depends only on Euclidean distances and can be implemented sequentially to enable analyses of data that would otherwise exceed memory limitations. RESULTS The method is easily implemented in common statistical software as a standard pre-processing step. The benefit of our algorithm grows with the dimensionality of the problem and the complexity of the analysis. Consequently, our simple algorithm not only decreases the computation time for routine analyses, it also opens the door to performing calculations that may otherwise have been too burdensome to attempt. AVAILABILITY AND IMPLEMENTATION R, Matlab and SAS/IML code for implementing lossless data reduction is freely available in the Appendix. CONTACT obrienj@hms.harvard.edu.
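The idea that a procedure depending only on Euclidean distances can run on a much smaller matrix can be illustrated as follows. This sketch is not the authors' algorithm (their code is in the paper's Appendix); it uses Gram-Schmidt orthogonalization as one assumed way to re-express n samples with p features, where p is much larger than n, in at most n coordinates while leaving all pairwise distances unchanged. The names `reduce_rows` and `euclid` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def reduce_rows(data, tol=1e-12):
    """Express each row in an orthonormal basis of the row span.
    The result has one row per sample and at most len(data) columns,
    and every pairwise Euclidean distance is preserved."""
    basis = []   # orthonormal vectors found so far
    coords = []  # coordinates of each row in that basis
    for row in data:
        c = [dot(row, b) for b in basis]
        residual = list(row)
        for ci, b in zip(c, basis):
            residual = [r - ci * bj for r, bj in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:
            # The row adds a new direction; its coordinate there is `norm`.
            basis.append([r / norm for r in residual])
            c.append(norm)
        coords.append(c)
    width = len(basis)
    # Earlier rows have zero coordinates along later basis vectors.
    return [c + [0.0] * (width - len(c)) for c in coords]


if __name__ == "__main__":
    data = [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [2.0, 0.0, 1.0, 3.0, 1.0, 0.0],
        [0.0, 1.0, 0.0, 2.0, 2.0, 5.0],
    ]
    reduced = reduce_rows(data)
    print(len(data[0]), "->", len(reduced[0]))
    print(euclid(data[0], data[1]), euclid(reduced[0], reduced[1]))
```

Any distance-based method, such as hierarchical clustering or k-means, would then see the same geometry on the reduced matrix as on the original one.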
Affiliation(s)
- Bahjat F Qaqish
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | | | - Jonathan C Hibbard
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
| | - Katie J Clowers
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
| |
|
240
|
Zhang L, Zhou W, Wang B, Zhang Z, Li F. Applying 1-norm SVM with squared loss to gene selection for cancer classification. APPL INTELL 2017. [DOI: 10.1007/s10489-017-1056-3] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
|
241
|
Clinical application of modified bag-of-features coupled with hybrid neural-based classifier in dengue fever classification using gene expression data. Med Biol Eng Comput 2017; 56:709-720. [PMID: 28891000 DOI: 10.1007/s11517-017-1722-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/14/2017] [Accepted: 08/28/2017] [Indexed: 12/27/2022]
Abstract
Dengue fever detection and classification play a vital role given the recent outbreaks of different kinds of dengue fever. Recent advances in microarray technology can be employed for such classification. Several studies have established that the gene selection phase plays a significant role in classifier performance. Accordingly, the current study focused on detecting two different variations, namely, dengue fever (DF) and dengue hemorrhagic fever (DHF). A modified bag-of-features method has been proposed to select the most promising genes in the classification process. Afterward, a modified cuckoo search optimization algorithm has been engaged to support the artificial neural network (ANN-MCS) to classify the unknown subjects into three different classes, namely, DF, DHF, and another class containing convalescent and normal cases. The proposed method has been compared with three other well-known classifiers, namely, multilayer perceptron feed-forward network (MLP-FFN), artificial neural network (ANN) trained with cuckoo search (ANN-CS), and ANN trained with PSO (ANN-PSO). Experiments have been carried out with different numbers of clusters for the initial bag-of-features-based feature selection phase. After obtaining the reduced dataset, the hybrid ANN-MCS model has been employed for the classification process. The results have been compared in terms of confusion matrix-based performance metrics. The experimental results indicated a highly statistically significant improvement with the proposed classifier over the traditional ANN-CS model.
|
242
|
Boulesteix AL, Wilson R, Hapfelmeier A. Towards evidence-based computational statistics: lessons from clinical research on the role and design of real-data benchmark studies. BMC Med Res Methodol 2017; 17:138. [PMID: 28888225 PMCID: PMC5591542 DOI: 10.1186/s12874-017-0417-2] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 05/18/2017] [Accepted: 08/31/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. MAIN MESSAGE In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. CONCLUSION We suggest that benchmark studies (a method of assessment of statistical methods using real-world datasets) might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.
Affiliation(s)
- Anne-Laure Boulesteix
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany.
| | - Rory Wilson
- Institute for Medical Information Processing, Biometry and Epidemiology, LMU Munich, Marchioninistr. 15, Munich, 81377, Germany
| | - Alexander Hapfelmeier
- Institute of Medical Statistics and Epidemiology, Technical University Munich, Ismaninger Str. 22, Munich, 81675, Germany
| |
|
243
|
Chang SM, Tzeng JY, Chen RB. Fast Bayesian Variable Screenings for Binary Response Regressions with Small Sample Size. J STAT COMPUT SIM 2017; 87:2708-2723. [PMID: 29075047 PMCID: PMC5653235 DOI: 10.1080/00949655.2017.1341887] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/26/2016] [Accepted: 06/09/2017] [Indexed: 10/19/2022]
Abstract
Screening procedures play an important role in data analysis, especially in high-throughput biological studies where the datasets consist of more covariates than independent subjects. In this article, a Bayesian screening procedure is introduced for the binary response models with logit and probit links. In contrast to many screening rules based on marginal information involving one or a few covariates, the proposed Bayesian procedure simultaneously models all covariates and uses closed-form screening statistics. Specifically, we use the posterior means of the regression coefficients as screening statistics; by imposing a generalized g-prior on the regression coefficients, we derive the analytical form of their posterior means and compute the screening statistics without Markov chain Monte Carlo implementation. We evaluate the utility of the proposed Bayesian screening method using simulations and real data analysis. When the sample size is small, the simulation results suggest improved performance with comparable computational cost.
Affiliation(s)
- S.-M. Chang
- Department of Statistics, National Cheng Kung University, Tainan, Taiwan
| | - J.-Y. Tzeng
- Department of Statistics and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
| | - R.-B. Chen
- Department of Statistics, National Cheng Kung University, Tainan, Taiwan
| |
|
244
|
Aijun Y, Xuejun J, Liming X, Jinguan L. Sparse Bayesian variable selection in multinomial probit regression model with application to high-dimensional data classification. COMMUN STAT-THEOR M 2017. [DOI: 10.1080/03610926.2015.1122056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Yang Aijun
  - College of Economics and Management, Nanjing Forestry University, Nanjing, China
  - School of Economics and Management, Southeast University, Nanjing, China
- Jiang Xuejun
  - Department of Mathematics, South University of Science and Technology of China, Shenzhen, China
- Xiang Liming
  - School of Physical & Mathematical Sciences, Nanyang Technological University, Singapore
- Lin Jinguan
  - Department of Mathematics, Southeast University, Nanjing, China
|
245
|
Tharwat A, Gaber T, Ibrahim A, Hassanien AE. Linear discriminant analysis: A detailed tutorial. AI COMMUN 2017. [DOI: 10.3233/aic-170729] [Citation(s) in RCA: 343] [Impact Index Per Article: 42.9] [Indexed: 12/17/2022]
Affiliation(s)
- Alaa Tharwat
  - Department of Computer Science and Engineering, Frankfurt University of Applied Sciences, Frankfurt am Main, Germany
  - Faculty of Engineering, Suez Canal University, Egypt
- Tarek Gaber
  - Faculty of Computers and Informatics, Suez Canal University, Egypt
|
246
|
A Cancer Gene Selection Algorithm Based on the K-S Test and CFS. BIOMED RESEARCH INTERNATIONAL 2017; 2017:1645619. [PMID: 28567418 PMCID: PMC5439177 DOI: 10.1155/2017/1645619] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 01/12/2017] [Accepted: 04/06/2017] [Indexed: 11/18/2022]
Abstract
Background To address the challenging problem of selecting distinguished genes from cancer gene expression datasets, this paper presents a gene subset selection algorithm based on the Kolmogorov-Smirnov (K-S) test and correlation-based feature selection (CFS) principles. The algorithm selects distinguished genes first using the K-S test, and then, it uses CFS to select genes from those selected by the K-S test. Results We adopted support vector machines (SVM) as the classification tool and used the criteria of accuracy to evaluate the performance of the classifiers on the selected gene subsets. This approach compared the proposed gene subset selection algorithm with the K-S test, CFS, minimum-redundancy maximum-relevancy (mRMR), and ReliefF algorithms. The average experimental results of the aforementioned gene selection algorithms for 5 gene expression datasets demonstrate that, based on accuracy, the performance of the new K-S and CFS-based algorithm is better than those of the K-S test, CFS, mRMR, and ReliefF algorithms. Conclusions The experimental results show that the K-S test-CFS gene selection algorithm is a very effective and promising approach compared to the K-S test, CFS, mRMR, and ReliefF algorithms.
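The two-stage idea in this abstract (a K-S filter followed by a CFS search among the surviving genes) can be illustrated with a minimal sketch. Everything below is an assumption for illustration rather than the paper's implementation: the function names, the use of absolute Pearson correlation inside the CFS merit, the greedy forward search, and the `n_ks` and `min_gain` parameters are all hypothetical choices. Class labels are assumed to be 0 and 1.

```python
from math import sqrt


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cfs_merit(subset, columns, y):
    """CFS merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[g], y)) for g in subset) / k
    if k == 1:
        return r_cf
    pairs = [(g, h) for idx, g in enumerate(subset) for h in subset[idx + 1:]]
    r_ff = sum(abs(pearson(columns[g], columns[h])) for g, h in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def select_genes(X, y, n_ks, min_gain=1e-9):
    """Stage 1: keep the n_ks genes with the largest K-S statistic.
    Stage 2: greedy forward CFS search among those genes."""
    columns = [list(col) for col in zip(*X)]  # one list of values per gene

    def ks_score(g):
        neg = [v for v, lab in zip(columns[g], y) if lab == 0]
        pos = [v for v, lab in zip(columns[g], y) if lab == 1]
        return ks_statistic(neg, pos)

    candidates = sorted(range(len(columns)), key=ks_score, reverse=True)[:n_ks]
    selected, best = [], 0.0
    while candidates:
        gene = max(candidates, key=lambda g: cfs_merit(selected + [g], columns, y))
        merit = cfs_merit(selected + [gene], columns, y)
        if merit <= best + min_gain:  # stop when the merit no longer improves
            break
        selected.append(gene)
        candidates.remove(gene)
        best = merit
    return selected
```

In this sketch a gene that nearly duplicates an already selected gene adds no merit, because the feature-feature correlation term in the denominator grows as fast as the numerator. That is the redundancy control CFS contributes on top of the univariate K-S filter.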
|
247
|
Makinde OS, Adewumi AD. A comparison of depth functions in maximal depth classification rules. JOURNAL OF MODERN APPLIED STATISTICAL METHODS 2017. [DOI: 10.22237/jmasm/1493598120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
|
248
|
Abstract
INTRODUCTION Placental explant culture is an important model for studying placental development and functions. We investigated the differences in placental gene expression in response to tissue culture, atmospheric and physiologic oxygen concentrations. METHODS Placental explants were collected from normal term (38-39 weeks of gestation) placentae with no previous uterine contractile activity. Placental transcriptomic expressions were evaluated with GeneChip® Human Genome U133 Plus 2.0 arrays (Affymetrix). RESULTS We uncovered sub-sets of genes that regulate response to stress, induction of apoptosis programmed cell death, mis-regulation of cell growth, proliferation, cell morphogenesis, tissue viability, and protection from apoptosis in cultured placental explants. We also identified a sub-set of genes with highly unstable pattern of expression after exposure to tissue culture. Tissue culture irrespective of oxygen concentration induced dichotomous increase in significant gene expression and increased enrichment of significant pathways and transcription factor targets (TFTs) including HIF1A. The effect was exacerbated by culture at atmospheric oxygen concentration, where further up-regulation of TFTs including PPARA, CEBPD, HOXA9 and down-regulated TFTs such as JUND/FOS suggest intrinsic heightened key biological and metabolic mechanisms such as glucose use, lipid biosynthesis, protein metabolism; apoptosis, inflammatory responses; and diminished trophoblast proliferation, differentiation, invasion, regeneration, and viability. DISCUSSION These findings demonstrate that gene expression patterns differ between pre-culture and cultured explants, and the gene expression of explants cultured at atmospheric oxygen concentration favours stressed, pro-inflammatory and increased apoptotic transcriptomic response.
Affiliation(s)
- O Brew
  - University of West London, Paragon House, Boston Manor Road, Brentford, Middlesex, TW8 9GA, UK
- M H F Sullivan
  - Institute of Reproductive & Developmental Biology, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London, W12 0NN, UK
|
249
|
Fu G, Wang G, Dai X. An adaptive threshold determination method of feature screening for genomic selection. BMC Bioinformatics 2017; 18:212. [PMID: 28403836 PMCID: PMC5389084 DOI: 10.1186/s12859-017-1617-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/09/2016] [Accepted: 03/28/2017] [Indexed: 11/10/2022]
Abstract
Background Although the dimension of the entire genome can be extremely large, only a parsimonious set of influential SNPs is correlated with a particular complex trait and important to the prediction of the trait. Efficiently and accurately selecting these influential SNPs from millions of candidates is in high demand, but poses challenges. We propose a backward elimination iterative distance correlation (BE-IDC) procedure to select the smallest subset of SNPs that guarantees sufficient prediction accuracy, while also solving the unclear threshold issue for traditional feature screening approaches. Results Verified through six simulations, the adaptive threshold estimated by the BE-IDC performed uniformly better than fixed threshold methods that have been used in the current literature. We also applied BE-IDC to an Arabidopsis thaliana genome-wide dataset. Out of 216,130 SNPs, BE-IDC selected four influential SNPs, and confirmed the same FRIGIDA gene that was reported by two other traditional methods. Conclusions BE-IDC accommodates both the prediction accuracy and the computational speed that are highly demanded in genomic selection.
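For readers unfamiliar with the statistic that BE-IDC builds on, the following is a minimal sketch of the sample distance correlation for one-dimensional variables, plus a simple ranking of features by that score. It does not reproduce the backward elimination or the adaptive threshold described in the abstract; the function names and the ranking helper are hypothetical and only illustrate the underlying screening statistic.

```python
from math import sqrt


def _double_centered(v):
    """Pairwise distance matrix of v with row, column and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # the matrix is symmetric: row mean == column mean
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length 1-D samples."""
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvar_y = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    if dvar_x * dvar_y == 0:
        return 0.0  # a constant variable carries no dependence information
    return sqrt(max(dcov2, 0.0) / sqrt(dvar_x * dvar_y))


def rank_features(columns, y):
    """Indices of the feature columns, sorted by decreasing distance correlation with y."""
    scores = [distance_correlation(col, y) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
```

Unlike Pearson correlation, this statistic is nonzero for nonlinear dependence: for x = [-2, -1, 0, 1, 2] and y = x squared the Pearson correlation is zero, while the distance correlation is clearly positive. This property is why distance correlation is used for model-free feature screening.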
Affiliation(s)
- Guifang Fu
  - Department of Mathematics and Statistics, Utah State University, Logan, 84322, UT, USA
- Gang Wang
  - Department of Mathematics and Statistics, Utah State University, Logan, 84322, UT, USA
- Xiaotian Dai
  - Department of Mathematics and Statistics, Utah State University, Logan, 84322, UT, USA
|
250
|
Anand D, Pandey B, Pandey DK. Facioscapulohumeral Muscular Dystrophy Diagnosis Using Hierarchical Clustering Algorithm and K-Nearest Neighbor Based Methodology. INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS 2017. [DOI: 10.4018/ijehmc.2017040103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
The genetic diagnosis of neuromuscular disorders is an active area of research. Microarrays are used to detect changes in genes for accurate diagnosis. Unfortunately, the number of genes in gene expression data is very large compared to the number of samples, so the number of genes needs to be reduced for correct diagnosis. In the present paper, the authors build an integrated model for clustering and diagnosis of neuromuscular diseases. The Wilcoxon signed rank test is used to preselect the genes. K-means and hierarchical clustering algorithms with different distance metrics are employed to cluster the genes. Three classifiers, namely linear discriminant analysis, quadratic discriminant analysis and k-nearest neighbor, are used. The integrated techniques are applied to a balanced facioscapulohumeral muscular dystrophy dataset. A comparative analysis of the above integrated algorithms is presented, which demonstrates that the integration of the cosine distance metric hierarchical clustering algorithm with k-nearest neighbor gives the best performance measures.
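As a small illustration of two ingredients named in this abstract, the sketch below combines a cosine distance with a k-nearest-neighbor vote. In the abstract the cosine metric belongs to the hierarchical clustering step, so reusing it for the neighbor search here is an assumption made only to keep the example short. The function names and the default `k` are hypothetical, and the gene preselection and clustering stages are not reproduced.

```python
from collections import Counter
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(train_X, train_y, query, k=3):
    """Majority label among the k training samples closest to the query."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: cosine_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because cosine distance depends only on the direction of a profile, two samples whose expression vectors differ by a constant scale factor are treated as identical. Whether that is desirable depends on how the expression data were normalized.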
Affiliation(s)
- Divya Anand
  - Department of Computer Science and Engineering, Lovely Professional University, Phagwara, India
- Babita Pandey
  - Department of Computer Applications, Lovely Professional University, Phagwara, India
|