1
Gu B, Geng X, Li X, Shi W, Zheng G, Deng C, Huang H. Scalable Kernel Ordinal Regression via Doubly Stochastic Gradients. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2021; 32:3677-3689. [PMID: 32857699 DOI: 10.1109/tnnls.2020.3015937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Ordinal regression (OR) is one of the most important machine learning tasks. The kernel method is a major technique to achieve nonlinear OR. However, traditional kernel OR solvers are inefficient due to increased complexity introduced by multiple ordinal thresholds as well as the cost of kernel computation. Doubly stochastic gradient (DSG) is a very efficient and scalable kernel learning algorithm that combines random feature approximation with stochastic functional optimization. However, the theory and algorithm of DSG can only support optimization tasks within the unique reproducing kernel Hilbert space (RKHS), which is not suitable for OR problems where the multiple ordinal thresholds usually lead to multiple RKHSs. To address this problem, we construct a kernel whose RKHS can contain the decision function with multiple thresholds. Based on this new kernel, we further propose a novel DSG-like algorithm, DSGOR. In each iteration of DSGOR, we update the decision functional as well as the function bias with appropriately set learning rates for each. Our theoretical analysis shows that DSGOR can achieve an O(1/t) convergence rate, which is as good as DSG, even though it deals with a much harder problem. Extensive experimental results demonstrate that our algorithm is much more efficient than traditional kernel OR solvers, especially on large-scale problems.
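The "random feature approximation" mentioned in this abstract can be illustrated with a small sketch. The code below is not the DSGOR algorithm and is not taken from the paper; it only shows, under the assumption of a Gaussian kernel and random Fourier features, how an inner product of random features can stand in for a kernel evaluation. All function names, the bandwidth `sigma`, and the feature count are illustrative choices.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_random_features(dim, num_features, sigma, rng):
    """Draw frequencies w ~ N(0, I / sigma^2) and phases b ~ U[0, 2*pi]."""
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def feature_map(x, freqs, phases):
    """Map x to z(x) so that z(x) . z(y) approximates the Gaussian kernel."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


# Illustrative inputs (hypothetical values, not from the paper).
rng = random.Random(0)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
freqs, phases = sample_random_features(dim=3, num_features=4000, sigma=1.0, rng=rng)
zx, zy = feature_map(x, freqs, phases), feature_map(y, freqs, phases)

approx = sum(a * b for a, b in zip(zx, zy))
exact = gaussian_kernel(x, y, sigma=1.0)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The approximation error shrinks roughly as one over the square root of the number of features, which is why methods of this kind can trade a little accuracy for not having to store or evaluate a full kernel matrix.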
2
Abstract
The so-called proportional odds assumption is popular in cumulative, ordinal regression. In practice, however, such an assumption is sometimes too restrictive. For instance, when modeling the perception of boar taint on an individual level, it turns out that, at least for some subjects, the effects of predictors (androstenone and skatole) vary between response categories. For more flexible modeling, we consider the use of a ‘smooth-effects-on-response penalty’ (SERP) as a connecting link between proportional and fully non-proportional odds models, assuming that parameters of the latter vary smoothly over response categories. The usefulness of SERP is further demonstrated through a simulation study. Besides flexible and accurate modeling, SERP also enables fitting of parameters in cases where the pure, unpenalized non-proportional odds model fails to converge.
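The difference between proportional and non-proportional odds can be made concrete with a short sketch. It assumes a cumulative logit model, P(Y ≤ k | x) = sigmoid(θ_k − β_k · x), and assumes that the smoothness penalty is a sum of squared differences between coefficients of adjacent response categories; the abstract only states that parameters "vary smoothly over response categories", so the exact penalty form, the function names, and all numbers here are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, thresholds, betas):
    """Category probabilities of a cumulative logit model.

    thresholds[k] and betas[k] parameterise P(Y <= k | x) for the first
    K - 1 categories; betas[k] is the coefficient vector for that split.
    """
    cumulative = [
        sigmoid(theta - sum(b * v for b, v in zip(beta, x)))
        for theta, beta in zip(thresholds, betas)
    ]
    cumulative.append(1.0)  # P(Y <= K) = 1 for the last category
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]


def smoothness_penalty(betas, lam):
    """Assumed penalty: lam times squared adjacent-category differences."""
    return lam * sum(
        (b_next - b_cur) ** 2
        for cur, nxt in zip(betas, betas[1:])
        for b_cur, b_next in zip(cur, nxt)
    )


# Hypothetical example with two predictors and four response categories.
x = [1.0, 0.5]
thresholds = [-1.0, 0.0, 1.5]
proportional = [[0.8, -0.3]] * 3                     # same effects for every split
flexible = [[0.8, -0.3], [0.9, -0.2], [1.1, -0.2]]   # effects drift across splits

probs_po = category_probs(x, thresholds, proportional)
probs_npo = category_probs(x, thresholds, flexible)
print(probs_po, smoothness_penalty(proportional, lam=1.0))
print(probs_npo, smoothness_penalty(flexible, lam=1.0))
```

Under this assumed penalty, the proportional odds model is the zero-penalty case, and a large `lam` pulls a non-proportional fit toward it, which matches the abstract's description of a "connecting link" between the two models.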
3
Wang L, Zhu D. Tackling Ordinal Regression Problem for Heterogeneous Data: Sparse and Deep Multi-Task Learning Approaches. Data Min Knowl Discov 2021; 35:1134-1161. [PMID: 34054330 PMCID: PMC8153254 DOI: 10.1007/s10618-021-00746-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/09/2019] [Accepted: 03/04/2021] [Indexed: 11/27/2022]
Abstract
Many real-world datasets are labeled with natural orders, i.e., ordinal labels. Ordinal regression is a method to predict ordinal labels that finds a wide range of applications in data-rich domains, such as natural, health and social sciences. Most existing ordinal regression approaches work well for independent and identically distributed (IID) instances via formulating a single ordinal regression task. However, for heterogeneous non-IID instances with well-defined local geometric structures, e.g., subpopulation groups, multi-task learning (MTL) provides a promising framework to encode task (subgroup) relatedness, bridge data from all tasks, and simultaneously learn multiple related tasks in efforts to improve generalization performance. Even though MTL methods have been extensively studied, there is little existing work investigating MTL for heterogeneous data with ordinal labels. We tackle this important problem via sparse and deep multi-task approaches. Specifically, we develop a regularized multi-task ordinal regression (MTOR) model for smaller datasets and a deep neural network based MTOR model for large-scale datasets. We evaluate the performance using three real-world healthcare datasets with applications to multi-stage disease progression diagnosis. Our experiments indicate that the proposed MTOR models markedly improve the prediction performance compared with single-task ordinal regression models.
Affiliation(s)
- Lu Wang
  - Dept. of Computer Science, Wayne State University, Detroit, MI 48202
- Dongxiao Zhu
  - Dept. of Computer Science, Wayne State University, Detroit, MI 48202
4
Yılmaz Ö, Çelik E, Çukur T. Informed feature regularization in voxelwise modeling for naturalistic fMRI experiments. Eur J Neurosci 2020; 52:3394-3410. [PMID: 32343012 PMCID: PMC9748846 DOI: 10.1111/ejn.14760] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/27/2019] [Revised: 03/18/2020] [Accepted: 04/21/2020] [Indexed: 12/16/2022]
Abstract
Voxelwise modeling (VM) is a powerful framework to predict single-voxel functional selectivity for the stimulus features that exist in complex natural stimuli. Yet, because VM disregards potential correlations across stimulus features or neighboring voxels, it may yield suboptimal sensitivity in measuring functional selectivity in the presence of high levels of measurement noise. Here, we introduce a novel voxelwise modeling approach that simultaneously utilizes stimulus correlations in model features and response correlations among voxel neighborhoods. The proposed method performs feature and spatial regularization while still generating single-voxel response predictions. We demonstrated the performance of our approach on a functional magnetic resonance imaging dataset from a natural vision experiment. Compared to VM, the proposed method yields clear improvements in prediction performance, together with increased feature coherence and spatial coherence of voxelwise models. Overall, the proposed method can offer improved sensitivity in modeling of single voxels in naturalistic functional magnetic resonance imaging experiments.
Affiliation(s)
- Özgür Yılmaz
  - National Magnetic Resonance Research Center, Bilkent University, Ankara, Turkey
  - Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
- Emin Çelik
  - National Magnetic Resonance Research Center, Bilkent University, Ankara, Turkey
  - Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey
- Tolga Çukur
  - National Magnetic Resonance Research Center, Bilkent University, Ankara, Turkey
  - Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey
  - Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara, Turkey
5
Cano JR, Luengo J, García S. Label noise filtering techniques to improve monotonic classification. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.05.131] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/01/2022]
6
Abstract
BACKGROUND This paper aims to synthesise the literature on machine learning (ML) and big data applications for mental health, highlighting current research and applications in practice. METHODS We employed a scoping review methodology to rapidly map the field of ML in mental health. Eight health and information technology research databases were searched for papers covering this domain. Articles were assessed by two reviewers, and data were extracted on the article's mental health application, ML technique, data type, and study results. Articles were then synthesised via narrative review. RESULTS Three hundred papers focusing on the application of ML to mental health were identified. Four main application domains emerged in the literature: (i) detection and diagnosis; (ii) prognosis, treatment and support; (iii) public health; and (iv) research and clinical administration. The most common mental health conditions addressed included depression, schizophrenia, and Alzheimer's disease. ML techniques used included support vector machines, decision trees, neural networks, latent Dirichlet allocation, and clustering. CONCLUSIONS Overall, the application of ML to mental health has demonstrated a range of benefits across the areas of diagnosis, treatment and support, research, and clinical administration. With the majority of studies identified focusing on the detection and diagnosis of mental health conditions, it is evident that there is significant room for the application of ML to other areas of psychology and mental health. The challenges of using ML techniques are discussed, as well as opportunities to improve and advance the field.
Affiliation(s)
- Adrian B R Shatte
  - Federation University, School of Science, Engineering & Information Technology, Melbourne, Australia
- Delyse M Hutchinson
  - Deakin University, Centre for Social and Early Emotional Development, School of Psychology, Faculty of Health, Geelong, Australia
- Samantha J Teague
  - Deakin University, Centre for Social and Early Emotional Development, School of Psychology, Faculty of Health, Geelong, Australia
7
Linthicum KP, Schafer KM, Ribeiro JD. Machine learning in suicide science: Applications and ethics. BEHAVIORAL SCIENCES & THE LAW 2019; 37:214-222. [PMID: 30609102 DOI: 10.1002/bsl.2392] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 09/28/2018] [Revised: 11/15/2018] [Accepted: 11/20/2018] [Indexed: 06/09/2023]
Abstract
For decades, our ability to predict suicide has remained at near-chance levels. Machine learning has recently emerged as a promising tool for advancing suicide science, particularly in the domain of suicide prediction. The present review provides an introduction to machine learning and its potential application to open questions in suicide research. Although only a few studies have implemented machine learning for suicide prediction, results to date indicate considerable improvement in accuracy and positive predictive value. Potential barriers to algorithm integration into clinical practice are discussed, as well as attendant ethical issues. Overall, machine learning approaches hold promise for accurate, scalable, and effective suicide risk detection; however, many critical questions and issues remain unexplored.
Affiliation(s)
- Kathryn P Linthicum
  - Department of Psychology, Florida State University, Tallahassee, FL, 32306-4301, USA
- Jessica D Ribeiro
  - Department of Psychology, Florida State University, Tallahassee, FL, 32306-4301, USA
8
Helgheim BI, Maia R, Ferreira JC, Martins AL. Merging Data Diversity of Clinical Medical Records to Improve Effectiveness. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:ijerph16050769. [PMID: 30832447 PMCID: PMC6427263 DOI: 10.3390/ijerph16050769] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/30/2018] [Revised: 02/04/2019] [Accepted: 02/24/2019] [Indexed: 12/13/2022]
Abstract
Medicine is a knowledge area continuously experiencing changes. Every day, discoveries and procedures are tested with the goal of providing improved service and quality of life to patients. With the evolution of computer science, multiple areas experienced an increase in productivity with the implementation of new technical solutions. Medicine is no exception. Providing healthcare services in the future will involve the storage and manipulation of large volumes of data (big data) from medical records, requiring the integration of different data sources, for a multitude of purposes, such as prediction, prevention, personalization, participation, and becoming digital. Data integration and data sharing will be essential to achieve these goals. Our work focuses on the development of a framework process for the integration of data from different sources to increase its usability potential. We integrated data from an internal hospital database, external data, and also structured data resulting from natural language processing (NLP) applied to electronic medical records. An extract, transform, and load (ETL) process was used to merge different data sources into a single one, allowing more effective use of these data and, eventually, contributing to more efficient use of the available resources.
Affiliation(s)
- Berit I Helgheim
  - Logistics, Molde University College, Molde, NO-6410 Molde, Norway.
- Rui Maia
  - DEI, Instituto Superior Técnico, Lisboa, 1049-001 Portugal.
- Joao C Ferreira
  - Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, Lisbon 1649-026, Portugal.
- Ana Lucia Martins
  - Instituto Universitário de Lisboa (ISCTE-IUL), BRU-IUL, Lisbon 1649-026, Portugal.
9
Huang Z, Ge Z, Dong W, He K, Duan H, Bath P. Relational regularized risk prediction of acute coronary syndrome using electronic health records. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
10
Saha B, Gupta S, Phung D, Venkatesh S. A Framework for Mixed-Type Multioutcome Prediction With Applications in Healthcare. IEEE J Biomed Health Inform 2017; 21:1182-1191. [PMID: 28328519 DOI: 10.1109/jbhi.2017.2681799] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
Health analysis often involves prediction of multiple outcomes of mixed type. Existing work is restricted to either a limited number of outcomes or specific outcome types. We propose a framework for mixed-type multioutcome prediction. The framework uses a cumulative loss function composed of a specific loss function for each outcome type, for example least squares (continuous outcome), hinge (binary outcome), Poisson (count outcome), and exponential (nonnegative outcome). To model these outcomes jointly, we impose a commonality across the prediction parameters through a common matrix normal prior. The framework is formulated as iterative optimization problems and solved using an efficient block-coordinate descent method. We empirically demonstrate both scalability and convergence. We apply the proposed model to a synthetic dataset and then to two real-world cohorts: a cancer cohort and an acute myocardial infarction cohort collected over a two-year period. We predict multiple emergency-related outcomes, for example future emergency presentations (binary), emergency admissions (count), emergency length of stay days (nonnegative), and emergency time to next admission day (nonnegative). We show that the predictive performance of the proposed model is better than several state-of-the-art baselines.
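The idea of a cumulative loss built from one loss per outcome type can be sketched as follows. This is an illustration under stated assumptions, not the paper's formulation: the log links for the Poisson and exponential terms, the 1/2 factor on the squared loss, the {-1, +1} label coding for the hinge loss, and all function names are choices made here, and the matrix normal prior and block-coordinate descent solver are left out entirely.

```python
import math


def squared_loss(y, eta):
    """Least squares loss for a continuous outcome."""
    return 0.5 * (y - eta) ** 2


def hinge_loss(y, eta):
    """Hinge loss for a binary outcome coded as y in {-1, +1}."""
    return max(0.0, 1.0 - y * eta)


def poisson_loss(y, eta):
    """Poisson negative log-likelihood with assumed log link (constant dropped)."""
    return math.exp(eta) - y * eta


def exponential_loss(y, eta):
    """Exponential negative log-likelihood with assumed mean exp(eta)."""
    return eta + y * math.exp(-eta)


LOSSES = {
    "continuous": squared_loss,
    "binary": hinge_loss,
    "count": poisson_loss,
    "nonnegative": exponential_loss,
}


def cumulative_loss(outcomes):
    """Sum the type-specific losses over (kind, observed y, linear predictor eta)."""
    return sum(LOSSES[kind](y, eta) for kind, y, eta in outcomes)


# Hypothetical outcomes for a single patient.
example = [
    ("continuous", 3.0, 1.0),
    ("binary", -1, 0.5),
    ("count", 2, 0.0),
    ("nonnegative", 3.0, 0.0),
]
total = cumulative_loss(example)
print(total)
```

Keeping the per-type losses in a lookup table means a new outcome type only needs one more entry, which is one plausible way to read the abstract's claim that the framework is not limited to a fixed set of outcome types.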
11
Into the Bowels of Depression: Unravelling Medical Symptoms Associated with Depression by Applying Machine-Learning Techniques to a Community Based Population Sample. PLoS One 2016; 11:e0167055. [PMID: 27935995 PMCID: PMC5147841 DOI: 10.1371/journal.pone.0167055] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/06/2016] [Accepted: 11/08/2016] [Indexed: 12/15/2022]
Abstract
Background Depression is commonly comorbid with many other somatic diseases and symptoms. Identification of individuals in clusters with comorbid symptoms may reveal new pathophysiological mechanisms and treatment targets. The aim of this research was to combine machine-learning (ML) algorithms with traditional regression techniques by utilising self-reported medical symptoms to identify and describe clusters of individuals with increased rates of depression from a large cross-sectional community based population epidemiological study. Methods A multi-staged methodology utilising ML and traditional statistical techniques was performed using the community based population National Health and Nutrition Examination Study (2009–2010) (N = 3,922). A Self-organised Mapping (SOM) ML algorithm, combined with hierarchical clustering, was performed to create participant clusters based on 68 medical symptoms. Binary logistic regression, controlling for sociodemographic confounders, was used to then identify the key clusters of participants with higher levels of depression (PHQ-9≥10, n = 377). Finally, a Multiple Additive Regression Tree boosted ML algorithm was run to identify the important medical symptoms for each key cluster within 17 broad categories: heart, liver, thyroid, respiratory, diabetes, arthritis, fractures and osteoporosis, skeletal pain, blood pressure, blood transfusion, cholesterol, vision, hearing, psoriasis, weight, bowels and urinary. Results Five clusters of participants, based on medical symptoms, were identified to have significantly increased rates of depression compared to the cluster with the lowest rate: odds ratios ranged from 2.24 (95% CI 1.56, 3.24) to 6.33 (95% CI 1.67, 24.02). The ML boosted regression algorithm identified three key medical condition categories as being significantly more common in these clusters: bowel, pain and urinary symptoms. Bowel-related symptoms were found to dominate the relative importance of symptoms within the five key clusters. Conclusion This methodology shows promise for the identification of conditions in general populations and supports the current focus on the potential importance of bowel symptoms and the gut in mental health research.
12
Nguyen P, Tran T, Wickramasinghe N, Venkatesh S. $\mathtt {Deepr}$: A Convolutional Net for Medical Records. IEEE J Biomed Health Inform 2016; 21:22-30. [PMID: 27913366 DOI: 10.1109/jbhi.2016.2633963] [Citation(s) in RCA: 124] [Impact Index Per Article: 13.8] [Indexed: 11/07/2022]
Abstract
Feature engineering remains a major bottleneck when creating predictive systems from electronic medical records. At present, an important missing element is detecting predictive regular clinical motifs from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep learning system that learns to extract features from medical records and predicts future risk automatically. Deepr transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers. On top of the sequence is a convolutional neural net that detects and combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and visualization of its inner working. We validate Deepr on hospital data to predict unplanned readmission after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention space.
13
Utilizing Chinese Admission Records for MACE Prediction of Acute Coronary Syndrome. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2016; 13:ijerph13090912. [PMID: 27649220 PMCID: PMC5036745 DOI: 10.3390/ijerph13090912] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/22/2016] [Revised: 08/09/2016] [Accepted: 08/31/2016] [Indexed: 11/18/2022]
Abstract
Background: Clinical major adverse cardiovascular event (MACE) prediction of acute coronary syndrome (ACS) is important for a number of applications including physician decision support, quality of care assessment, and efficient healthcare service delivery on ACS patients. Admission records, as typical media to contain clinical information of patients at the early stage of their hospitalizations, provide significant potential to be explored for MACE prediction in a proactive manner. Methods: We propose a hybrid approach for MACE prediction by utilizing a large volume of admission records. Firstly, both a rule-based medical language processing method and a machine learning method (i.e., Conditional Random Fields (CRFs)) are developed to extract essential patient features from unstructured admission records. After that, state-of-the-art supervised machine learning algorithms are applied to construct MACE prediction models from data. Results: We comparatively evaluate the performance of the proposed approach on a real clinical dataset consisting of 2930 ACS patient samples collected from a Chinese hospital. Our best model achieved 72% AUC in MACE prediction. In comparison of the performance between our models and two well-known ACS risk score tools, i.e., GRACE and TIMI, our learned models obtain better performances with a significant margin. Conclusions: Experimental results reveal that our approach can obtain competitive performance in MACE prediction. The comparison of classifiers indicates the proposed approach has a competitive generality with datasets extracted by different feature extraction methods. Furthermore, our MACE prediction model obtained a significant improvement by comparison with both GRACE and TIMI. It indicates that using admission records can effectively provide MACE prediction service for ACS patients at the early stage of their hospitalizations.
14
A probabilistic topic model for clinical risk stratification from electronic health records. J Biomed Inform 2015; 58:28-36. [PMID: 26370451 DOI: 10.1016/j.jbi.2015.09.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 05/16/2015] [Revised: 09/04/2015] [Accepted: 09/05/2015] [Indexed: 11/21/2022]
Abstract
BACKGROUND AND OBJECTIVE Risk stratification aims to provide physicians with an accurate assessment of a patient's clinical risk such that an individualized prevention or management strategy can be developed and delivered. Existing risk stratification techniques mainly focus on predicting the overall risk of an individual patient in a supervised manner, and, at the cohort level, often offer little insight beyond a flat score-based segmentation from the labeled clinical dataset. To this end, in this paper, we propose a new approach for risk stratification by exploring a large volume of electronic health records (EHRs) in an unsupervised fashion. METHODS Along this line, this paper proposes a novel probabilistic topic modeling framework called probabilistic risk stratification model (PRSM) based on Latent Dirichlet Allocation (LDA). The proposed PRSM recognizes a patient clinical state as a probabilistic combination of latent sub-profiles, and generates sub-profile-specific risk tiers of patients from their EHRs in a fully unsupervised fashion. The achieved stratification results can be easily recognized as high-, medium- and low-risk, respectively. In addition, we present an extension of PRSM, called weakly supervised PRSM (WS-PRSM) by incorporating minimum prior information into the model, in order to improve the risk stratification accuracy, and to make our models highly portable to risk stratification tasks of various diseases. RESULTS We verify the effectiveness of the proposed approach on a clinical dataset containing 3463 coronary heart disease (CHD) patient instances. Both PRSM and WS-PRSM were compared with two established supervised risk stratification algorithms, i.e., logistic regression and support vector machine, and showed the effectiveness of our models in risk stratification of CHD in terms of the Area Under the receiver operating characteristic Curve (AUC) analysis. Furthermore, in comparison with PRSM, WS-PRSM has over 2% performance gain on the experimental dataset, demonstrating that incorporating risk scoring knowledge as prior information can improve the performance in risk stratification. CONCLUSIONS Experimental results reveal that our models achieve competitive performance in risk stratification in comparison with existing supervised approaches. In addition, the unsupervised nature of our models makes them highly portable to the risk stratification tasks of various diseases. Moreover, patient sub-profiles and sub-profile-specific risk tiers generated by our models are coherent and informative, and provide significant potential to be explored for further tasks, such as patient cohort analysis. We hypothesize that the proposed framework can readily meet the demand for risk stratification from a large volume of EHRs in an open-ended fashion.