1
Asadi F, Homayounfar R, Mehrali Y, Masci C, Talebi S, Zayeri F. Detection of cardiovascular disease cases using advanced tree-based machine learning algorithms. Sci Rep 2024; 14:22230. [PMID: 39333550 PMCID: PMC11437204 DOI: 10.1038/s41598-024-72819-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2024] [Accepted: 09/10/2024] [Indexed: 09/29/2024] Open
Abstract
Cardiovascular disease (CVD) can often lead to serious consequences such as death or disability. This study aims to identify the tree-based machine learning method with the best performance criteria for the detection of CVD. The study analyzed data collected from 9,499 participants, with a focus on 38 different variables. The target variable was the presence of CVD, and villages were treated as the cluster variable. The standard tree, random forest, Generalized Linear Mixed Model tree (GLMM tree), and Generalized Mixed Effect random forest (GMERF) were fitted to the data, and the estimated prediction power indices were compared to identify the best approach. According to the analysis of important variables in all models, five variables (age, LDL, history of cardiac disease in first-degree relatives, physical activity level, and presence of hypertension) were identified as the most influential in predicting CVD. Fitting the decision tree, random forest, GLMM tree, and GMERF resulted in areas under the ROC curve of 0.56, 0.73, 0.78, and 0.80, respectively. The GMERF model demonstrated the best predictive performance among the fitted models based on the evaluation criteria. Given the clustered structure of the data, using machine learning approaches that account for this clustering may result in more accurate prediction indices and targeted prevention frameworks.
Affiliation(s)
- Fariba Asadi
  - Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Homayounfar
  - Food Technology Research Institute, Faculty of Nutrition Sciences and Food Technology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Chiara Masci
  - MOX-Department of Mathematics, Politecnico Di Milano, Milan, Italy
- Samaneh Talebi
  - Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farid Zayeri
  - Proteomics Research Center, Department of Biostatistics, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Qods Square, Darband Street, Tehran, Iran
2
Fu Z, Xi J, Ji Z, Zhang R, Wang J, Shi R, Pu X, Yu J, Xue F, Liu J, Wang Y, Zhong H, Feng J, Zhang M, He Y. Analysis of anterior segment in primary angle closure suspect with deep learning models. BMC Med Inform Decis Mak 2024; 24:251. [PMID: 39251987 PMCID: PMC11385134 DOI: 10.1186/s12911-024-02658-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2024] [Accepted: 08/29/2024] [Indexed: 09/11/2024] Open
Abstract
OBJECTIVE To analyze the anatomical characteristics of the anterior chamber configuration in primary angle closure suspect (PACS) patients, and to establish an artificial intelligence (AI)-aided diagnostic system for PACS screening. METHODS A total of 1668 scans of 839 patients were included in this cross-sectional study. The subjects were divided into two groups: a PACS group and a normal group. Using anterior segment optical coherence tomography scans, the anatomical differences between the two groups were compared, and anterior segment structural features of PACS were extracted. An AI-aided diagnostic system was then constructed based on different algorithms: classification and regression tree (CART), random forest (RF), logistic regression (LR), VGG-16 and Alexnet. The diagnostic efficiencies of the algorithms were then evaluated and compared with those of junior physicians and experienced ophthalmologists. RESULTS RF [sensitivity (Se) = 0.84; specificity (Sp) = 0.92; positive predictive value (PPV) = 0.82; negative predictive value (NPV) = 0.95; area under the curve (AUC) = 0.90] and CART (Se = 0.76, Sp = 0.93, PPV = 0.85, NPV = 0.92, AUC = 0.90) showed better performance than LR (Se = 0.68, Sp = 0.91, PPV = 0.79, NPV = 0.90, AUC = 0.86). Among the convolutional neural networks (CNN), Alexnet (Se = 0.83, Sp = 0.95, PPV = 0.92, NPV = 0.87, AUC = 0.85) was better than VGG-16 (Se = 0.84, Sp = 0.90, PPV = 0.85, NPV = 0.90, AUC = 0.79). The performance of the 2 CNN algorithms was better than that of 5 junior physicians, and the mean values of the diagnostic indicators of the 2 CNN algorithms were similar to those of experienced ophthalmologists. CONCLUSION PACS patients have distinct anatomical characteristics compared with healthy controls. AI models for PACS screening are reliable and powerful, equivalent to experienced ophthalmologists.
Affiliation(s)
- Ziwei Fu
  - The Second Affiliated Hospital of Xi'an Medical University, Xi'an, Shaanxi, 710038, China
  - Xi'an Medical University, Xi'an, Shaanxi, 710021, China
  - Xi'an Key Laboratory for the Prevention and Treatment of Eye and Brain Neurological Related Diseases, Xi'an, Shaanxi, 710038, China
- Jinwei Xi
  - The Second Affiliated Hospital of Xi'an Medical University, Xi'an, Shaanxi, 710038, China
- Zhi Ji
  - The Second Affiliated Hospital of Xi'an Medical University, Xi'an, Shaanxi, 710038, China
  - Xi'an Medical University, Xi'an, Shaanxi, 710021, China
- Ruxue Zhang
  - School of Mathematics, Northwest University, Xi'an, 710127, China
- Jianping Wang
  - Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, 710068, China
- Rui Shi
  - Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, 710068, China
- Xiaoli Pu
  - Xianyang First People's Hospital, Xianyang, Shaanxi Province, 712000, China
- Jingni Yu
  - Xi'an People's Hospital, Xi'an, Shaanxi, 712099, China
- Fang Xue
  - Xi'an Medical University, Xi'an, Shaanxi, 710021, China
- Jianrong Liu
  - Xi'an People's Hospital, Xi'an, Shaanxi, 712099, China
- Yanrong Wang
  - Yan'an People's Hospital, Yan'an, Shaanxi, 716099, China
- Hua Zhong
  - The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
- Jun Feng
  - School of Mathematics, Northwest University, Xi'an, 710127, China
- Min Zhang
  - School of Mathematics, Northwest University, Xi'an, 710127, China
- Yuan He
  - The Second Affiliated Hospital of Xi'an Medical University, Xi'an, Shaanxi, 710038, China
  - Xi'an Medical University, Xi'an, Shaanxi, 710021, China
  - Xi'an Key Laboratory for the Prevention and Treatment of Eye and Brain Neurological Related Diseases, Xi'an, Shaanxi, 710038, China
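The diagnostic indicators reported in the abstract of entry 2 (Se, Sp, PPV, NPV) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name and the counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return Se, Sp, PPV and NPV from binary confusion-matrix counts.

    tp/fn: diseased cases classified as positive/negative.
    tn/fp: healthy cases classified as negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of diseased cases detected
        "specificity": tn / (tn + fp),  # share of healthy cases cleared
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }


# Hypothetical counts for illustration only; they do not come from the paper.
metrics = diagnostic_metrics(tp=84, fp=8, tn=92, fn=16)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also depend on how common the condition is in the evaluated sample, which is one reason the four values can rank algorithms differently.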
3
Wu D, Salsbury FR. Unraveling the Role of Hydrogen Bonds in Thrombin via Two Machine Learning Methods. J Chem Inf Model 2023; 63:3705-3718. [PMID: 37285464 PMCID: PMC11164249 DOI: 10.1021/acs.jcim.3c00153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Hydrogen bonds play a critical role in the folding and stability of biomolecules, such as proteins and nucleic acids, by providing strong and directional interactions. They help to maintain the secondary and 3D structure of proteins, and structural changes in these molecules often result from the formation or breaking of hydrogen bonds. To gain insights into these hydrogen bonding networks, we applied two machine learning models, a logistic regression model and a decision tree model, to study four variants of thrombin: wild-type, ΔK9, E8K, and R4A. Our results showed that each model has its own advantages. The logistic regression model highlighted potential key residues (GLU295) in thrombin's allosteric pathways, while the decision tree model identified important hydrogen bonding motifs. This information can aid in understanding the mechanisms of folding in proteins and has potential applications in drug design and other therapies. These results demonstrate the usefulness of the two models in studying hydrogen bonding networks in proteins.
Affiliation(s)
- Dizhou Wu
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106, United States
- Freddie R Salsbury
  - Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106, United States
4
The L2 convergence of stream data mining algorithms based on probabilistic neural networks. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.02.074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
5
Su C, Zhang L, Zhao L. Online local fisher risk minimization: a new online kernel method for online classification. APPL INTELL 2023. [DOI: 10.1007/s10489-022-04400-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023]
6
Song Y, Lu J, Liu A, Lu H, Zhang G. A Segment-Based Drift Adaptation Method for Data Streams. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:4876-4889. [PMID: 33835922 DOI: 10.1109/tnnls.2021.3062062] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
In concept drift adaptation, we aim to design a blind or an informed strategy to update our best predictor for future data at each time point. However, existing informed drift adaptation methods need to wait for an entire batch of data to detect drift and then update the predictor (if drift is detected), which causes adaptation delay. To overcome the adaptation delay, we propose a sequentially updated statistic, called drift-gradient, to quantify the increase in distributional discrepancy each time a new instance arrives. Based on drift-gradient, a segment-based drift adaptation (SEGA) method is developed to update our best predictor online. Drift-gradient is defined on a segment in the training set. It can precisely quantify the increase in distributional discrepancy between the old segment and the newest segment when only one new instance is available at each time point. A lower value of drift-gradient on the old segment indicates that the distribution of the new instance is closer to the distribution of the old segment. Based on the drift-gradient, SEGA retrains our best predictors with the segments that have the minimum drift-gradient each time a new instance arrives. SEGA has been validated by extensive experiments on both synthetic and real-world classification and regression data streams. The experimental results show that SEGA outperforms competitive blind and informed drift adaptation methods.
7
Statistical Analysis and Development of an Ensemble-Based Machine Learning Model for Photovoltaic Fault Detection. ENERGIES 2022. [DOI: 10.3390/en15155492] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
This paper presents a framework for photovoltaic (PV) fault detection based on statistical, supervised, and unsupervised machine learning (ML) approaches. The research is motivated by a need to develop a cost-effective solution that detects the fault types within PV systems based on a real dataset with a minimum number of input features. We identify the appropriate conditions for method selection and establish how to minimize the computational demand of different ML approaches. Subsequently, the PV dataset is labeled as a result of clustering and classification. Various ML models are then trained on the labeled dataset, and each is evaluated based on accuracy, precision, and a confusion matrix. Notably, an accuracy ranging from 94% to 100% is achieved with datasets from two different PV systems. The model robustness is affirmed by applying the approach to an additional real-world dataset that exhibits noise and missing values.
8
Pal M, Parija S, Panda G, Dhama K, Mohapatra RK. Risk prediction of cardiovascular disease using machine learning classifiers. Open Med (Wars) 2022; 17:1100-1113. [PMID: 35799599 PMCID: PMC9206502 DOI: 10.1515/med-2022-0508] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/11/2021] [Revised: 05/14/2022] [Accepted: 05/23/2022] [Indexed: 11/15/2022] Open
Abstract
Cardiovascular disease (CVD) makes our heart and blood vessels dysfunctional and often leads to death or physical paralysis. Therefore, early and automatic detection of CVD can save many human lives. Multiple investigations have been carried out to achieve this objective, but there is still room for improvement in performance and reliability. This study is yet another step in this direction. In this study, two reliable machine learning techniques, multi-layer perceptron (MLP) and K-nearest neighbour (K-NN), have been employed for CVD detection using publicly available University of California Irvine repository data. The performance of the models is improved by removing outliers and attributes having null values. Experimental results demonstrate that the MLP model achieves a higher detection accuracy (82.47%) and area-under-the-curve value (86.41%) than the K-NN model. Therefore, the proposed MLP model is recommended for automatic CVD detection. The proposed methodology can also be employed in detecting other diseases. In addition, the performance of the proposed model can be assessed on other standard data sets.
Affiliation(s)
- Madhumita Pal
  - Department of Electronics and Communication Engineering, C. V. Raman Global University, Bidyanagar, Mahura, Janla, Bhubaneswar, Odisha 752054, India
- Smita Parija
  - Department of Electronics and Communication Engineering, C. V. Raman Global University, Bidyanagar, Mahura, Janla, Bhubaneswar, Odisha 752054, India
- Ganapati Panda
  - Department of Electronics and Communication Engineering, C. V. Raman Global University, Bidyanagar, Mahura, Janla, Bhubaneswar, Odisha 752054, India
- Kuldeep Dhama
  - Division of Pathology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly 243122, Uttar Pradesh, India
- Ranjan K. Mohapatra
  - Department of Chemistry, Government College of Engineering, Keonjhar, Odisha 758002, India
9
Sauer J, Mariani VC, dos Santos Coelho L, Ribeiro MHDM, Rampazzo M. Extreme gradient boosting model based on improved Jaya optimizer applied to forecasting energy consumption in residential buildings. EVOLVING SYSTEMS 2021. [DOI: 10.1007/s12530-021-09404-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
10
AdaDT: An adaptive decision tree for addressing local class imbalance based on multiple split criteria. APPL INTELL 2021. [DOI: 10.1007/s10489-020-02061-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
11
Mahan F, Mohammadzad M, Rozekhani SM, Pedrycz W. Chi-MFlexDT: Chi-square-based multi flexible fuzzy decision tree for data stream classification. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107301] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/21/2022]
12
A New Approach to Detection of Changes in Multidimensional Patterns - Part II. JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH 2021. [DOI: 10.2478/jaiscr-2021-0013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
In this paper we develop an algorithm based on the Parzen kernel estimate for the detection of sudden changes in 3-dimensional shapes that occur along edge curves. Such problems commonly arise in various areas of computer vision, e.g., in edge detection, bioinformatics and the processing of satellite imagery. In many engineering problems, abrupt change detection may help in fault protection, for example through jump detection in functions describing the static and dynamic properties of objects in mechanical systems. The algorithm we developed for detecting abrupt changes is nonparametric in nature and utilizes Parzen regression estimates of multivariate functions and their derivatives. In tests we apply this method, particularly but not exclusively, to functions of two variables.
13
Reese B, Silwal A, Daugherity E, Daugherity M, Arabi M, Daly P, Paterson Y, Woolford L, Christie A, Elias R, Brugarolas J, Wang T, Karbowniczek M, Markiewski MM. Complement as Prognostic Biomarker and Potential Therapeutic Target in Renal Cell Carcinoma. THE JOURNAL OF IMMUNOLOGY 2020; 205:3218-3229. [PMID: 33158953 DOI: 10.4049/jimmunol.2000511] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/06/2020] [Accepted: 09/29/2020] [Indexed: 12/21/2022]
Abstract
Preclinical studies demonstrated that complement promotes tumor growth. Therefore, we sought to determine the best target for complement-based therapy among common human malignancies. High expression of 11 complement genes was linked to unfavorable prognosis in renal cell carcinoma. Complement protein expression or deposition was observed mainly in stroma, leukocytes, and tumor vasculature, corresponding to a role of complement in regulating the tumor microenvironment. Complement abundance in tumors correlated with a high nuclear grade. Complement genes clustered within an aggressive inflammatory subtype of renal cancer characterized by poor prognosis, markers of T cell dysfunction, and alternatively activated macrophages. Plasma levels of complement proteins correlated with response to immune checkpoint inhibitors. Corroborating human data, complement deficiencies and blockade reduced tumor growth by enhancing antitumor immunity and seemingly reducing angiogenesis in a mouse model of kidney cancer resistant to PD-1 blockade. Overall, this study implicates complement in the immune landscape of renal cell carcinoma, and notwithstanding cohort size and preclinical model limitations, the data suggest that tumors resistant to immune checkpoint inhibitors might be suitable targets for complement-based therapy.
Affiliation(s)
- Britney Reese
  - Department of Immunotherapeutics and Biotechnology, School of Pharmacy, Texas Tech University Health Sciences Center, Abilene, TX 79601
- Ashok Silwal
  - Department of Immunotherapeutics and Biotechnology, School of Pharmacy, Texas Tech University Health Sciences Center, Abilene, TX 79601
- Elizabeth Daugherity
  - Department of Immunotherapeutics and Biotechnology, School of Pharmacy, Texas Tech University Health Sciences Center, Abilene, TX 79601
- Michael Daugherity
  - Department of Engineering and Physics, Abilene Christian University, Abilene, TX 79601
- Mahshid Arabi
  - Department of Immunotherapeutics and Biotechnology, School of Pharmacy, Texas Tech University Health Sciences Center, Abilene, TX 79601
- Pierce Daly
  - Department of Immunotherapeutics and Biotechnology, School of Pharmacy, Texas Tech University Health Sciences Center, Abilene, TX 79601
- Yvonne Paterson
  - Department of Microbiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Layton Woolford
  - Division of Hematology and Oncology, Department of Internal Medicine, University of Texas Southwestern, Dallas, TX 75390
  - Kidney Cancer Program, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Alana Christie
  - Division of Hematology and Oncology, Department of Internal Medicine, University of Texas Southwestern, Dallas, TX 75390
  - Kidney Cancer Program, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Roy Elias
  - Division of Hematology and Oncology, Department of Internal Medicine, University of Texas Southwestern, Dallas, TX 75390
  - Kidney Cancer Program, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390
- James Brugarolas
  - Division of Hematology and Oncology, Department of Internal Medicine, University of Texas Southwestern, Dallas, TX 75390
  - Kidney Cancer Program, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Tao Wang
  - Kidney Cancer Program, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390
  - The Quantitative Biomedical Research Center, Department of Population and Data Sciences, University of Texas Southwestern Medical Center, Dallas, TX 75390
- Magdalena Karbowniczek
  - Department of Immunotherapeutics and Biotechnology, School of Pharmacy, Texas Tech University Health Sciences Center, Abilene, TX 79601
- Maciej M Markiewski
  - Department of Immunotherapeutics and Biotechnology, School of Pharmacy, Texas Tech University Health Sciences Center, Abilene, TX 79601
14
Huang L, Wang CD, Chao HY, Yu PS. MVStream: Multiview Data Stream Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:3482-3496. [PMID: 31675346 DOI: 10.1109/tnnls.2019.2944851] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 06/10/2023]
Abstract
This article studies a new problem of data stream clustering, namely, multiview data stream (MVStream) clustering. Although many data stream clustering algorithms have been developed, they are restricted to the single-view streaming data, and clustering MVStreams still remains largely unsolved. In addition to the many issues encountered by the conventional single-view data stream clustering, such as capturing cluster evolution and discovering clusters of arbitrary shapes under the limited computational resources, the main challenge of MVStream clustering lies in integrating information from multiple views in a streaming manner and abstracting summary statistics from the integrated features simultaneously. In this article, we propose a novel MVStream clustering algorithm for the first time. The main idea is to design a multiview support vector domain description (MVSVDD) model, by which the information from multiple insufficient views can be integrated, and the outputting support vectors (SVs) are utilized to abstract the summary statistics of the historical multiview data objects. Based on the MVSVDD model, a new multiview cluster labeling method is designed, whereby clusters of arbitrary shapes can be discovered for each view. By tracking the cluster labels of SVs in each view, the cluster evolution associated with concept drift can be captured. Since the SVs occupy only a small portion of data objects, the proposed MVStream algorithm is quite efficient with the limited computational resources. Extensive experiments are conducted to demonstrate the effectiveness and efficiency of the proposed method.
15
Duda P, Rutkowski L, Jaworski M, Rutkowska D. On the Parzen Kernel-Based Probability Density Function Learning Procedures Over Time-Varying Streaming Data With Applications to Pattern Classification. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:1683-1696. [PMID: 30452383 DOI: 10.1109/tcyb.2018.2877611] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/09/2023]
Abstract
In this paper, we propose a recursive variant of the Parzen kernel density estimator (KDE) to track changes of dynamic density over data streams in a nonstationary environment. In stationary environments, well-established traditional KDE techniques have nice asymptotic properties. Their existing extensions to deal with stream data are mostly based on various heuristic concepts (losing convergence properties). In this paper, we study recursive KDEs, called recursive concept drift tracking KDEs, and prove their weak (in probability) and strong (with probability one) convergence, resulting in perfect tracking properties as the sample size approaches infinity. In three theorems and subsequent examples, we show how to choose the bandwidth and learning rate of a recursive KDE in order to ensure weak and strong convergence. The simulation results illustrate the effectiveness of our algorithm both for density estimation and classification over time-varying stream data.
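The idea of a recursive Parzen estimator that tracks a drifting density can be illustrated with a small sketch. It assumes the common stochastic-approximation form f_n(x) = f_{n-1}(x) + a_n * (K((x - X_n) / h_n) / h_n - f_{n-1}(x)) with a Gaussian kernel K. The function names, the schedules a_n = n^(-0.7) and h_n = n^(-0.2), and the synthetic stream are illustrative assumptions; they are not the estimator, conditions, or data of the paper.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the Parzen kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def recursive_kde(stream, grid, learning_rate=lambda n: n ** -0.7,
                  bandwidth=lambda n: n ** -0.2):
    """Update a density estimate on `grid` one observation at a time.

    Each sample is used once and then discarded; only the current estimate
    on the grid is kept in memory. Both schedules are assumed for illustration.
    """
    f = np.zeros_like(grid, dtype=float)
    for n, x in enumerate(stream, start=1):
        a_n = learning_rate(n)  # step size; equals 1 for the first sample
        h_n = bandwidth(n)      # kernel bandwidth, shrinking as n grows
        f += a_n * (gaussian_kernel((grid - x) / h_n) / h_n - f)
    return f


# Synthetic stream with an abrupt drift: N(0, 1) for 500 samples, then N(3, 1).
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
grid = np.linspace(-8.0, 10.0, 901)
density = recursive_kde(stream, grid)
```

Because the assumed step size decays more slowly than 1/n, recent observations carry more weight than old ones, so the estimate shifts toward the post-drift distribution instead of averaging both regimes equally.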
16
A New Approach to Detection of Changes in Multidimensional Patterns. JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH 2020. [DOI: 10.2478/jaiscr-2020-0009] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/20/2022] Open
Abstract
Nowadays, unprecedented amounts of heterogeneous data are stored, processed and transmitted via the Internet. In data analysis, one of the most important problems is to verify whether data observed and/or collected over time are genuine and stationary, i.e. whether the information sources have not changed their characteristics. There is a variety of data types: texts, images, audio or video files or streams, metadata descriptions, as well as ordinary numbers. All of them change in many ways. If a change happens, the next question is what the essence of this change is, and when and where it occurred. The main focus of this paper is the detection of change and the classification of its type. Many algorithms have been proposed to detect abnormalities and deviations in data. In this paper we propose a new approach to abrupt change detection based on Parzen kernel estimation of the partial derivatives of multivariate regression functions in the presence of probabilistic noise. The proposed change detection algorithm is applied to one- and two-dimensional patterns to detect abrupt changes.
17
Abstract
Mining and analysing streaming data is crucial for many applications, and this area of research has gained extensive attention over the past decade. However, there are several inherent problems that continue to challenge the hardware and the state-of-the-art algorithmic solutions. Examples of such problems include the unbounded size, varying speed and unknown data characteristics of instances arriving from a data stream. The aim of this research is to portray key challenges faced by algorithmic solutions for stream mining, particularly focusing on the prevalent issue of concept drift. A comprehensive discussion of concept drift and its inherent data challenges in the context of stream mining is presented, as is a critical, in-depth review of relevant literature. Current issues with the evaluative procedure for concept drift detectors are also explored, highlighting problems such as a lack of established base datasets and the impact of temporal dependence on concept drift detection. By exposing gaps in the current literature, this study suggests recommendations for future research which should aid in the progression of stream mining and concept drift detection algorithms.
18
Alizadehsani R, Abdar M, Roshanzamir M, Khosravi A, Kebria PM, Khozeimeh F, Nahavandi S, Sarrafzadegan N, Acharya UR. Machine learning-based coronary artery disease diagnosis: A comprehensive review. Comput Biol Med 2019; 111:103346. [PMID: 31288140 DOI: 10.1016/j.compbiomed.2019.103346] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 05/07/2019] [Revised: 06/26/2019] [Accepted: 06/26/2019] [Indexed: 02/02/2023]
Abstract
Coronary artery disease (CAD) is the most common cardiovascular disease (CVD) and often leads to a heart attack. It annually causes millions of deaths and billions of dollars in financial losses worldwide. Angiography, which is invasive and risky, is the standard procedure for diagnosing CAD. Alternatively, machine learning (ML) techniques have been widely used in the literature as fast, affordable, and noninvasive approaches for CAD detection. The results that have been published on ML-based CAD diagnosis differ substantially in terms of the analyzed datasets, sample sizes, features, location of data collection, performance metrics, and applied ML techniques. Due to these fundamental differences, achievements in the literature cannot be generalized. This paper conducts a comprehensive and multifaceted review of all relevant studies that were published between 1992 and 2019 for ML-based CAD diagnosis. The impacts of various factors, such as dataset characteristics (geographical location, sample size, features, and the stenosis of each coronary artery) and applied ML techniques (feature selection, performance metrics, and method) are investigated in detail. Finally, the important challenges and shortcomings of ML-based CAD diagnosis are discussed.
Affiliation(s)
- Roohallah Alizadehsani
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
- Moloud Abdar
  - Département d'informatique, Université du Québec à Montréal, Montréal, Québec, Canada
- Mohamad Roshanzamir
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
- Abbas Khosravi
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
- Parham M Kebria
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
- Fahime Khozeimeh
  - Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Saeid Nahavandi
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia
- Nizal Sarrafzadegan
  - Faculty of Medicine, SPPH, University of British Columbia, Vancouver, BC, Canada
  - Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Khorram Ave, Isfahan, Iran
- U Rajendra Acharya
  - Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore
  - Department of Biomedical Engineering, School of Science and Technology, Singapore University of Social Sciences, Singapore
  - Department of Biomedical Engineering, Faculty of Engineering, University of Malaya, Malaysia
19
Costa VGTD, Carvalho ACPDLFD, Barbon Junior S. Strict Very Fast Decision Tree: A memory conservative algorithm for data stream mining. Pattern Recognit Lett 2018. [DOI: 10.1016/j.patrec.2018.09.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 10/28/2022]
20
Duda P, Jaworski M, Rutkowski L. Convergent Time-Varying Regression Models for Data Streams: Tracking Concept Drift by the Recursive Parzen-Based Generalized Regression Neural Networks. Int J Neural Syst 2017; 28:1750048. [PMID: 29129128 DOI: 10.1142/s0129065717500484] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022]
Abstract
One of the greatest challenges in data mining is related to the processing and analysis of massive data streams. In contrast to traditional static data mining problems, data streams require that each element is processed only once, that the amount of allocated memory is constant, and that the models incorporate changes in the investigated streams. The vast majority of available methods have been developed for data stream classification, and only a few of them have attempted to solve regression problems, using various heuristic approaches. In this paper, we develop mathematically justified regression models working in a time-varying environment. More specifically, we study incremental versions of generalized regression neural networks, called IGRNNs, and we prove their tracking properties: weak (in probability) and strong (with probability one) convergence, assuming various concept drift scenarios. First, we present the IGRNNs, based on Parzen kernels, for modeling stationary systems under nonstationary noise. Next, we extend our approach to modeling time-varying systems under nonstationary noise. We present several types of concept drift that our approach can handle in such a way that weak and strong convergence holds under certain conditions. Then, in a series of simulations, we compare our method with commonly used heuristic approaches to dealing with concept drift, based on forgetting mechanisms or sliding windows. Finally, we apply our concept in a real-life scenario, solving the problem of currency exchange rate prediction.
Affiliation(s)
- Piotr Duda
  - Institute of Computational Intelligence, Czestochowa University of Technology, Al. Armii Krajowej 36, 42-200 Czestochowa, Poland
- Maciej Jaworski
  - Institute of Computational Intelligence, Czestochowa University of Technology, Al. Armii Krajowej 36, 42-200 Czestochowa, Poland
- Leszek Rutkowski
  - Institute of Computational Intelligence, Czestochowa University of Technology, Al. Armii Krajowej 36, 42-200 Czestochowa, Poland
  - Information Technology Institute, Academy of Social Sciences, 90-113 Łódź, Poland