1. Ortega Vázquez C, vanden Broucke S, De Weerdt J. Hellinger distance decision trees for PU learning in imbalanced data sets. Mach Learn 2023. [DOI: 10.1007/s10994-023-06323-y]

2. Xiong H, Chen H, Xu L, Liu H, Fan L, Tang Q, Cho H. A survey of data element perspective: Application of artificial intelligence in health big data. Front Neurosci 2022; 16:1031732. [PMID: 36389224; PMCID: PMC9641178; DOI: 10.3389/fnins.2022.1031732] [Received: 08/30/2022] [Accepted: 10/06/2022]
Abstract
Artificial intelligence (AI) based on the perspective of data elements is widely used in the healthcare informatics domain. Large amounts of clinical data from electronic medical records (EMRs), electronic health records (EHRs), and electroencephalography records (EEGs) have been generated and collected at an unprecedented speed and scale. For instance, the new generation of wearable technologies makes it easy to collect people's daily health data, such as blood pressure, blood glucose, and other physiological data, while EHRs document large amounts of patient data. The cost of acquiring and processing health big data is expected to fall dramatically with the help of AI technologies and open-source big data platforms such as Hadoop and Spark. Applying AI technologies to health big data opens new opportunities to discover relationships among living habits, sports, inheritance, diseases, symptoms, and drugs. Meanwhile, with the rapid development of AI technologies, many promising methodologies have recently been proposed in the healthcare field. In this paper, we review and discuss the application of machine learning (ML) methods to health big data in two major aspects: (1) special features of health big data, including multimodality, incompleteness, time validation, redundancy, and privacy; and (2) ML methodologies in the healthcare field, including classification, regression, clustering, and association. Furthermore, we review recent progress and breakthroughs in automatic diagnosis with health big data and summarize the challenges, gaps, and opportunities for improving and advancing automatic diagnosis in this field.
Affiliation(s)
- Honglin Xiong
  - Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China
- Hongmin Chen
  - Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China
- Li Xu
  - Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai, China
  - *Correspondence: Li Xu,
- Hong Liu
  - Business School, University of Shanghai for Science and Technology, Shanghai, China
  - Hong Liu,
- Lumin Fan
  - Business School, University of Shanghai for Science and Technology, Shanghai, China
  - Operation Management Department, East Hospital Affiliated to Tongji University, Shanghai, China
- Qifeng Tang
  - Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China
  - National Engineering Laboratory for Big Data Distribution and Exchange Technologies, Shanghai, China
  - Shanghai Data Exchange Corporation, Shanghai, China
- Hsunfang Cho
  - National Engineering Laboratory for Big Data Distribution and Exchange Technologies, Shanghai, China
  - Shanghai Data Exchange Corporation, Shanghai, China

3. Rezvani S, Wang X. Class imbalance learning using fuzzy ART and intuitionistic fuzzy twin support vector machines. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.07.010]

4. Pozi MSM, Azhar NA, Raziff ARA, Ajrina LH. SVGPM: evolving SVM decision function by using genetic programming to solve imbalanced classification problem. Progress in Artificial Intelligence 2021. [DOI: 10.1007/s13748-021-00260-4]

5.

6. A methodology for customizing clinical tests for esophageal cancer based on patient preferences. Artif Intell Med 2018; 95:16-26. [PMID: 30279042; DOI: 10.1016/j.artmed.2018.08.001] [Received: 12/07/2016] [Revised: 05/02/2018] [Accepted: 08/02/2018]
Abstract
BACKGROUND Clinical tests for diagnosing a disease may be expensive, uncomfortable, or time consuming, and can have side effects, e.g. the barium swallow test for esophageal cancer. Although the absence of esophageal cancer can be predicted with near 100% certainty using only demographics, lifestyle, medical history, and a few basic clinical tests, our objective is to devise a general methodology for customizing tests according to user preferences, so that expensive or uncomfortable tests can be avoided. METHOD We propose using classifiers trained on electronic medical records (EMRs) to select tests. The key idea is to design classifiers with a 0% false normal rate (i.e., 100% sensitivity), possibly at the cost of more false abnormals. We find kernel logistic regression (LR) to be the most suitable for this task. We propose an algorithm that finds the best probability threshold for kernel LR by tuning test set accuracy with the help of a validation data set. Using this algorithm, we describe schemes for selecting tests, which appear as features in the automatic classification algorithm, according to users' preferences on cost and discomfort; that is, the proposed method can detect almost all true patients in the population even with user-preferred clinical tests. RESULT We test our methodology on EMRs collected for more than 3000 patients as part of a project carried out by a reputed hospital in Mumbai, India. We found that kernel SVM and kernel LR with a polynomial kernel of degree 3 yield an accuracy of 99.18% and a sensitivity of 100% using only demographic, lifestyle, patient history, and basic clinical test data. We demonstrate our test selection algorithm with two case studies, one using the cost of clinical tests and the other using "discomfort" values for clinical tests. Using exhaustive enumeration over 12 and 15 clinical tests respectively, we compute the test sets with the fewest false abnormals for each criterion. The sets turn out to be different, substantiating our claim that test sets can be customized to user preferences.
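The zero-false-normal requirement described above can be illustrated with a minimal Python sketch. It is not the authors' threshold-tuning algorithm: it assumes a hypothetical helper `zero_false_normal_threshold` that receives validation-set probabilities from any probabilistic classifier (kernel LR in the paper) and picks the largest threshold that still flags every abnormal case. Because raising the threshold can only reduce false abnormals, that threshold has the fewest false abnormals among all thresholds with 100% sensitivity.

```python
from typing import Sequence, Tuple


def zero_false_normal_threshold(
    probs: Sequence[float], labels: Sequence[int]
) -> Tuple[float, int]:
    """Illustrative sketch, not the paper's algorithm.

    Returns the largest probability threshold that still classifies every
    abnormal validation case (label 1) as abnormal, together with the number
    of false abnormals (label 0 cases flagged) at that threshold.
    A case is predicted abnormal when its probability >= threshold.
    """
    positives = [p for p, y in zip(probs, labels) if y == 1]
    if not positives:
        raise ValueError("validation set needs at least one abnormal case")
    # Any threshold above the lowest abnormal score would miss that patient
    # (a false normal), so the minimum abnormal score is the highest
    # admissible threshold.
    threshold = min(positives)
    false_abnormals = sum(
        1 for p, y in zip(probs, labels) if y == 0 and p >= threshold
    )
    return threshold, false_abnormals
```

On validation probabilities `[0.9, 0.3, 0.6, 0.5, 0.2]` with labels `[1, 1, 0, 0, 0]`, the sketch returns a threshold of 0.3 with 2 false abnormals. An exhaustive search over test subsets would repeat this evaluation for a classifier trained on each candidate subset.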

7. Bidi N, Elberrichi Z. Best Features Selection for Biomedical Data Classification Using Seven Spot Ladybird Optimization Algorithm. International Journal of Applied Metaheuristic Computing 2018. [DOI: 10.4018/ijamc.2018070104]
Abstract
This article presents a new adaptive algorithm, FS-SLOA (Feature Selection-Seven Spot Ladybird Optimization Algorithm), a meta-heuristic feature selection method based on the foraging behavior of the seven spot ladybird. The technique is applied to find the feature subset that achieves the highest classification accuracy with three classifiers: Naive Bayes (NB), k-Nearest Neighbors (KNN), and Support Vector Machine (SVM). The proposed approach was evaluated on four well-known benchmark datasets from the UCI machine learning repository (Wisconsin Breast Cancer, Pima Diabetes, Mammographic Mass, and Dermatology). Experimental results show that FS-SLOA achieves the best classification accuracy on the different datasets.
Affiliation(s)
- Noria Bidi
  - Department of Science and Technology, University Mustapha Stamboli, Mascara, Algeria
- Zakaria Elberrichi
  - Department of Computer Science, University Djillali Liabes, Sidi Bel Abbes, Algeria

8. Luque A, Gómez-Bellido J, Carrasco A, Barbancho J. Optimal Representation of Anuran Call Spectrum in Environmental Monitoring Systems Using Wireless Sensor Networks. Sensors 2018; 18:1803. [PMID: 29865290; PMCID: PMC6022039; DOI: 10.3390/s18061803] [Received: 04/14/2018] [Revised: 05/31/2018] [Accepted: 06/01/2018]
Abstract
The analysis and classification of the sounds produced by certain animal species, notably anurans, have revealed these amphibians to be a potentially strong indicator of temperature fluctuations and therefore of climate change. Environmental monitoring systems using Wireless Sensor Networks are therefore of interest for obtaining indicators of global warming. For the automatic classification of the sounds recorded by such systems, a proper representation of the sound spectrum is essential, since it contains the information required for cataloguing anuran calls. The present paper focuses on this feature extraction process by exploring three alternatives: the standardized MPEG-7 features, Filter Bank Energy (FBE), and Mel Frequency Cepstral Coefficients (MFCC). Several parameter values are also considered for each option in the extraction of spectrum features. The paper shows that representing the frame spectrum with pure FBE gives slightly worse results than the MPEG-7 features. This performance can easily be improved, however, by rescaling the FBE along two dimensions: vertically, by taking the logarithm of the energies; and horizontally, by applying mel scaling to the filter banks. Finally, representing the spectrum in the cepstral domain, as in MFCC, yields a further marginal improvement in classification performance.
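The two FBE rescalings and the cepstral step can be sketched in Python with NumPy. This is an illustrative assumption, not the paper's configuration: it uses the common mel formula m = 2595 · log10(1 + f/700), triangular filters, and an unnormalized DCT-II, and the function names and defaults (20 filters, 13 coefficients) are hypothetical. MPEG-7 features are not covered.

```python
import numpy as np


def hz_to_mel(f):
    # Common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Horizontal rescaling: triangular filters equally spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bin_points = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(1, n_filters + 1):
        left, centre, right = bin_points[i - 1], bin_points[i], bin_points[i + 1]
        for k in range(left, centre):  # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank


def log_mel_energies(frame, sample_rate, n_filters=20, eps=1e-10):
    """Vertical rescaling: logarithm of the mel filter bank energies."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    fbank = mel_filter_bank(n_filters, len(frame), sample_rate)
    return np.log(fbank @ power + eps)


def mfcc(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Cepstral domain: unnormalized DCT-II of the log mel energies."""
    log_e = log_mel_energies(frame, sample_rate, n_filters)
    n = np.arange(n_filters)
    k = np.arange(n_coeffs)[:, None]
    dct_matrix = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return dct_matrix @ log_e
```

Dropping the `np.log` call gives linear filter bank energies, and spacing the filter edges linearly in hertz instead of in mels gives an unscaled filter bank, so the same sketch can reproduce each representation being compared.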
Affiliation(s)
- Amalia Luque
  - Ingeniería del Diseño, Escuela Politécnica Superior, Universidad de Sevilla, 41004 Sevilla, Spain.
- Jesús Gómez-Bellido
  - Ingeniería del Diseño, Escuela Politécnica Superior, Universidad de Sevilla, 41004 Sevilla, Spain.
- Alejandro Carrasco
  - Tecnología Electrónica, Escuela Ingeniería Informática, Universidad de Sevilla, 41004 Sevilla, Spain.
- Julio Barbancho
  - Tecnología Electrónica, Escuela Politécnica Superior, Universidad de Sevilla, 41004 Sevilla, Spain.

9. Luque A, Romero-Lemos J, Carrasco A, Gonzalez-Abril L. Temporally-aware algorithms for the classification of anuran sounds. PeerJ 2018; 6:e4732. [PMID: 29740517; PMCID: PMC5937479; DOI: 10.7717/peerj.4732] [Received: 12/19/2017] [Accepted: 04/18/2018]
Abstract
Several authors have shown that the sounds of anurans can be used as an indicator of climate change. Obtaining this indicator therefore requires recording, storing, and processing a huge number of anuran sounds distributed over time and space. It is also desirable to have algorithms and tools for automatically classifying the different classes of sounds. In this paper, six classification methods drawn from the data-mining domain are proposed, all of which strive to exploit the temporal character of the sounds. These methods are defined and compared using several approaches. The main conclusions of this paper are that: (i) the sliding window method attained the best results in the experiments presented, even outperforming the hidden Markov models usually employed in similar applications; (ii) noteworthy overall classification performance was obtained, an especially striking result given that the sounds analysed were affected by a highly noisy background; (iii) instance selection for determining the sounds in the training dataset gives better results than cross-validation techniques; and (iv) temporally-aware classifiers can achieve better performance than their non-temporally-aware counterparts.
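One common way to make a frame-level classifier temporally aware is to feed it a sliding window of neighbouring frames. The Python sketch below assumes that reading of "sliding window"; it is a generic construction, not necessarily the method evaluated in the paper, and the helper `sliding_window_features` and its edge padding are hypothetical choices.

```python
import numpy as np


def sliding_window_features(frames: np.ndarray, half_width: int) -> np.ndarray:
    """Stack each frame's feature vector with those of its neighbours.

    Generic illustration, not the paper's exact method.
    frames: array of shape (n_frames, n_features).
    Returns shape (n_frames, (2 * half_width + 1) * n_features); frames near
    the edges are padded by repeating the first/last frame.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    padded = np.pad(frames, ((half_width, half_width), (0, 0)), mode="edge")
    n_frames = frames.shape[0]
    window = 2 * half_width + 1
    # Row t of the result concatenates padded rows t .. t + window - 1,
    # i.e. original frames t - half_width .. t + half_width.
    return np.hstack([padded[offset:offset + n_frames] for offset in range(window)])
```

Each row of the result can be passed to any non-temporal classifier; `half_width=0` reproduces the original frame-by-frame features, which gives the non-temporally-aware baseline for comparison.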
Affiliation(s)
- Amalia Luque
  - Departamento de Ingeniería del Diseño, Universidad de Sevilla, Sevilla, Spain
- Javier Romero-Lemos
  - Departamento de Ingeniería del Diseño, Universidad de Sevilla, Sevilla, Spain
- Alejandro Carrasco
  - Departamento de Tecnología Electrónica, Universidad de Sevilla, Sevilla, Spain

10. Gonzalez-Abril L, Angulo C, Nuñez H, Leal Y. Handling binary classification problems with a priority class by using Support Vector Machines. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2017.08.023]

11. Nekooeimehr I, Lai-Yuen SK. Cluster-based Weighted Oversampling for Ordinal Regression (CWOS-Ord). Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.08.071]

12.

13.

14. Dai HL. Imbalanced Protein Data Classification Using Ensemble FTM-SVM. IEEE Trans Nanobioscience 2015; 14:350-359. [DOI: 10.1109/tnb.2015.2431292]