1. Han X, Cao M, He J, Xu D, Liang Y, Lang X, Guan R. A comprehensive psychological tendency prediction model for pregnant women based on questionnaires. Sci Rep 2023; 13:2. PMID: 36593288; PMCID: PMC9807629; DOI: 10.1038/s41598-022-26977-3.
Abstract
Many people in modern society are under high pressure, leading to a growing incidence of mental disorders, such as antenatal depression in pregnant women. Antenatal depression can affect a pregnant woman's physical and psychological health and child outcomes, and can lead to postpartum depression. It is therefore essential to detect antenatal depression early. This study aims to predict pregnant women's antenatal depression and identify factors that may lead to it. First, a questionnaire was designed based on the daily life of pregnant women. The survey was conducted on pregnant women in a hospital, with 5666 participants. As the collected data are imbalanced and high-dimensional, we developed a one-class classifier named Stacked Auto-Encoder Support Vector Data Description (SAE-SVDD) to distinguish depressed pregnant women from normal ones. To validate the method, SAE-SVDD was first applied to three benchmark datasets. The results showed that SAE-SVDD was effective, with F-scores better than those of other popular classifiers. For the antenatal depression problem, the F-score of SAE-SVDD was higher than 0.87, demonstrating that the questionnaire is informative and the classification method is successful. Then, by an improved Term Frequency-Inverse Document Frequency (TF-IDF) analysis, the critical factors of antenatal depression were identified as work stress, marital status, husband support, passive smoking, and alcohol consumption. Given its generalizability, SAE-SVDD can be applied to analyze other questionnaires.
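As a rough illustration of the one-class idea behind SAE-SVDD, the sketch below trains only on "normal" data and flags everything outside the learned boundary. It uses a deliberately simplified stand-in for SVDD (a centroid plus a quantile radius rather than the kernelized minimum enclosing ball), and synthetic Gaussian data in place of the questionnaire responses; nothing here reproduces the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 10))    # majority class: train on this only
outliers = rng.normal(4.0, 1.0, size=(50, 10))   # minority class: never seen in training

# "Fit": enclose the training data in a ball around its centroid. The 95th-percentile
# radius leaves ~5% of training points outside, loosely analogous to SVDD's nu parameter.
center = normal.mean(axis=0)
radius = np.quantile(np.linalg.norm(normal - center, axis=1), 0.95)

def is_normal(X):
    # Points inside the ball are classified as normal, everything else as anomalous
    return np.linalg.norm(X - center, axis=1) <= radius

print(is_normal(normal).mean())    # close to 0.95 by construction
print(is_normal(outliers).mean())  # near 0.0: shifted points fall outside the ball
```

The paper's contribution layers a stacked autoencoder in front of the one-class boundary so that high-dimensional questionnaire answers are compressed before the boundary is fitted.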
Affiliation(s)
- Xiaosong Han
- Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Mengchen Cao
- Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Junru He
- Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Dong Xu
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Yanchun Liang
- Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun 130012, China; Zhuhai Laboratory of Key Laboratory for Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Science and Technology, Zhuhai 519041, China
- Xiaoduo Lang
- Jilin Provincial Institute of Population Science and Technology, Changchun 130000, China
- Renchu Guan
- Key Laboratory for Symbol Computation and Knowledge Engineering of National Education Ministry, College of Computer Science and Technology, Jilin University, Changchun 130012, China
2. Towards hybrid over- and under-sampling combination methods for class imbalanced datasets: an experimental study. Artif Intell Rev 2022. DOI: 10.1007/s10462-022-10186-5.
3. Efficient SVDD sampling with approximation guarantees for the decision boundary. Mach Learn 2022. DOI: 10.1007/s10994-022-06149-0.
Abstract
Support Vector Data Description (SVDD) is a popular one-class classifier for anomaly and novelty detection. Despite its effectiveness, however, SVDD does not scale well with data size. To avoid prohibitive training times, sampling methods select small subsets of the training data on which SVDD trains a decision boundary that is ideally equivalent to the one obtained on the full dataset. According to the literature, a good sample should therefore contain the so-called boundary observations that SVDD would select as support vectors on the full dataset. However, non-boundary observations are also essential, to avoid fragmenting contiguous inlier regions and degrading classification accuracy. Other aspects, such as selecting a sufficiently representative sample, are important as well, but existing sampling methods largely overlook them, resulting in poor classification accuracy. In this article, we study how to select a sample with these points in mind. Our approach is to frame SVDD sampling as an optimization problem in which constraints guarantee that sampling indeed approximates the original decision boundary. We then propose RAPID, an efficient algorithm to solve this optimization problem. RAPID requires no parameter tuning, is easy to implement, and scales well to large datasets. We evaluate our approach on real-world and synthetic data, in the most comprehensive evaluation of SVDD sampling so far. Our results show that RAPID outperforms its competitors in classification accuracy, sample size, and runtime.
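The goal RAPID targets, training on a small sample while approximately preserving the full-data boundary, can be sketched with uniform random sampling as a naive baseline. RAPID itself adds constraints that guarantee the approximation; the ball-shaped "SVDD" below is a simplification, and all data and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(20000, 5))  # full training set

def fit_ball(data, q=0.95):
    """Stand-in for SVDD training: centroid plus quantile radius."""
    c = data.mean(axis=0)
    return c, np.quantile(np.linalg.norm(data - c, axis=1), q)

_, r_full = fit_ball(X)                                   # boundary from all 20000 points
sample = X[rng.choice(len(X), size=500, replace=False)]   # 2.5% uniform random sample
_, r_sample = fit_ball(sample)

# The sampled boundary lands close to the full-data one at a fraction of the cost;
# RAPID's contribution is choosing the sample so that this closeness is guaranteed.
print(abs(r_sample - r_full) / r_full)
```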
4. Deng M, Guo Y, Wang C, Wu F. An oversampling method for multi-class imbalanced data based on composite weights. PLoS One 2021; 16:e0259227. PMID: 34767567; PMCID: PMC8589211; DOI: 10.1371/journal.pone.0259227.
Abstract
To solve the oversampling problem for multi-class small samples and improve their classification accuracy, we develop an oversampling method based on classification ranking and weight setting. The designed oversampling algorithm sorts the data within each class of the dataset according to the distance from the original data to the hyperplane. Furthermore, iterative sampling is performed within each class, and inter-class sampling is adopted at the boundaries of adjacent classes according to a sampling weight composed of data density and data ranking. Finally, information assignment is performed on all newly generated sampling data. The algorithm is trained and tested on the UCI imbalanced datasets, and established composite metrics are used to evaluate its performance against other algorithms in a comprehensive evaluation. The results show that the proposed algorithm balances the multi-class imbalanced data in terms of quantity, and the newly generated data maintain the distribution characteristics and information properties of the original samples. Moreover, compared with other algorithms such as SMOTE and SVMOM, the proposed algorithm reaches a higher classification accuracy of about 90%. We conclude that the algorithm is highly practical and general for imbalanced multi-class samples.
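For context, the classic SMOTE interpolation that this method is compared against can be sketched in a few lines. The composite-weight scheme proposed in the paper itself is more involved and is not reproduced here; the data below are synthetic.

```python
import numpy as np

def smote_like(X, n_new, k=5, seed=2):
    """Generate n_new synthetic minority samples by interpolating each randomly
    chosen point toward one of its k nearest neighbours (the classic SMOTE idea)."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X))
        d = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]   # k nearest neighbours, excluding the point itself
        j = rng.choice(nbrs)
        lam = rng.random()              # interpolation factor in [0, 1]
        out.append(X[i] + lam * (X[j] - X[i]))
    return np.array(out)

minority = np.random.default_rng(0).normal(size=(20, 3))  # small minority class
synth = smote_like(minority, 80)
print(synth.shape)  # (80, 3)
```

Because every synthetic point lies on a segment between two minority points, SMOTE cannot place new samples outside the minority region; weighting schemes like the paper's additionally steer sampling toward class boundaries and dense regions.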
Affiliation(s)
- Mingyang Deng
- School of Automobile, Chang’an University, Xi’an, China
- College of Automobile Engineering, College of Humanities and Information, Changchun University of Technology, Changchun, China
- Yingshi Guo
- School of Automobile, Chang’an University, Xi’an, China
- Chang Wang
- School of Automobile, Chang’an University, Xi’an, China
- Fuwei Wu
- School of Automobile, Chang’an University, Xi’an, China
5. Gong C, Su ZG, Wang PH, Wang Q, You Y. Evidential instance selection for K-nearest neighbor classification of big data. Int J Approx Reason 2021. DOI: 10.1016/j.ijar.2021.08.006.
6. Pascual-Triana JD, Charte D, Andrés Arroyo M, Fernández A, Herrera F. Revisiting data complexity metrics based on morphology for overlap and imbalance: snapshot, new overlap number of balls metrics and singular problems prospect. Knowl Inf Syst 2021. DOI: 10.1007/s10115-021-01577-1.
7. Wang Z, Tsai CF, Lin WC. Data cleaning issues in class imbalanced datasets: instance selection and missing values imputation for one-class classifiers. Data Technologies and Applications 2021. DOI: 10.1108/dta-01-2021-0027.
Abstract
Purpose: Class imbalance learning, which arises in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques, which aim to identify anomalies as the minority class against the normal data as the majority class, are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is one key factor affecting the performance of one-class classifiers.
Design/methodology/approach: In this paper, we focus on two data cleaning or preprocessing methods for class imbalanced datasets. The first examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second combines instance selection and missing value imputation, where the latter handles incomplete datasets that contain missing values.
Findings: The experiments are based on 44 class imbalanced datasets, three instance selection algorithms (IB3, DROP3, and the GA), the CART decision tree for missing value imputation, and three one-class classifiers (OCSVM, IFOREST, and LOF). The results show that if the instance selection algorithm is carefully chosen, this step can improve the quality of the training data, making one-class classifiers outperform the baselines without instance selection. Moreover, when class imbalanced datasets contain missing values, combining missing value imputation and instance selection, regardless of which step is performed first, can maintain data quality similar to that of datasets without missing values.
Originality/value: The novelty of this paper is to investigate the effect of instance selection on the performance of one-class classifiers, which has not been done before. Moreover, this study is the first attempt to consider the scenario of missing values in the training set for one-class classifiers; in this case, performing missing value imputation and instance selection in different orders is compared.
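The paper's first question, whether filtering noisy instances out of the majority class tightens a one-class boundary, can be illustrated with a toy version: a simple distance filter stands in for IB3/DROP3/GA, and a centroid-radius ball stands in for the one-class classifiers studied. The data and thresholds are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
clean = rng.normal(0.0, 1.0, size=(300, 4))
noise = rng.normal(6.0, 1.0, size=(15, 4))   # mislabelled points polluting the majority class
train = np.vstack([clean, noise])

def fit_radius(data, q=0.99):
    """Stand-in one-class classifier: radius of a ball around the centroid."""
    c = data.mean(axis=0)
    return np.quantile(np.linalg.norm(data - c, axis=1), q)

# Instance selection: drop the 5% of points farthest from the (robust) median
d = np.linalg.norm(train - np.median(train, axis=0), axis=1)
kept = train[d < np.quantile(d, 0.95)]

r_raw, r_clean = fit_radius(train), fit_radius(kept)
print(r_raw, r_clean)  # the noisy points inflate the unfiltered boundary
```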
8.
9. Gautam C, Tiwari A, Tanveer M. AEKOC+: Kernel Ridge Regression-Based Auto-Encoder for One-Class Classification Using Privileged Information. Cognit Comput 2020. DOI: 10.1007/s12559-019-09705-4.
10. Kordos M, Łapa K. Multi-Objective Evolutionary Instance Selection for Regression Tasks. Entropy 2018; 20:e20100746. PMID: 33265835; PMCID: PMC7512309; DOI: 10.3390/e20100746.
Abstract
The purpose of instance selection is to reduce the data size while preserving as much of the useful information stored in the data as possible, and to detect and remove erroneous and redundant information. In this work, we analyze instance selection in regression tasks, applying the NSGA-II multi-objective evolutionary algorithm to direct the search for the optimal subset of the training dataset and the k-NN algorithm to evaluate solutions during the selection process. A key advantage of the method is that it yields a pool of solutions on the Pareto front, each of which is the best for a certain RMSE-compression balance. We discuss the different parameters of the process and their influence on the results, and make a special effort to reduce the computational complexity of our approach. The experimental evaluation shows that the proposed method performs well in terms of minimizing both prediction error and dataset size.
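The two objectives being traded off, k-NN prediction error versus training-set compression, can be seen even with random subsets; NSGA-II's role in the paper is to search subset space for the Pareto front rather than sample it blindly. This sketch is illustrative, with synthetic data, and is not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(400, 1))            # training inputs
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 400)    # noisy regression targets
Xt = rng.uniform(-3, 3, size=(200, 1))           # test inputs
yt = np.sin(Xt[:, 0])                            # noise-free test targets

def knn_rmse(Xs, ys, k=3):
    """RMSE of k-NN regression on the test set, used here to score a subset."""
    preds = [ys[np.argsort(np.abs(Xs[:, 0] - x))[:k]].mean() for x in Xt[:, 0]]
    return float(np.sqrt(np.mean((np.array(preds) - yt) ** 2)))

# Each retained fraction is one point in (compression, RMSE) space; an evolutionary
# search like NSGA-II looks for subsets that dominate these random ones.
for frac in (0.1, 0.5, 1.0):
    idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    print(frac, knn_rmse(X[idx], y[idx]))
```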
Affiliation(s)
- Mirosław Kordos
- Department of Computer Science and Automatics, University of Bielsko-Biała, ul. Willowa 2, 43-309 Bielsko-Biała, Poland
- Krystian Łapa
- Institute of Computational Intelligence, Częstochowa University of Technology, 42-201 Częstochowa, Poland