1. Zhang L, Gao S, Yuan Q, Fu Y, Yang R. An ensemble learning method combined with multiple feature representation strategies to predict lncRNA subcellular localizations. Comput Biol Chem 2025; 115:108336. [PMID: 39752849 DOI: 10.1016/j.compbiolchem.2024.108336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2024] [Revised: 10/26/2024] [Accepted: 12/25/2024] [Indexed: 02/26/2025]
Abstract
Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in numerous diseases. Exploring the subcellular localizations of lncRNAs not only yields crucial insights into the molecular mechanisms of lncRNA-related biological processes but also contributes to the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive, so computational methods are in increasing demand. This paper develops an ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. Because features from a single source only partially capture the correlation with the prediction target, heterogeneous multi-source features are used, introducing information on sequence composition, physicochemical properties, and structure. To address the class imbalance in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor, termed lncSLPre, is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, SMOTE is shown to be effective in mitigating the dataset imbalance, while the feature selection approach is critical in eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves overall accuracy improvements of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.
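The abstract names SMOTE as the remedy for class imbalance but gives no settings. As a minimal sketch of the general idea, and not the authors' implementation, the Python below creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the toy data, and the parameters `k` and `seed` are illustrative assumptions; the base sample is drawn at random here, a simplification of the original per-sample scheme.

```python
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Simplified variant: each synthetic point starts from a randomly chosen
    minority sample and moves a random fraction of the way toward one of
    its k nearest minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority sample, nearest first.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-dimensional minority class with four samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, it always falls inside the region already spanned by the minority class, which is why oversampling of this kind adds variety without leaving that region.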
Affiliation(s)
- Lina Zhang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China.
- Sizan Gao
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China.
- Qinghao Yuan
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China.
- Yao Fu
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China.
- Runtao Yang
  - School of Mechanical, Electrical and Information Engineering, Shandong University at Weihai, 264209, China.
2. Yang J, Wang H, Liu P, Lu Y, Yao M, Yan H. Prediction of hypertension risk based on multiple feature fusion. J Biomed Inform 2024; 157:104701. [PMID: 39047932 DOI: 10.1016/j.jbi.2024.104701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 07/12/2024] [Accepted: 07/20/2024] [Indexed: 07/27/2024]
Abstract
OBJECTIVE: When machine learning is applied to hypertension prediction, many factors seriously affect classification accuracy and generalization performance. We propose a pulse wave classification model based on multi-feature fusion for accurate prediction of hypertension.
METHODS AND MATERIALS: We propose an ensemble under-sampling model with dynamic weights to reduce the influence of class imbalance on classification and to automatically classify hypertension from inquiry diagnosis. We also build a deep learning model based on a hybrid attention mechanism, which transforms pulse waves into feature maps for extraction of in-depth features, so as to automatically classify hypertension from pulse diagnosis. We build the multi-feature fusion model on dynamic Dempster-Shafer (DS) theory, combining inquiry diagnosis and pulse diagnosis to enhance the fault tolerance of prediction across multiple classifiers. In addition, this study calculates feature importance rankings of scale features for inquiry diagnosis and of temporal and frequency-domain features for pulse diagnosis.
RESULTS: After 5-fold cross-validation, the accuracy, sensitivity, specificity, F1-score, and G-mean were 94.08%, 93.43%, 96.86%, 93.45%, and 95.12%, respectively, based on 409 hypertensive cases from Longhua Hospital affiliated to Shanghai University of Traditional Chinese Medicine and the Hospital of Integrated Traditional Chinese and Western Medicine. We identify the key factors influencing hypertensive classification accuracy, so as to assist in the prevention and clinical diagnosis of hypertension.
CONCLUSION: Compared with state-of-the-art models, the multi-feature fusion model effectively utilizes the patients' correlated multimodal features and has higher classification accuracy and generalization performance.
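The abstract fuses two classifiers with a dynamic variant of Dempster-Shafer theory whose details are not given. As a hedged illustration of the underlying mechanism only, the sketch below applies the classical Dempster rule of combination to two mass functions. The mass values, the labels `hypertensive` and `normal`, and the function name `dempster_combine` are assumptions made for the example and do not come from the paper.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.

    Each mass function maps a frozenset of hypotheses to a mass; the
    masses of one function sum to 1. Mass that lands on an empty
    intersection is treated as conflict and normalized away.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


H = frozenset({"hypertensive"})
N = frozenset({"normal"})
U = H | N  # mass on U expresses "undecided between the two"

# Hypothetical outputs of an inquiry-based and a pulse-based classifier.
inquiry = {H: 0.6, N: 0.1, U: 0.3}
pulse = {H: 0.7, N: 0.2, U: 0.1}
fused = dempster_combine(inquiry, pulse)
```

With these illustrative masses the conflict is 0.19, and the fused mass on `hypertensive` is 0.69 / 0.81, roughly 0.852, higher than either source alone. This is the sense in which agreeing sources reinforce each other under the rule.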
Affiliation(s)
- Jingdong Yang
  - Autonomous Robot Lab, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Han Wang
  - Autonomous Robot Lab, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Peng Liu
  - Autonomous Robot Lab, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Yuhang Lu
  - Autonomous Robot Lab, School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China.
- Minghui Yao
  - Department of Traditional Chinese Medicine Diagnosis, Basic Medical College, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.
- Haixia Yan
  - Department of Traditional Chinese Medicine Diagnosis, Basic Medical College, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.
3. Fatlawi HK, Kiss A. An Elastic Self-Adjusting Technique for Rare-Class Synthetic Oversampling Based on Cluster Distortion Minimization in Data Stream. Sensors (Basel, Switzerland) 2023; 23:s23042061. [PMID: 36850659 PMCID: PMC9963940 DOI: 10.3390/s23042061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/24/2023] [Revised: 02/08/2023] [Accepted: 02/10/2023] [Indexed: 06/12/2023]
Abstract
Adaptive machine learning is increasingly important because it can classify a data stream and handle changes in the data distribution. Various sources, such as wearable sensors and medical devices, can generate a data stream with an imbalanced distribution of classes. Many popular oversampling techniques have been designed for imbalanced batch data rather than a continuous stream. This work proposes a self-adjusting window that improves the adaptive classification of an imbalanced data stream by minimizing cluster distortion. It includes two models: the first chooses only the previous data instances that preserve the coherence of the current chunk's samples, and the second relaxes this strict filter by excluding the examples of the last chunk. Both models generate synthetic points for oversampling rather than reusing the actual data points. Evaluation of the proposed models on the Siena EEG dataset showed their ability to improve the performance of several adaptive classifiers. The best results were obtained with Adaptive Random Forest, for which sensitivity reached 96.83% and precision reached 99.96%.
Affiliation(s)
- Hayder K. Fatlawi
  - Department of Information Systems, ELTE Eötvös Loránd University, 1117 Budapest, Hungary
  - Center of Information Technology Research and Development, University of Kufa, Najaf 540011, Iraq
- Attila Kiss
  - Department of Information Systems, ELTE Eötvös Loránd University, 1117 Budapest, Hungary
  - Department of Informatics, J. Selye University, 94501 Komárno, Slovakia
4. Yin M, Li J, Li H. A CNN approach based on correlation metrics to chemical process fault classifications with limited labelled data. Can J Chem Eng 2023. [DOI: 10.1002/cjce.24749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023]
Affiliation(s)
- Min Yin
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Jince Li
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
- Hongguang Li
  - College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
5. Dai Q, Liu J, Yang J. Multi-armed bandit heterogeneous ensemble learning for imbalanced data. Comput Intell 2022. [DOI: 10.1111/coin.12566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Affiliation(s)
- Qi Dai
  - Department of Automation, College of Information Science and Engineering, Beijing, China
- Jian-wei Liu
  - Department of Automation, College of Information Science and Engineering, Beijing, China
- Jiapeng Yang
  - College of Science, North China University of Science and Technology, Tangshan, China
6. Han M, Zhang X, Chen Z, Wu H, Li M. Dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01791-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
7. Fatlawi HK, Kiss A. Similarity-Based Adaptive Window for Improving Classification of Epileptic Seizures with Imbalance EEG Data Stream. Entropy (Basel, Switzerland) 2022; 24:e24111641. [PMID: 36421496 PMCID: PMC9689083 DOI: 10.3390/e24111641] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/30/2022] [Revised: 11/06/2022] [Accepted: 11/08/2022] [Indexed: 06/12/2023]
Abstract
Data stream mining techniques have recently received increasing research interest, especially in medical data classification. An unbalanced representation of the classification targets in these data is a common challenge because classification techniques are biased toward the major class. Many methods have attempted to address this problem but have ended up excessively biased toward the minor class. In this work, we propose a method for balancing the presence of the minor class within the current window of the data stream while preserving the data's original majority as much as possible. The proposed method uses similarity analysis to select specific minor-class instances from the previous window, which are then added to the current window's instances. Implementing the proposed method on the Siena dataset showed promising results compared to the Skew ensemble method and some other research methods.
Affiliation(s)
- Hayder K. Fatlawi
  - Department of Information Systems, ELTE Eötvös Loránd University, 1117 Budapest, Hungary
  - Center of Information Technology Research and Development, University of Kufa, Najaf 540011, Iraq
- Attila Kiss
  - Department of Information Systems, ELTE Eötvös Loránd University, 1117 Budapest, Hungary
  - Department of Informatics, J. Selye University, 94501 Komárno, Slovakia
8. Klikowski J, Woźniak M. Deterministic Sampling Classifier with weighted Bagging for drifted imbalanced data stream classification. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
9. ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach Learn 2022. [DOI: 10.1007/s10994-022-06168-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
10. Han M, Li X, Wang L, Zhang N, Cheng H. Review of ensemble classification over data streams based on supervised and semi-supervised. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-211101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Most data stream ensemble classification algorithms use supervised learning. This approach needs a large amount of labeled data to train the classifier, and labeled data are very costly to obtain. Semi-supervised learning algorithms, which train the classifier on both labeled and unlabeled data, are therefore becoming increasingly popular. This article is the first to review data stream ensemble classification methods from the perspectives of supervised and semi-supervised learning. First, base classifiers such as decision trees, neural networks, and support vector machines are introduced from the perspectives of supervised and semi-supervised learning. Second, the key technologies in data stream ensemble classification are explained from two aspects, incremental learning and online learning. Finally, majority voting and weighted voting are explained as ensemble strategies. The different ensemble methods are summarized and the classic algorithms are quantitatively analyzed. Further research directions are given, including the handling of concept drift under supervised and semi-supervised learning, the study of homogeneous and heterogeneous ensembles, and data stream ensemble classification under unsupervised learning.
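The two ensemble strategies the abstract mentions can be contrasted in a few lines. The sketch below is a generic illustration, not code from the review; the function names, the labels, and the weights (imagined here as recent per-classifier accuracies) are assumptions.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    Ties resolve to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Return the label with the largest total classifier weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical predictions of three base classifiers for one instance.
preds = ["drift", "stable", "stable"]
# Hypothetical weights, e.g. each classifier's accuracy on a recent window.
weights = [0.9, 0.3, 0.4]

majority_label = majority_vote(preds)
weighted_label = weighted_vote(preds, weights)
```

Here majority voting returns `stable` (two votes against one), while weighted voting returns `drift` because the single dissenting classifier carries weight 0.9 against a combined 0.7. In a stream setting, updating the weights over time is one way an ensemble can follow concept drift.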
Affiliation(s)
- Meng Han
  - School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Xiaojuan Li
  - School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Le Wang
  - School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Ni Zhang
  - School of Computer Science and Engineering, North Minzu University, Yinchuan, China
- Haodong Cheng
  - School of Computer Science and Engineering, North Minzu University, Yinchuan, China
11. Hong B, Ma X, Tang W, Shen Z. Recognition of Air Passengers' Willingness to Pay for Seat Selection for Imbalanced Data Based on Improved XGBoost. International Journal of Cognitive Informatics and Natural Intelligence 2022. [DOI: 10.4018/ijcini.312249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Paid seat selection is an important source of ancillary revenue for airlines, and machine learning-based identification of willingness to pay is of great practical value in helping airlines accurately reach potentially willing passengers. However, affected by periodic statistical errors, air passenger order data often suffer from high noise, high dimensionality, and class imbalance. In view of this, this paper proposes a method for identifying air passengers' willingness to pay for seat selection based on improved XGBoost, with improvements integrated across three stages: data, feature, and algorithm. The feasibility of the proposed multi-stage improved integration method is verified on a real airline passenger dataset, and the experimental results show that it classifies better than six classical imbalanced-classification models, which provides a basis for accurate marketing of airline paid seat selection programs.