1. Fili M, Hu G, Han C, Kort A, Trettin J, Haim H. A classification algorithm based on dynamic ensemble selection to predict mutational patterns of the envelope protein in HIV-infected patients. Algorithms Mol Biol 2023; 18:4. [PMID: 37337202 DOI: 10.1186/s13015-023-00228-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 06/04/2023] [Indexed: 06/21/2023]
Abstract
BACKGROUND Therapeutics against the envelope (Env) proteins of human immunodeficiency virus type 1 (HIV-1) effectively reduce viral loads in patients. However, due to mutations, new therapy-resistant Env variants frequently emerge. The sites of mutations on Env that appear in each patient are considered random and unpredictable. Here we developed an algorithm to estimate for each patient the mutational state of each position based on the mutational state of adjacent positions on the three-dimensional structure of the protein.
METHODS We developed a dynamic ensemble selection algorithm designated k-best classifiers. It identifies the best classifiers within the neighborhood of a new observation and applies them to predict the variability state of each observation. To evaluate the algorithm, we applied amino acid sequences of Envs from 300 HIV-1-infected individuals (at least six sequences per patient). For each patient, amino acid variability values at all Env positions were mapped onto the three-dimensional structure of the protein. Then, the variability state of each position was estimated by the variability at adjacent positions of the protein.
RESULTS The proposed algorithm showed higher performance than the base learner and a panel of classification algorithms. The mutational state of positions in the high-mannose patch and CD4-binding site of Env, which are targeted by multiple therapeutics, was predicted well. Importantly, the algorithm outperformed other classification techniques for predicting the variability state at multi-position footprints of therapeutics on Env.
CONCLUSIONS The proposed algorithm applies a dynamic classifier-scoring approach that increases its performance relative to other classification methods. Better understanding of the spatiotemporal patterns of variability across Env may lead to new treatment strategies that are tailored to the unique mutational patterns of each patient. More generally, we propose the algorithm as a new high-performance dynamic ensemble selection technique.
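The neighborhood-based selection described under METHODS can be illustrated with a generic dynamic ensemble selection sketch. This is not the authors' k-best classifiers algorithm, whose scoring rule the abstract does not specify. The sketch assumes Euclidean neighborhoods in a validation set, plain local accuracy as the competence score, and majority voting among the selected classifiers. The names `des_predict`, `n_neighbors`, and `n_best`, and the toy data, are hypothetical.

```python
import math
from collections import Counter

def nearest_indices(X, query, n_neighbors):
    """Indices of the n_neighbors points in X closest to query (Euclidean)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))
    return order[:n_neighbors]

def des_predict(classifiers, X_val, y_val, query, n_neighbors=3, n_best=1):
    """Pick the locally most accurate classifiers and let them vote."""
    region = nearest_indices(X_val, query, n_neighbors)
    # Competence score (assumption): accuracy on the query's neighborhood.
    scores = [
        sum(clf(X_val[i]) == y_val[i] for i in region) / len(region)
        for clf in classifiers
    ]
    ranked = sorted(range(len(classifiers)), key=lambda j: scores[j], reverse=True)
    chosen = ranked[:n_best]
    votes = Counter(classifiers[j](query) for j in chosen)
    return votes.most_common(1)[0][0], chosen

# Toy pool: each "classifier" is a simple threshold rule.
def clf_a(x): return int(x[0] > 0.5)
def clf_b(x): return int(x[1] > 0.5)
def clf_c(x): return 0

pool = [clf_a, clf_b, clf_c]
X_val = [(0.1, 0.1), (0.2, 0.9), (0.3, 0.8), (0.9, 0.1), (0.8, 0.2), (0.9, 0.9)]
y_val = [0, 1, 1, 1, 1, 1]

label_left, chosen_left = des_predict(pool, X_val, y_val, (0.25, 0.85), n_neighbors=3)
label_right, chosen_right = des_predict(pool, X_val, y_val, (0.85, 0.15), n_neighbors=2)
```

In this toy setup a different classifier is selected for each query, because each rule is only reliable in part of the feature space. That is the behavior a dynamic selection scheme is meant to exploit.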
Affiliation(s)
- Mohammad Fili: Department of Industrial and Manufacturing Systems Engineering, Iowa State University, 3014 Black Engineering, 2529 Union Drive, Ames, IA, 50011, USA.
- Guiping Hu: Department of Industrial and Manufacturing Systems Engineering, Iowa State University, 3014 Black Engineering, 2529 Union Drive, Ames, IA, 50011, USA.
- Changze Han: Department of Microbiology and Immunology, Carver College of Medicine, University of Iowa, 51 Newton Rd, 3-770 BSB, Iowa City, IA, 52242, USA.
- Alexa Kort: Department of Microbiology and Immunology, Carver College of Medicine, University of Iowa, 51 Newton Rd, 3-770 BSB, Iowa City, IA, 52242, USA.
- John Trettin: Department of Industrial and Manufacturing Systems Engineering, Iowa State University, 3014 Black Engineering, 2529 Union Drive, Ames, IA, 50011, USA.
- Hillel Haim: Department of Microbiology and Immunology, Carver College of Medicine, University of Iowa, 51 Newton Rd, 3-770 BSB, Iowa City, IA, 52242, USA.
2. Wu J, Shen J, Xu M, Shao M. A novel combined dynamic ensemble selection model for imbalanced data to detect COVID-19 from complete blood count. Comput Methods Programs Biomed 2021; 211:106444. [PMID: 34614451 PMCID: PMC8479386 DOI: 10.1016/j.cmpb.2021.106444] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/27/2021] [Accepted: 09/22/2021] [Indexed: 06/01/2023]
Abstract
BACKGROUND As blood testing is radiation-free, low-cost and simple to operate, some researchers use machine learning to detect COVID-19 from blood test data. However, few studies take into consideration the imbalanced data distribution, which can impair the performance of a classifier.
METHOD A novel combined dynamic ensemble selection (DES) method is proposed for imbalanced data to detect COVID-19 from complete blood count. This method combines data preprocessing and improved DES. First, we use the hybrid synthetic minority over-sampling technique and edited nearest neighbor (SMOTE-ENN) to balance data and remove noise. Second, to improve the performance of DES, a novel hybrid multiple clustering and bagging classifier generation (HMCBCG) method is proposed to reinforce the diversity and local regional competence of candidate classifiers.
RESULTS The experimental results based on three popular DES methods show that HMCBCG performs better than bagging alone. HMCBCG+KNE obtains the best performance for COVID-19 screening with 99.81% accuracy, 99.86% F1, 99.78% G-mean and 99.81% AUC.
CONCLUSION Compared to other advanced methods, our combined DES model can improve accuracy, G-mean, F1 and AUC of COVID-19 screening.
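The metrics reported above matter because accuracy alone can look strong on imbalanced data. The following sketch computes accuracy, F1 and G-mean from the four cells of a binary confusion matrix, using the standard definitions (G-mean as the geometric mean of sensitivity and specificity). The function name `imbalance_metrics` and the counts are made up for illustration and are not taken from the paper. The sketch assumes all denominators are non-zero.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Accuracy, F1 and G-mean from binary confusion-matrix counts."""
    recall = tp / (tp + fn)          # sensitivity on the positive class
    specificity = tn / (tn + fp)     # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}

# Hypothetical imbalanced test set: 10 positives among 1000 samples.
metrics = imbalance_metrics(tp=5, fn=5, fp=10, tn=980)
```

With these invented counts the accuracy is 0.985 even though half of the positives are missed, while F1 and G-mean are clearly lower. That gap is why studies on imbalanced data usually report them alongside accuracy.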
Affiliation(s)
- Jiachao Wu: College of Management and Economics, Tianjin University, Tianjin, 300072, China.
- Jiang Shen: College of Management and Economics, Tianjin University, Tianjin, 300072, China.
- Man Xu: Business School, Nankai University, Tianjin, 300071, China.
- Minglai Shao: School of New Media and Communication, Tianjin University, Tianjin, 300072, China.
3. Zulfira FZ, Suyanto S, Septiarini A. Segmentation technique and dynamic ensemble selection to enhance glaucoma severity detection. Comput Biol Med 2021; 139:104951. [PMID: 34678479 DOI: 10.1016/j.compbiomed.2021.104951] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/13/2021] [Revised: 10/14/2021] [Accepted: 10/14/2021] [Indexed: 10/20/2022]
Abstract
The severity of glaucoma can be observed by categorising glaucoma diseases into several classes based on a classification process. The two most suitable parameters, cup-to-disc ratio (CDR) and peripapillary atrophy (PPA), which are commonly used to identify glaucoma, are utilized in this study to strengthen the classification. First, an active contour snake (ACS) is employed to retrieve both optic disc (OD) and optic cup (OC) values, which are required to calculate the CDR. Moreover, Otsu segmentation and thresholding techniques are used to identify PPA, and the features are then extracted using a grey-level co-occurrence matrix (GLCM). An advanced segmentation technique, combined with an improved classifier called dynamic ensemble selection (DES), is proposed to classify glaucoma. Because DES is generally used to handle an imbalanced dataset, the proposed model is expected to detect glaucoma severity and determine the subsequent treatment accurately. The proposed model obtains a higher mean accuracy (0.96) than the deep learning-based U-Net (0.90) when evaluated using three datasets of 250 retinal fundus images (200 training, 50 testing) based on the 5-fold cross-validation scheme.
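As background for the GLCM feature extraction mentioned above, the sketch below builds a grey-level co-occurrence matrix for a single pixel offset and derives the standard contrast feature from it. It is a minimal illustration and not the authors' pipeline. The offset (one pixel to the right), the non-symmetric counting, the three grey levels, and the tiny image are all assumptions chosen for readability.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count how often grey level i has grey level j at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def contrast(counts):
    """GLCM contrast: sum of P(i, j) * (i - j)^2 over the normalized matrix."""
    total = sum(sum(row) for row in counts)
    n = len(counts)
    return sum(counts[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)) / total

# Hypothetical 3x3 patch quantized to three grey levels (0, 1, 2).
patch = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
matrix = glcm(patch, levels=3)
patch_contrast = contrast(matrix)
```

Texture features such as contrast summarize how strongly neighboring pixels differ, so they can serve as compact inputs to a downstream classifier.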
Affiliation(s)
- Anindita Septiarini: Department of Informatics, Faculty of Engineering, Mulawarman University, Samarinda, Indonesia.
4. Alves Ribeiro VH, Moritz S, Rehbach F, Reynoso-Meza G. A novel dynamic multi-criteria ensemble selection mechanism applied to drinking water quality anomaly detection. Sci Total Environ 2020; 749:142368. [PMID: 33370917 DOI: 10.1016/j.scitotenv.2020.142368] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/13/2020] [Revised: 09/08/2020] [Accepted: 09/11/2020] [Indexed: 06/12/2023]
Abstract
The provision of clean and safe drinking water is a crucial task for water supply companies all over the world. To this end, automatic anomaly detection plays a critical role in drinking water quality monitoring. Recent anomaly detection studies use techniques that focus on a single global objective. Yet, companies need solutions that better balance the trade-off between false positives (FPs), which lead to financial losses to water companies, and false negatives (FNs), which severely impact public health and damage the environment. This work proposes a novel dynamic multi-criteria ensemble selection mechanism to cope with both problems simultaneously: the non-dominated local class-specific accuracy (NLCA). Moreover, experiments rely on recent time-series-related classification metrics to assess the predictive performance. Results on data from a real-world water distribution system show that NLCA outperforms other ensemble learning and dynamic ensemble selection techniques by more than 15% in terms of time-series-related F1 scores. In conclusion, NLCA enables the development of stronger anomaly detection systems for drinking water quality monitoring. The proposed technique also offers a new perspective on dynamic ensemble selection, which can be applied to different classification tasks to balance conflicting criteria.
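The idea of selecting classifiers that are non-dominated across per-class criteria can be sketched with a small Pareto filter. This is an interpretation based only on the method's name and the abstract, not the published NLCA procedure. It assumes each classifier has already been scored by its local accuracy on the normal class and on the anomaly class. The names `dominates` and `non_dominated`, and the score values, are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(scores):
    """Indices of classifiers whose scores no other classifier dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for t in scores)
    ]

# Hypothetical local accuracies per classifier: (normal class, anomaly class).
local_scores = [
    (0.9, 0.2),  # few false positives, many missed anomalies
    (0.5, 0.5),  # balanced
    (0.4, 0.4),  # worse than the balanced classifier on both classes
    (0.2, 0.9),  # catches anomalies, raises many false alarms
]
selected = non_dominated(local_scores)
```

Keeping the whole non-dominated set, rather than the single classifier with the best overall accuracy, retains members that specialize in avoiding false positives as well as members that specialize in avoiding false negatives.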
Affiliation(s)
- Victor Henrique Alves Ribeiro: Programa de Pós-Graduação em Engenharia de Produção e Sistemas (PPGEPS), Pontifícia Universidade Católica do Paraná (PUCPR), Rua Imaculada Conceição, 1155, 80215-901 Curitiba, PR, Brazil.
- Steffen Moritz: Institute of Data Science, Engineering, and Analytics, TH Köln, Campus Gummersbach, Steinmüllerallee 1, 51643 Gummersbach, Germany.
- Frederik Rehbach: Institute of Data Science, Engineering, and Analytics, TH Köln, Campus Gummersbach, Steinmüllerallee 1, 51643 Gummersbach, Germany.
- Gilberto Reynoso-Meza: Programa de Pós-Graduação em Engenharia de Produção e Sistemas (PPGEPS), Pontifícia Universidade Católica do Paraná (PUCPR), Rua Imaculada Conceição, 1155, 80215-901 Curitiba, PR, Brazil.
5. García-Cano E, Arámbula Cosío F, Duong L, Bellefleur C, Roy-Beaudry M, Joncas J, Parent S, Labelle H. Dynamic ensemble selection of learner-descriptor classifiers to assess curve types in adolescent idiopathic scoliosis. Med Biol Eng Comput 2018; 56:2221-2231. [PMID: 29949021 DOI: 10.1007/s11517-018-1853-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/01/2016] [Accepted: 05/27/2018] [Indexed: 12/01/2022]
Abstract
While classification is important for assessing adolescent idiopathic scoliosis (AIS), it suffers from low interobserver and intraobserver reliability. Classification using ensemble methods may contribute to improving reliability using the proper 2D and 3D images of spine curvature features. In this study, we present two new techniques to describe the spine, namely, leave-one-out and fan leave-one-out. Using these techniques, three descriptors are computed from a stereoradiographic 3D reconstruction to describe the relationship between a vertebra and its neighbors. A dynamic ensemble selection method is introduced for automatic spine classification. The performance of the method is evaluated on a dataset containing 962 3D spine models categorized according to three curve types. With a log loss of 0.5623, the dynamic ensemble selection outperforms voting and stacking ensemble learning techniques. This method can improve intraobserver and interobserver reliability, identify the best combination of descriptors for characterizing spine curve types, and provide assistance to clinicians in the form of information to classify borderline curvature types.
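The log loss used above to compare ensembles is the mean negative log-probability assigned to the true class, so lower is better. The sketch below shows the standard multi-class definition on made-up predictions for a three-class problem. The function name `log_loss`, the clipping constant `eps`, and the probabilities are illustrative assumptions, not values from the study.

```python
import math

def log_loss(probabilities, labels, eps=1e-15):
    """Mean negative log-probability of the true class (multi-class log loss)."""
    total = 0.0
    for probs, label in zip(probabilities, labels):
        # Clip to avoid log(0) when a model assigns zero probability.
        p = min(max(probs[label], eps), 1.0)
        total -= math.log(p)
    return total / len(labels)

# Hypothetical predicted probabilities over three curve types.
predicted = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
]
true_labels = [0, 1]
loss = log_loss(predicted, true_labels)
```

Unlike accuracy, this metric penalizes confident wrong predictions heavily, which makes it sensitive to how well calibrated the class probabilities are on borderline cases.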
Affiliation(s)
- Edgar García-Cano: École de technologie supérieure, 1100 Notre-Dame Street West, Montreal, Quebec, H3C 1K3, Canada.
- Fernando Arámbula Cosío: Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Circuito Escolar S/N. Ciudad Universitaria, C.P. 04510, Mexico, D.F., Mexico.
- Luc Duong: École de technologie supérieure, 1100 Notre-Dame Street West, Montreal, Quebec, H3C 1K3, Canada.
- Christian Bellefleur: Research Center, Sainte-Justine Hospital, 3175 Côte-Sainte-Catherine, Montreal, Quebec, Canada.
- Marjolaine Roy-Beaudry: Research Center, Sainte-Justine Hospital, 3175 Côte-Sainte-Catherine, Montreal, Quebec, Canada.
- Julie Joncas: Research Center, Sainte-Justine Hospital, 3175 Côte-Sainte-Catherine, Montreal, Quebec, Canada.
- Stefan Parent: Research Center, Sainte-Justine Hospital, 3175 Côte-Sainte-Catherine, Montreal, Quebec, Canada.
- Hubert Labelle: Research Center, Sainte-Justine Hospital, 3175 Côte-Sainte-Catherine, Montreal, Quebec, Canada.