1
Chhillar I, Singh A. A feature engineering-based machine learning technique to detect and classify lung and colon cancer from histopathological images. Med Biol Eng Comput 2024; 62:913-924. [PMID: 38091162 DOI: 10.1007/s11517-023-02984-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 11/29/2023] [Indexed: 02/22/2024]
Abstract
Globally, lung and colon cancers are among the most prevalent and lethal tumors. Early cancer identification is essential to increase the likelihood of survival. Histopathological images are an appropriate tool for diagnosing cancer, but examining them manually is tedious and error-prone. Recently, machine learning methods based on feature engineering have gained prominence in automatic histopathological image classification. Furthermore, these methods are more interpretable than deep learning, which operates in a "black box" manner. In the medical profession, the interpretability of a technique is critical to gaining the trust of end users so that they adopt it. In view of the above, this work aims to create an accurate and interpretable machine learning technique for the automated classification of lung and colon cancers from histopathology images. In the proposed approach, following the preprocessing steps, texture and color features are extracted using the Haralick and color histogram feature extraction algorithms, respectively. The obtained features are concatenated to form a single feature set. The three feature sets (texture, color, and combined features) are passed to the Light Gradient Boosting Machine (LightGBM) classifier for classification, and their performance is evaluated on the LC25000 dataset using hold-out and stratified 10-fold cross-validation (Stratified 10-FCV) techniques. On the test (hold-out) set, LightGBM with texture, color, and combined features classifies the lung and colon cancer images with 97.72%, 99.92%, and 100% accuracy, respectively. In addition, stratified 10-fold cross-validation also showed that LightGBM with combined or color features performed well, with an excellent mean auc_mu score and a low mean multi_logloss value. Thus, the proposed technique can help histologists detect and classify lung and colon histopathology images more efficiently, effectively, and economically, resulting in greater productivity.
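The texture, color, and combined feature sets described in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes a single horizontal pixel offset, a symmetric normalized gray-level co-occurrence matrix (GLCM), only three of Haralick's statistics (contrast, energy, homogeneity), and an RGB histogram with a hypothetical number of bins per channel; all function names are illustrative.

```python
from typing import List, Sequence, Tuple

def glcm(image: Sequence[Sequence[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Symmetric, normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0  # count the pair in both directions (symmetric GLCM)
                counts[j][i] += 1.0
    total = sum(map(sum, counts)) or 1.0  # avoid dividing by zero on tiny images
    return [[v / total for v in row] for row in counts]

def haralick_subset(p: List[List[float]]) -> List[float]:
    """Three of Haralick's statistics: contrast, energy (ASM), homogeneity."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum(p[i][j] * (i - j) ** 2 for i, j in cells)
    energy = sum(p[i][j] ** 2 for i, j in cells)
    homogeneity = sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells)
    return [contrast, energy, homogeneity]

def color_histogram(pixels: Sequence[Tuple[int, int, int]], bins: int = 4) -> List[float]:
    """Per-channel RGB histogram (values 0-255), normalized by pixel count."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1.0
    return [h / len(pixels) for h in hist]

def combined_features(gray, levels, rgb_pixels, bins=4):
    """Concatenate texture and color features into a single vector."""
    return haralick_subset(glcm(gray, levels)) + color_histogram(rgb_pixels, bins)

print(haralick_subset(glcm([[0, 0], [1, 1]], levels=2)))  # [0.0, 0.5, 1.0]
```

The concatenated vector plays the role of the "combined" feature set; in the paper such vectors are passed to LightGBM, which is not reproduced here. Libraries such as scikit-image provide full GLCM utilities for real images.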
Affiliation(s)
- Indu Chhillar
- Department of Computer Science and Engineering, Deenbandhu Chhotu Ram University of Science and Technology, Murthal, Haryana, India.
- Ajmer Singh
- Department of Computer Science and Engineering, Deenbandhu Chhotu Ram University of Science and Technology, Murthal, Haryana, India
2
Khouy M, Jabrane Y, Ameur M, Hajjam El Hassani A. Medical Image Segmentation Using Automatic Optimized U-Net Architecture Based on Genetic Algorithm. J Pers Med 2023; 13:1298. [PMID: 37763066 PMCID: PMC10533074 DOI: 10.3390/jpm13091298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 07/29/2023] [Accepted: 08/07/2023] [Indexed: 09/29/2023]
Abstract
Image segmentation is a crucial aspect of clinical decision making in medicine and, as such, it has greatly enhanced the sustainability of medical care. Consequently, biomedical image segmentation has become a prominent research area in computer vision. With the advent of deep learning, many manually designed methods have been proposed and have shown promising results, achieving state-of-the-art performance in biomedical image segmentation. However, these methods often require significant expert knowledge and have an enormous number of parameters, necessitating substantial computational resources. To address these challenges, this paper proposes a new approach called GA-UNet, which employs genetic algorithms to automatically design a U-shaped convolutional neural network with good performance while minimizing the number of parameters in its architecture. The proposed GA-UNet is evaluated on three datasets: lung image segmentation, cell nuclei segmentation in microscope images (DSB 2018), and liver image segmentation. The experimental results demonstrate that the proposed method achieves competitive performance with a smaller architecture and fewer parameters than the original U-Net model. It achieves an accuracy of 98.78% for lung image segmentation, 95.96% for cell nuclei segmentation in microscope images (DSB 2018), and 98.58% for liver image segmentation while using merely 0.24%, 0.48%, and 0.67%, respectively, of the number of parameters in the original U-Net architecture. This reduction in complexity makes GA-UNet a more viable option for deployment in resource-limited environments or real-world implementations that demand efficient, fast inference.
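GA-UNet rewards candidate architectures that use few parameters. As a rough illustration only, the sketch below counts the parameters of a plain U-Net from its per-level filter widths, a quantity a genetic algorithm's fitness function could penalize. It assumes two padded 3x3 convolutions per level, 2x2 transposed convolutions for upsampling, skip connections by concatenation, a final 1x1 convolution, and no batch normalization; the paper's actual search space and fitness are not described in the abstract, and the function names are hypothetical.

```python
from typing import Sequence

def conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a 2-D convolution (or transposed convolution)."""
    return kernel * kernel * c_in * c_out + c_out

def unet_params(widths: Sequence[int], in_channels: int = 1, out_channels: int = 1) -> int:
    """Parameter count of a plain U-Net with the given filters per level.

    widths[-1] is the bottleneck; every other entry is one encoder level
    mirrored by one decoder level.
    """
    total, channels = 0, in_channels
    # Encoder and bottleneck: two 3x3 convolutions per level.
    for w in widths:
        total += conv_params(3, channels, w) + conv_params(3, w, w)
        channels = w
    # Decoder: 2x2 transposed conv, concatenation with the skip (2*w channels), two 3x3 convs.
    for w in reversed(widths[:-1]):
        total += conv_params(2, channels, w)
        total += conv_params(3, 2 * w, w) + conv_params(3, w, w)
        channels = w
    # Final 1x1 convolution to the output channels.
    return total + conv_params(1, channels, out_channels)

baseline = unet_params([64, 128, 256, 512, 1024])
candidate = unet_params([8, 16, 32, 64, 128])  # a hypothetical GA-proposed candidate
print(f"candidate uses {100 * candidate / baseline:.2f}% of the baseline parameters")
```

Dividing a candidate's count by the baseline's gives the kind of percentage quoted in the abstract, though the exact figures depend on the architecture details assumed here.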
Affiliation(s)
- Mohammed Khouy
- MSC Laboratory, Cadi Ayyad University, Marrakech 40000, Morocco; (M.K.); (Y.J.); (M.A.)
- Younes Jabrane
- MSC Laboratory, Cadi Ayyad University, Marrakech 40000, Morocco; (M.K.); (Y.J.); (M.A.)
- Mustapha Ameur
- MSC Laboratory, Cadi Ayyad University, Marrakech 40000, Morocco; (M.K.); (Y.J.); (M.A.)
- Amir Hajjam El Hassani
- Nanomedicine Imagery & Therapeutics Laboratory, EA4662—Bourgogne-Franche-Comté University, University of Technologie of Belfort Montbéliard, CEDEX, 90010 Belfort, France
3
Nithya B, Ilango V. Enhanced machine learning based feature subset through FFS enabled classification for cervical cancer diagnosis. KES 2022. [DOI: 10.3233/kes-220009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
A dataset with a very large number of features and imbalanced classes can make it challenging for Machine Learning (ML) classification approaches to achieve adequate accuracy. The purpose of this research is to find the optimal feature subset for cervical cancer diagnosis, together with an efficient classification approach, by estimating the performance of various ML predictive models. The filter-based feature selection techniques Relief and Information Gain are applied in this study to rank each feature, so that the features can be ordered and the highest-scoring ones selected. An optimal feature subset is generated with a wrapper approach, Recursive Feature Elimination (RFE) using Random Forest, and with a Genetic Algorithm based on evolutionary principles. The predictive models are built with 10-fold cross-validation using prevalent classification algorithms such as Random Forest, C5.0, K-Nearest Neighbour, and Naïve Bayes. The results showed an improvement in the average performance of all these classifiers, and their classification error decreased substantially. The experiments also showed that an optimal, reduced feature subset obtained with this approach improves classification accuracy at a lower computational cost. The features generated by the fused Relief and Genetic Algorithm approach were able to predict the results efficiently, so an optimal feature subset was selected through this procedure. Most classifiers showed good performance. In addition, Random Forest achieved a higher accuracy rate with improved sensitivity and specificity. This work also established that selecting the best feature subset through the Fused Feature Selection (FFS) approach could reduce the complexity of the predictive model.
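The filter step above can be sketched for Information Gain, IG(Y; X) = H(Y) - sum_v P(X = v) H(Y | X = v), on discrete features. This minimal sketch assumes already-discretized features and uses hypothetical helper names; it does not reproduce Relief, RFE, the Genetic Algorithm, or the fused FFS procedure.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(Y) minus the expected entropy of Y after splitting on each feature value."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional

def rank_features(columns: Sequence[Sequence[Hashable]], labels: Sequence[Hashable]) -> List[int]:
    """Indices of feature columns sorted from highest to lowest information gain."""
    scores = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)

labels = ["yes", "yes", "no", "no"]
columns = [[0, 1, 0, 1], [1, 1, 0, 0]]  # column 1 separates the classes perfectly
print(rank_features(columns, labels))  # [1, 0]
```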
Affiliation(s)
- Nithya B
- New Horizon College of Engineering, Bengaluru, India
- Ilango V
- Department of MCA, CMR Institute of Technology, Bengaluru, India
4
Aswiga RV, Shanthi AP. A Multilevel Transfer Learning Technique and LSTM Framework for Generating Medical Captions for Limited CT and DBT Images. J Digit Imaging 2022; 35:564-580. [PMID: 35217942 PMCID: PMC9156604 DOI: 10.1007/s10278-021-00567-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2021] [Revised: 09/24/2021] [Accepted: 12/06/2021] [Indexed: 12/15/2022]
Abstract
Medical image captioning has recently been attracting the attention of the medical community, and generating captions for images involving multiple organs is an even more challenging task. Work toward such medical image captioning is therefore urgently needed. In recent years, rapid developments in deep learning have made it an effective option for analyzing medical images and generating reports automatically. However, analyzing medical images when data are scarce is hard, even with machine learning approaches. Transfer learning can be employed in such applications that suffer from insufficient training data. This paper presents a medical image captioning model based on a deep recurrent architecture that combines a Multilevel Transfer Learning (MLTL) framework with a Long Short-Term Memory (LSTM) model. A basic MLTL framework with three models is designed to detect and classify images from very limited datasets using the knowledge acquired from easily available datasets. The first model, for the source domain, uses abundantly available non-medical images and learns generalized features. The acquired knowledge is then transferred to the second model, for the intermediate (auxiliary) domain, which is related to the target domain. This information is then used for the final target domain, which consists of very limited medical datasets. In this way, the knowledge learned from a non-medical source domain is transferred to improve learning in the target domain of medical images. Then, a novel LSTM model, an architecture used for sequence generation and machine translation, is proposed to generate captions for a given medical image from the MLTL framework. To further improve the captioning of the target sentence, an enhanced multi-input Convolutional Neural Network (CNN) model combined with feature extraction techniques is proposed.
This enhanced multi-input CNN model extracts the most important features of an image, which helps generate a more precise and detailed caption of the medical image. Experimental results show that the proposed model performs well compared with the work reported in the literature, achieving an accuracy of 96.90% and a BLEU score of 76.9% even with very limited datasets.
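For readers unfamiliar with the BLEU metric reported above, the following minimal sketch computes single-reference, sentence-level BLEU with uniform weights over 1- to 4-grams, clipped n-gram precision, and the standard brevity penalty, without smoothing. It is an assumption that the authors used a comparable variant; corpus-level BLEU and smoothing schemes give different numbers, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List

def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
        # Clipped matches: a candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any n-gram order without matches gives BLEU = 0
    log_mean = sum(math.log(p) for p in precisions) / max_n  # geometric mean in log space
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)  # penalize short candidates
    return brevity_penalty * math.exp(log_mean)

reference = "mass in the left lower lobe".split()
print(sentence_bleu("mass in the left lower lobe".split(), reference))  # 1.0
```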
Affiliation(s)
- R. V. Aswiga
- Department of Computer Science & Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Tamil Nadu, Chennai, 601103 India
- A. P. Shanthi
- Department of Computer Science & Engineering, College of Engineering, Guindy (CEG), Anna University, Tamil Nadu, Chennai, 600025 India
5
Bhatia A, Chug A, Singh AP, Singh D. Fractional mega trend diffusion function-based feature extraction for plant disease prediction. Int J Mach Learn Cybern. [DOI: 10.1007/s13042-022-01562-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
6
Abstract
Online portfolio selection (OLPS) is a procedure for allocating portfolio assets using only past information to maximize expected return. Mean reversion strategies have achieved large excess returns on the traditional OLPS benchmark datasets. We propose a genetic mean reversion strategy that evolves a population of portfolio vectors using a hybrid genetic algorithm. Each vector represents the proportions of the portfolio assets, and on every trading day our strategy chooses the vector with the best expected return. To test our strategy, we used the price information of the S&P 500 constituents from 2000 to 2017 and compared various online portfolio selection strategies. Our hybrid genetic framework successfully evolved the portfolio vectors, and our strategy outperformed the other strategies when explicit or implicit transaction costs were incurred.
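The abstract does not specify the hybrid genetic operators, so the following is only a minimal sketch of the general idea: evolving portfolio vectors on the probability simplex and keeping the one with the highest expected return. It assumes a hypothetical naive mean-reversion forecast (the next price relative is the reciprocal of the last one), blend crossover, Gaussian mutation, truncation selection with elitism, and no transaction costs; all names are illustrative.

```python
import random
from typing import List, Sequence

def normalize(weights: List[float]) -> List[float]:
    """Rescale positive weights so they sum to 1 (a point on the simplex)."""
    total = sum(weights)
    return [w / total for w in weights]

def expected_return(weights: Sequence[float], predicted_relatives: Sequence[float]) -> float:
    """Predicted growth factor of the portfolio: sum_i w_i * x_hat_i."""
    return sum(w * x for w, x in zip(weights, predicted_relatives))

def mean_reversion_forecast(last_relatives: Sequence[float]) -> List[float]:
    """Hypothetical naive forecast: assets that just rose are expected to fall back."""
    return [1.0 / x for x in last_relatives]

def evolve_portfolio(predicted_relatives: Sequence[float], pop_size: int = 30,
                     generations: int = 50, mutation_scale: float = 0.1,
                     seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    n = len(predicted_relatives)

    def fitness(w: List[float]) -> float:
        return expected_return(w, predicted_relatives)

    # Seed with the uniform portfolio plus random points on the simplex.
    population = [[1.0 / n] * n]
    while len(population) < pop_size:
        population.append(normalize([rng.random() + 1e-9 for _ in range(n)]))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            alpha = rng.random()
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]  # blend crossover
            child = [max(1e-9, x + rng.gauss(0.0, mutation_scale)) if rng.random() < 0.5 else x
                     for x in child]  # Gaussian mutation, kept positive
            children.append(normalize(child))
        population = elite + children
    return max(population, key=fitness)

weights = evolve_portfolio(mean_reversion_forecast([1.05, 0.90, 1.00]))
print([round(w, 3) for w in weights])
```

Seeding the population with the uniform portfolio and keeping the elite half means the returned vector is never worse, under the assumed forecast, than equal weighting.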
7
Punitha S, Stephan T, Gandomi AH. A Novel Breast Cancer Diagnosis Scheme With Intelligent Feature and Parameter Selections. Comput Methods Programs Biomed 2022; 214:106432. [PMID: 34844767 DOI: 10.1016/j.cmpb.2021.106432] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/08/2020] [Accepted: 09/15/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE Breast cancer is the most commonly occurring cancer among women and contributes to the global death rate. The key to increasing the survival rate of affected patients is early diagnosis along with appropriate treatment. Manual methods for breast cancer diagnosis are prone to human error and inaccurate diagnoses, and they are time-consuming when demand is high. Intelligent systems based on Artificial Neural Networks (ANNs) are powerful for automated breast cancer diagnosis because of their strong decision-making capabilities in complicated cases. Artificial Bee Colony, Artificial Immune System, and Bacterial Foraging Optimization are swarm intelligence algorithms that solve combinatorial optimization problems. This paper proposes two novel hybrid Artificial Bee Colony (ABC) optimization algorithms that overcome the shortcomings of the standard ABC algorithm. First, this paper proposes a hybrid ABC approach called HABC, in which standard ABC optimization is hybridized with a modified clonal selection algorithm from the Artificial Immune System to remedy the poor exploration capability of standard ABC optimization. Second, this paper proposes a novel hybrid Artificial Bee Colony (Hybrid ABC) optimization in which the strong explorative capability of the chemotaxis phase of bacterial foraging optimization is integrated with a spiral-model-based exploitative phase of ABC, so that the proposed Hybrid ABC overcomes the poor exploration and exploitation of the standard ABC algorithm. METHODS In this work, the two proposed hybrid approaches were used for concurrent feature selection and parameter optimization of an ANN model. The proposed algorithms are implemented with various back-propagation algorithms for parameter tuning of the ANN, including resilient back-propagation (HABC-RP and Hybrid ABC-RP), Levenberg-Marquardt (HABC-LM and Hybrid ABC-LM), and momentum-based gradient descent (HABC-MGD and Hybrid ABC-GD).
The Wisconsin breast cancer dataset was used to evaluate the performance of the proposed algorithms in terms of accuracy, complexity, and computational time. RESULTS The mean accuracy was 99.14% for the proposed HABC-RP and 99.54% for Hybrid ABC, which is better than the results found in the existing literature. HABC-RP attained a sensitivity of 98.32%, a specificity of 99.63%, and a precision of 99.38%, whereas Hybrid ABC attained a sensitivity of 99.08% and a specificity of 99.81%. CONCLUSIONS HABC-RP and Hybrid ABC-RP yielded high accuracy with a low-complexity ANN structure compared with the other variants. Notably, Hybrid ABC-RP achieved the highest mean accuracy, 99.54%, with a low complexity of 10.25 mean connections compared with the other variants proposed in this paper. It can be concluded that concurrently selecting input features and tuning ANN parameters plays a vital role in increasing the accuracy of breast cancer diagnosis. The proposed HABC-RP and Hybrid ABC-RP showed better results than the existing breast cancer diagnosis systems taken for comparison. In the future, the two proposed hybrid approaches can be used to generate optimal thresholds for segmenting tumors in abnormal images, and HABC and Hybrid ABC can be used to tune the parameters of various classifiers.
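HABC and Hybrid ABC both build on the standard Artificial Bee Colony algorithm. The sketch below shows only that standard scheme (employed, onlooker, and scout phases with greedy selection) minimizing a toy sphere function; it does not include the clonal selection, chemotaxis, or spiral phases of the proposed hybrids, nor the ANN feature and parameter encoding, and the parameter values and function names are illustrative assumptions.

```python
import random
from typing import Callable, List, Tuple

def sphere(x: List[float]) -> float:
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

def abc_minimize(f: Callable[[List[float]], float], dim: int, lower: float, upper: float,
                 n_sources: int = 10, limit: int = 20, cycles: int = 200,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Standard Artificial Bee Colony: employed, onlooker and scout phases."""
    rng = random.Random(seed)

    def random_source() -> List[float]:
        return [rng.uniform(lower, upper) for _ in range(dim)]

    sources = [random_source() for _ in range(n_sources)]
    values = [f(s) for s in sources]
    trials = [0] * n_sources
    best_i = min(range(n_sources), key=values.__getitem__)
    best_x, best_v = sources[best_i][:], values[best_i]

    def try_neighbour(i: int) -> None:
        # Move one coordinate relative to a random other source; keep the move only if better.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = sources[i][:]
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        candidate[d] = min(max(candidate[d], lower), upper)
        value = f(candidate)
        if value < values[i]:
            sources[i], values[i], trials[i] = candidate, value, 0
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(n_sources):  # employed bees: one trial per source
            try_neighbour(i)
        fits = [1.0 / (1.0 + v) if v >= 0 else 1.0 + abs(v) for v in values]
        total = sum(fits)
        for _ in range(n_sources):  # onlookers: fitness-proportional (roulette) choice
            r, acc = rng.uniform(0.0, total), 0.0
            for i, fit in enumerate(fits):
                acc += fit
                if acc >= r:
                    break
            try_neighbour(i)
        for i in range(n_sources):  # remember the best source found so far
            if values[i] < best_v:
                best_x, best_v = sources[i][:], values[i]
        for i in range(n_sources):  # scouts: abandon sources that stopped improving
            if trials[i] > limit:
                sources[i] = random_source()
                values[i] = f(sources[i])
                trials[i] = 0
    return best_x, best_v

x, v = abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
print(x, v)
```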
Affiliation(s)
- S Punitha
- Department of Computer Science Engineering, Karunya Institute of Technology and Sciences, Tamil Nadu, India
- Thompson Stephan
- Department of Computer Science Engineering, Faculty of Engineering and Technology, M. S. Ramaiah University of Applied Sciences, Bengaluru, India
- Amir H Gandomi
- Faculty of Engineering and Information Technology, University of Technology Sydney, Australia.
8
Aswiga RV, Aishwarya R, Shanthi AP. Augmenting Transfer Learning with Feature Extraction Techniques for Limited Breast Imaging Datasets. J Digit Imaging 2021; 34:618-629. [PMID: 33973065 DOI: 10.1007/s10278-021-00456-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/21/2020] [Revised: 02/25/2021] [Accepted: 04/27/2021] [Indexed: 11/24/2022]
Abstract
Computer-aided detection (CADe) and computer-aided diagnosis (CADx) systems are ongoing research areas for identifying lesions among complex inner structures with different pixel intensities, and for medical image classification. Several techniques are available for breast cancer detection and diagnosis using CADe and CADx systems. However, some of these systems are not accurate enough or suffer from a lack of sufficient data. For example, mammography is the most commonly used breast cancer detection technique, and several CADe and CADx systems are based on mammography because a huge mammography dataset is publicly available. However, the number of cancers that escape detection with mammography is substantial, particularly in women with dense breasts. Digital breast tomosynthesis (DBT), on the other hand, is a new imaging technique that alleviates the limitations of mammography. However, collecting large amounts of DBT images is difficult because such data are not publicly available. In such cases, transfer learning can be employed: the knowledge learned from a trained source domain task, whose dataset is readily available, is transferred to improve learning in the target domain task, whose dataset may be scarce. In this paper, a two-level framework is developed for the classification of DBT datasets. A basic multilevel transfer learning (MLTL) based framework is proposed that uses the knowledge learned from general non-medical image datasets and a mammography dataset to train and classify the target DBT dataset. A feature extraction based transfer learning (FETL) framework is proposed to further improve the classification performance of the MLTL based framework; it examines three different feature extraction techniques for augmenting MLTL performance.
An area under the receiver operating characteristic (ROC) curve of 0.89 is obtained using just 2.08% of the source domain (non-medical) dataset, 5.09% of the intermediate domain (mammography) dataset, and 3.94% of the target domain (DBT) dataset, relative to the datasets reported in the literature.
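The area under the ROC curve reported above equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal pairwise sketch of that definition follows; the function name is hypothetical, and the quadratic loop is meant for clarity, not for large datasets.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```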
Affiliation(s)
- Aswiga R V
- Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Chennai, Tamil Nadu, India.
- Aishwarya R
- Department of Computer Science & Engineering, Anna University, Chennai-600025, Tamil Nadu, India
- Shanthi A P
- Department of Computer Science & Engineering, Anna University, Chennai-600025, Tamil Nadu, India
9
Stephan P, Stephan T, Kannan R, Abraham A. A hybrid artificial bee colony with whale optimization algorithm for improved breast cancer diagnosis. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05997-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 10/21/2022]
10
Dash R. An Adaptive Harmony Search Approach for Gene Selection and Classification of High Dimensional Medical Data. Journal of King Saud University - Computer and Information Sciences 2021; 33:195-207. [DOI: 10.1016/j.jksuci.2018.02.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 11/20/2022]
11
Jude Hemanth D, Anitha J. Modified Genetic Algorithm approaches for classification of abnormal Magnetic Resonance Brain tumour images. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2018.10.054] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 11/29/2022]
12
Zhang K, Liu X, Jiang J, Li W, Wang S, Liu L, Zhou X, Wang L. Prediction of postoperative complications of pediatric cataract patients using data mining. J Transl Med 2019; 17:2. [PMID: 30602368 PMCID: PMC6317183 DOI: 10.1186/s12967-018-1758-2] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/01/2018] [Accepted: 12/21/2018] [Indexed: 12/31/2022]
Abstract
BACKGROUND The common treatment for pediatric cataracts is to replace the cloudy lens with an artificial one. However, patients may suffer complications, namely severe lens proliferation into the visual axis (SLPVA) and abnormally high intraocular pressure (AHIP), within 1 year after surgery, and the factors causing these complications are unknown. METHODS The Apriori algorithm is employed to find association rules related to the complications. We use random forest (RF) and Naïve Bayes (NB) to predict the complications, with datasets preprocessed by SMOTE (synthetic minority oversampling technique). Genetic feature selection is used to find the features actually related to the complications. RESULTS Average classification accuracies in three binary classification problems are over 75%. The relationship between classification performance and the number of random forest trees is also studied. The results show that, except for gender and age at surgery (AS), all other attributes are related to complications. Except for secondary IOL placement, operation mode, AS, and area of cataracts, all other attributes are related to SLPVA. Except for gender, operation mode, and laterality, all other attributes are related to AHIP. Next, the association rules related to the complications are mined. An additional 50 samples were then used to test the performance of RF and NB; both obtained accuracies of over 65% for the three classification problems. Finally, we developed a web server to assist doctors. CONCLUSIONS The postoperative complications of pediatric cataract patients can be predicted, and the factors related to the complications can be identified. Finally, the association rules about the complications can provide a reference for doctors.
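SMOTE, used here to balance the training data, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a simplified sketch under stated assumptions: numeric features, Euclidean distance, and base samples drawn at random rather than a fixed number per sample as in the original SMOTE formulation. The function name is hypothetical; maintained implementations such as imbalanced-learn's SMOTE class are preferable in practice.

```python
import math
import random
from typing import List, Sequence

def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Simplified SMOTE: interpolate between a random minority point and a near neighbour."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (Euclidean distance).
        others = [j for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(base, minority[j]))[:k]
        partner = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to partner
        synthetic.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return synthetic

minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
print(smote(minority, n_synthetic=3, k=2))
```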
Affiliation(s)
- Kai Zhang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 510060, China
- Xiyang Liu
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; Institute of Software Engineering, Xidian University, Xi'an, 710071, China; School of Software, Xidian University, Xi'an, 710071, China
- Jiewei Jiang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 510060, China
- Wangting Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, 510060, China
- Shuai Wang
- School of Software, Xidian University, Xi'an, 710071, China
- Lin Liu
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China
- Xiaojing Zhou
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Liming Wang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Rd, Xi'an, 710071, China; Institute of Software Engineering, Xidian University, Xi'an, 710071, China; School of Software, Xidian University, Xi'an, 710071, China
13
Öztürk Ş, Akdemir B. Application of Feature Extraction and Classification Methods for Histopathological Image using GLCM, LBP, LBGLCM, GLRLM and SFTA. Procedia Comput Sci 2018. [DOI: 10.1016/j.procs.2018.05.057] [Citation(s) in RCA: 63] [Impact Index Per Article: 10.5] [Indexed: 11/27/2022]