1
Westphal M, Zapf A. Statistical inference for diagnostic test accuracy studies with multiple comparisons. Stat Methods Med Res 2024; 33:669-680. [PMID: 38490184 PMCID: PMC11025299 DOI: 10.1177/09622802241236933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract
Diagnostic accuracy studies assess the sensitivity and specificity of a new index test in relation to an established comparator or the reference standard. The development and selection of the index test are usually assumed to be conducted prior to the accuracy study. In practice, this is often violated, for instance, if the choice of the (apparently) best biomarker, model or cutpoint is based on the same data that is used later for validation purposes. In this work, we investigate several multiple comparison procedures which provide family-wise error rate control for the emerging multiple testing problem. Due to the nature of the co-primary hypothesis problem, conventional approaches for multiplicity adjustment are too conservative for the specific problem and thus need to be adapted. In an extensive simulation study, five multiple comparison procedures are compared with regard to statistical error rates in least-favourable and realistic scenarios. This covers parametric and non-parametric methods and one Bayesian approach. All methods have been implemented in the new open-source R package cases which allows us to reproduce all simulation results. Based on our numerical results, we conclude that the parametric approaches (maxT and Bonferroni) are easy to apply but can have inflated type I error rates for small sample sizes. The two investigated Bootstrap procedures, in particular the so-called pairs Bootstrap, allow for a family-wise error rate control in finite samples and in addition have a competitive statistical power.
Affiliation(s)
- Max Westphal
  - Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
  - The two authors contributed equally and are listed in alphabetical order
- Antonia Zapf
  - Department of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - The two authors contributed equally and are listed in alphabetical order
2
Kong X, Zhou M, Bian K, Lai W, Hu F, Dai R, Yan J. Research on SPDTRS-PNN based intelligent assistant diagnosis for breast cancer. Sci Rep 2023; 13:4386. [PMID: 36928059 PMCID: PMC10020448 DOI: 10.1038/s41598-023-28316-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2022] [Accepted: 01/17/2023] [Indexed: 03/18/2023]
Abstract
Breast cancer is the second most dangerous cancer in the world. Breast cancer data often contain redundant information, which makes auxiliary diagnosis of breast cancer less accurate and more time consuming. A dimension reduction algorithm combined with machine learning can address these problems. This paper proposes the single parameter decision theoretic rough set (SPDTRS) combined with the probability neural network (PNN) model for breast cancer diagnosis. We find that when the parameter value of SPDTRS is 2.5 and the SPREAD value is 0.75, the 30 attributes of the original breast cancer data are reduced to 12, the accuracy of the SPDTRS-PNN model on the training set is 99.25%, the accuracy on the test set is 97.04%, and the test time is 0.093 s. The experimental results show that the SPDTRS-PNN model can improve the accuracy of breast cancer recognition and reduce the time required for diagnosis.
Affiliation(s)
- Xixi Kong
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Mengran Zhou
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Kai Bian
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Wenhao Lai
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Feng Hu
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Rongying Dai
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, 232001, China
- Jingjing Yan
  - School of Electrical and Information Engineering, Anhui University of Science and Technology, Huainan, 232001, China
3
Ghaffar Nia N, Kaplanoglu E, Nasab A. Evaluation of artificial intelligence techniques in disease diagnosis and prediction. DISCOVER ARTIFICIAL INTELLIGENCE 2023. [PMCID: PMC9885935 DOI: 10.1007/s44163-023-00049-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 01/31/2023]
Abstract
A broad range of medical diagnoses is based on analyzing disease images obtained through high-tech digital devices. The application of artificial intelligence (AI) in the assessment of medical images has led to accurate evaluations being performed automatically, which in turn has reduced the workload of physicians, decreased errors and times in diagnosis, and improved performance in the prediction and detection of various diseases. AI techniques based on medical image processing are an essential area of research that uses advanced computer algorithms for prediction, diagnosis, and treatment planning, leading to a remarkable impact on decision-making procedures. Machine Learning (ML) and Deep Learning (DL) as advanced AI techniques are two main subfields applied in the healthcare system to diagnose diseases, discover medication, and identify patient risk factors. The advancement of electronic medical records and big data technologies in recent years has accompanied the success of ML and DL algorithms. ML includes neural networks and fuzzy logic algorithms with various applications in automating forecasting and diagnosis processes. A DL algorithm is an ML technique that does not rely on expert feature extraction, unlike classical neural network algorithms. DL algorithms with high-performance calculations give promising results in medical image analysis, such as fusion, segmentation, recording, and classification. Support Vector Machine (SVM) as an ML method and Convolutional Neural Network (CNN) as a DL method are usually the most widely used techniques for analyzing and diagnosing diseases. This review study aims to cover recent AI techniques in diagnosing and predicting numerous diseases such as cancers, heart, lung, skin, genetic, and neural disorders, which perform more precisely compared to specialists without human error. Also, AI's existing challenges and limitations in the medical area are discussed and highlighted.
Affiliation(s)
- Nafiseh Ghaffar Nia
  - College of Engineering and Computer Science, The University of Tennessee at Chattanooga, Chattanooga, TN 37403 USA
- Erkan Kaplanoglu
  - College of Engineering and Computer Science, The University of Tennessee at Chattanooga, Chattanooga, TN 37403 USA
- Ahad Nasab
  - College of Engineering and Computer Science, The University of Tennessee at Chattanooga, Chattanooga, TN 37403 USA
4
GAN-Based Approaches for Generating Structured Data in the Medical Domain. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12147075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Modern machine and deep learning methods require large datasets to achieve reliable and robust results. This requirement is often difficult to meet in the medical field, due to data sharing limitations imposed by privacy regulations or the presence of a small number of patients (e.g., rare diseases). To address this data scarcity and to improve the situation, novel generative models such as Generative Adversarial Networks (GANs) have been widely used to generate synthetic data that mimic real data by representing features that reflect health-related information without reference to real patients. In this paper, we consider several GAN models to generate synthetic data used for training binary (malignant/benign) classifiers, and compare their performances in terms of classification accuracy with cases where only real data are considered. We aim to investigate how synthetic data can improve classification accuracy, especially when a small amount of data is available. To this end, we have developed and implemented an evaluation framework where binary classifiers are trained on extended datasets containing both real and synthetic data. The results show improved accuracy for classifiers trained with generated data from more advanced GAN models, even when limited amounts of original data are available.
5
Gupta SR. Prediction time of breast cancer tumor recurrence using Machine Learning. Cancer Treat Res Commun 2022; 32:100602. [PMID: 35797887 DOI: 10.1016/j.ctarc.2022.100602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/01/2022] [Revised: 06/01/2022] [Accepted: 06/28/2022] [Indexed: 06/15/2023]
Abstract
An in-depth study using the databases from GLOBOCAN, CDC, and the WHO health repository highlights the lethality of breast cancer, which takes thousands of lives each year. However, a timely prediction of cancer can help patients to consult the doctor on time. In the past, various studies have successfully predicted whether a tumor is benign or malignant and whether a breast cancer tumor will reoccur, but no time-based models have been studied. With the help of Machine Learning, this study shows various prediction models that can be used to predict tumor reoccurrence time as accurately as 1 year. Among the 198 patients analyzed, 40% were predicted to have breast cancer tumors reoccurring within the 1st year of the diagnosis. The proposed machine learning techniques use various classification models such as Spectral clustering, DBSCAN, and k-means along with prediction models like Support Vector Machines (SVM), Decision trees, and Random Forest. The results demonstrate the ability of the model to predict the time taken by the tumor to reoccur or the time taken by the patient for full recovery with the best accuracy of 78.7% using SVM. This population-based study performed on multivariate real attributed characteristics data can therefore provide the patients a reasonable estimate of their recovery time or the time before which they should consult the doctor.
Affiliation(s)
- Siddharth Raj Gupta
  - Department of Mechanical Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA; Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
6
Zeid MAE, El-Bahnasy K, Abo-Youssef SE. DeepBreast: Building Optimized Framework for Prognosis of Breast Cancer Classification Based on Computational Intelligence. 2022 2ND INTERNATIONAL MOBILE, INTELLIGENT, AND UBIQUITOUS COMPUTING CONFERENCE (MIUCC) 2022. [DOI: 10.1109/miucc55081.2022.9781677] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 09/02/2023]
Affiliation(s)
- Magdy Abd-Elghany Zeid
  - Obour High Institute for Management and Informatics, Computer Science department, Cairo, Egypt
- Khaled El-Bahnasy
  - Obour High Institute for Management and Informatics, Computer Science department, Cairo, Egypt
- S. E. Abo-Youssef
  - Al-Azhar University, Faculty of Science, Mathematics and Computer Science department, Cairo, Egypt
7
Kim Y, Kim J. Identification of New Clusters from Labeled Data Using Mixture Models. J Comput Biol 2022; 29:585-596. [PMID: 35384743 DOI: 10.1089/cmb.2021.0443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Nowadays attempts to segment classes or groups are often found in various fields. Especially, one of emerging issues in biological and medical areas is identification of new subtypes of biological samples or patients. For the identification, we often need to find new subtypes from known classes. In such cases, we usually use clustering techniques. However, usual clustering methods could mix up the labels of the known classes in clustering outcomes and it might lead to wrong interpretation for the identified clusters. Also, they do not use the information about known classes. Thus, this study proposes a Gaussian mixture model-based approach for identifying new clusters from known classes while it maintains them. The performance of the proposed model is verified through simulations and it is applied to a breast cancer data set.
Affiliation(s)
- Yujung Kim
  - Department of Statistics, Sungkyunkwan University, Seoul, South Korea
- Jaejik Kim
  - Department of Statistics, Sungkyunkwan University, Seoul, South Korea
8
Shehab M, Abualigah L, Shambour Q, Abu-Hashem MA, Shambour MKY, Alsalibi AI, Gandomi AH. Machine learning in medical applications: A review of state-of-the-art methods. Comput Biol Med 2022; 145:105458. [PMID: 35364311 DOI: 10.1016/j.compbiomed.2022.105458] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Received: 11/26/2021] [Revised: 03/23/2022] [Accepted: 03/24/2022] [Indexed: 12/11/2022]
Abstract
Applications of machine learning (ML) methods have been used extensively to solve various complex challenges in recent years in various application areas, such as medical, financial, environmental, marketing, security, and industrial applications. ML methods are characterized by their ability to examine many data and discover exciting relationships, provide interpretation, and identify patterns. ML can help enhance the reliability, performance, predictability, and accuracy of diagnostic systems for many diseases. This survey provides a comprehensive review of the use of ML in the medical field highlighting standard technologies and how they affect medical diagnosis. Five major medical applications are deeply discussed, focusing on adapting the ML models to solve the problems in cancer, medical chemistry, brain, medical imaging, and wearable sensors. Finally, this survey provides valuable references and guidance for researchers, practitioners, and decision-makers framing future research and development directions.
Affiliation(s)
- Mohammad Shehab
  - Information Technology, The World Islamic Sciences and Education University, Amman, Jordan
- Laith Abualigah
  - Faculty of Computer Sciences and Informatics, Amman Arab University, Amman, Jordan; School of Computer Sciences, Universiti Sains Malaysia, Pulau, Pinang, 11800, Malaysia
- Qusai Shambour
  - Department of Software Engineering, Al-Ahliyya Amman University, Amman, Jordan
- Muhannad A Abu-Hashem
  - Department of Geomatics, Faculty of Architecture and Planning, King Abdulaziz University, Jeddah, Saudi Arabia
- Amir H Gandomi
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, NSW, 2007, Australia
9
Vecchi E, Pospíšil L, Albrecht S, O'Kane TJ, Horenko I. eSPA+: Scalable Entropy-Optimal Machine Learning Classification for Small Data Problems. Neural Comput 2022; 34:1220-1255. [PMID: 35344997 DOI: 10.1162/neco_a_01490] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/26/2021] [Accepted: 12/20/2021] [Indexed: 11/04/2022]
Abstract
Classification problems in the small data regime (with small data statistic T and relatively large feature space dimension D) impose challenges for the common machine learning (ML) and deep learning (DL) tools. The standard learning methods from these areas tend to show a lack of robustness when applied to data sets with significantly fewer data points than dimensions and quickly reach the overfitting bound, thus leading to poor performance beyond the training set. To tackle this issue, we propose eSPA+, a significant extension of the recently formulated entropy-optimal scalable probabilistic approximation algorithm (eSPA). Specifically, we propose to change the order of the optimization steps and replace the most computationally expensive subproblem of eSPA with its closed-form solution. We prove that with these two enhancements, eSPA+ moves from the polynomial to the linear class of complexity scaling algorithms. On several small data learning benchmarks, we show that the eSPA+ algorithm achieves a many-fold speed-up with respect to eSPA and even better performance results when compared to a wide array of ML and DL tools. In particular, we benchmark eSPA+ against the standard eSPA and the main classes of common learning algorithms in the small data regime: various forms of support vector machines, random forests, and long short-term memory algorithms. In all the considered applications, the common learning methods and eSPA are markedly outperformed by eSPA+, which achieves significantly higher prediction accuracy with an orders-of-magnitude lower computational cost.
Affiliation(s)
- Edoardo Vecchi
  - Università della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland
- Lukáš Pospíšil
  - VSB Ostrava, Department of Mathematics, Ludvika Podeste 1875/17 708 33 Ostrava, Czech Republic
- Steffen Albrecht
  - University Medical Center of the Johannes Gutenberg-Universität, Institute of Physiology, 55128 Mainz, Germany
- Illia Horenko
  - Università della Svizzera Italiana, Faculty of Informatics, TI-6900 Lugano, Switzerland
10
Reid J, Parmar P, Lund T, Aalto DK, Jeffery CC. Development of a machine-learning based voice disorder screening tool. Am J Otolaryngol 2022; 43:103327. [PMID: 34923280 DOI: 10.1016/j.amjoto.2021.103327] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/29/2021] [Accepted: 12/08/2021] [Indexed: 01/20/2023]
Abstract
OBJECTIVE Early recognition and referral are crucial for voice disorder management. Limited availability of subspecialists, poor primary care awareness, and the need for specialized equipment impede effective care. Thus, there is a need for a tool to improve voice pathology screening. Machine learning algorithms (MLAs) have shown promise in analyzing acoustic characteristics of phonation. However, few studies report clinical applications of MLAs for voice pathology detection. The objective of this study was to design and validate a MLA for detecting pathological voices. METHODS A MLA was developed for voice analysis. Audio samples converted into spectrograms were inputted into a pre-existing VGG19 convolutional neural network (CNN) and image-classifier. The resulting feature map was classified as either pathological or healthy using a Support Vector Machine (SVM) binary linear classifier. This combined MLA was "trained" with 950 sustained "/i/" vowel audio samples from the Saarbrucken Voice Database (SVD), which contains subjects with and without voice disorders. The trained MLA was "tested" with 406 SVD samples to determine sensitivity, specificity, and overall accuracy. External validation of the MLA was performed using clinical voice samples collected from patients attending a subspecialty voice clinic. RESULTS The MLA detected pathologies in SVD samples with 98.5% sensitivity, 97.1% specificity and 97.8% overall accuracy. In 30 samples obtained prospectively from voice clinic patients, the MLA detected pathologies with 100% sensitivity, 96.3% specificity and 96.7% overall accuracy. CONCLUSIONS This study demonstrates that a MLA using a simple audio input can detect diverse vocal pathologies with high sensitivity and specificity. Thus, this algorithm shows promise as a potential screening tool.
Affiliation(s)
- Jonathan Reid
  - Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada
- Preet Parmar
  - Department of Physics, Faculty of Science, University of Alberta, Edmonton, AB, Canada
- Tyler Lund
  - Faculty of Engineering, University of Alberta, Edmonton, AB, Canada
- Daniel K Aalto
  - Communication Sciences and Disorders, Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, AB, Canada
- Caroline C Jeffery
  - Division of Otolaryngology-Head and Neck Surgery, Department of Surgery, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada; Communication Sciences and Disorders, Faculty of Rehabilitation Medicine, University of Alberta, Edmonton, AB, Canada
11
Rodrigues SA, Huggins R, Liquet B. Central subspaces review: methods and applications. STATISTICS SURVEYS 2022. [DOI: 10.1214/22-ss138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Sabrina A. Rodrigues
  - Department of Epidemiology and Biostatistics, School of Public Health Imperial College London, London UK
- Richard Huggins
  - School of Mathematics and Statistics, The University of Melbourne, Australia
- Benoit Liquet
  - Laboratoire de Mathématiques et de leurs Applications de Pau, Université de Pau et des Pays de l’Adour, Pau, France
  - School of Mathematical and Physical Sciences, Macquarie University, Sydney, New South Wales, Australia
12
Musa IH, Afolabi LO, Zamit I, Musa TH, Musa HH, Tassang A, Akintunde TY, Li W. Artificial Intelligence and Machine Learning in Cancer Research: A Systematic and Thematic Analysis of the Top 100 Cited Articles Indexed in Scopus Database. Cancer Control 2022; 29:10732748221095946. [PMID: 35688650 PMCID: PMC9189515 DOI: 10.1177/10732748221095946] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/03/2023]
Abstract
INTRODUCTION Cancer is a major public health problem and a global leading cause of death where the screening, diagnosis, prediction, survival estimation, and treatment of cancer and control measures are still a major challenge. The rise of Artificial Intelligence (AI) and Machine Learning (ML) techniques and their applications in various fields have brought immense value in providing insights into advancement in support of cancer control. METHODS A systematic and thematic analysis was performed on the Scopus database to identify the top 100 cited articles in cancer research. Data were analyzed using RStudio and VOSviewer.Var1.6.6. RESULTS The top 100 articles in AI and ML in cancer received a 33 920 citation score with a range of 108 to 5758 times. Doi Kunio from the USA was the most cited author with total number of citations (TNC = 663). Out of 43 contributed countries, 30% of the top 100 cited articles originated from the USA, and 10% originated from China. Among the 57 peer-reviewed journals, the "Expert Systems with Application" published 8% of the total articles. The results were presented in highlight technological advancement through AI and ML via the widespread use of Artificial Neural Network (ANNs), Deep Learning or machine learning techniques, Mammography-based Model, Convolutional Neural Networks (SC-CNN), and text mining techniques in the prediction, diagnosis, and prevention of various types of cancers towards cancer control. CONCLUSIONS This bibliometric study provides detailed overview of the most cited empirical evidence in AI and ML adoption in cancer research that could efficiently help in designing future research. The innovations guarantee greater speed by using AI and ML in the detection and control of cancer to improve patient experience.
Affiliation(s)
- Ibrahim H. Musa
  - Department of Software Engineering, School of Computer Science and Engineering, Southeast University, Nanjing, China
  - Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, Nanjing, China
- Lukman O. Afolabi
  - Guangdong Immune Cell Therapy Engineering and Technology Research Center, Center for Protein and Cell-Based Drugs, Institute of Biomedicine and Biotechnology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - University of Chinese Academy of Sciences, Beijing, China
- Ibrahim Zamit
  - University of Chinese Academy of Sciences, Beijing, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Taha H. Musa
  - Biomedical Research Institute, Darfur University College, Nyala, South Darfur, Sudan
  - Key Laboratory of Environmental Medicine Engineering, Ministry of Education, Department of Epidemiology and Health Statistics, School of Public Health, Southeast University, Nanjing, Jiangsu Province, China
- Hassan H. Musa
  - Faculty of Medical Laboratory Sciences, University of Khartoum, Khartoum, Sudan
- Andrew Tassang
  - Faculty of Health Sciences, University of Buea, Cameroon
  - Buea Regional Hospital, Annex, Cameroon
- Tosin Y. Akintunde
  - Department of Sociology, School of Public Administration, Hohai University, Nanjing, China
- Wei Li
  - Department of quality management, Children’s hospital of Nanjing Medical University, Nanjing, China
13
Les T, Markiewicz T, Dziekiewicz M, Lorent M. Adaptive two-way sweeping method to 3D kidney reconstruction. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
14
Choudhary T, Mishra V, Goswami A, Sarangapani J. A transfer learning with structured filter pruning approach for improved breast cancer classification on point-of-care devices. Comput Biol Med 2021; 134:104432. [PMID: 33964737 DOI: 10.1016/j.compbiomed.2021.104432] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/14/2021] [Revised: 04/16/2021] [Accepted: 04/21/2021] [Indexed: 01/01/2023]
Abstract
BACKGROUND AND OBJECTIVE Significant progress has been made in automated medical diagnosis with the advent of deep learning methods in recent years. However, deploying a deep learning model on mobile and small-scale, low-cost devices is a major bottleneck. Further, breast cancer is currently more prevalent, with ductal carcinoma being its most common type. Although many machine/deep learning methods have already been investigated, there is still a need for further improvement. METHOD This paper proposes a novel deep convolutional neural network (CNN) based transfer learning approach complemented with structured filter pruning for histopathological image classification, and to bring down the run-time resource requirement of the trained deep learning models. In the proposed method, first, the less important filters are pruned from the convolutional layers and then the pruned models are trained on the histopathological image dataset. RESULTS We performed extensive experiments using three popular pre-trained CNNs: VGG19, ResNet34, and ResNet50. With the VGG19 pruned model, we achieved an accuracy of 91.25%, outperforming earlier methods on the same dataset and architecture while reducing FLOPs by 63.46%. With the ResNet34 pruned model, the accuracy increases to 91.80% with 40.63% fewer FLOPs. Moreover, with the ResNet50 model, we achieved an accuracy of 92.07% with 30.97% fewer FLOPs. CONCLUSION The experimental results reveal that the pre-trained model's performance complemented with filter pruning exceeds that of the original pre-trained models. Another important outcome of the research is that the pruned model with reduced resource requirements can be deployed in point-of-care devices for automated diagnosis applications with ease.
Affiliation(s)
- Vipul Mishra
  - Bennett University, Greater Noida, Uttar Pradesh, 201310, India
- Anurag Goswami
  - Bennett University, Greater Noida, Uttar Pradesh, 201310, India
15
Assaf D, Rayman S, Segev L, Neuman Y, Zippel D, Goitein D. Improving pre-bariatric surgery diagnosis of hiatal hernia using machine learning models. MINIM INVASIV THER 2021; 31:760-767. [PMID: 33779469 DOI: 10.1080/13645706.2021.1901120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/11/2023]
Abstract
BACKGROUND Bariatric patients have a high prevalence of hiatal hernia (HH). HH imposes various difficulties in performing laparoscopic bariatric surgery. Preoperative evaluation is generally inaccurate, establishing the need for better preoperative assessment. OBJECTIVE To utilize machine learning to improve preoperative diagnosis of HH. METHODS Machine learning (ML) prediction models were utilized to predict preoperative HH diagnosis using data from a prospectively maintained database of bariatric procedures performed in a high-volume bariatric surgical center between 2012 and 2015. We utilized three optional ML models to improve preoperative contrast swallow study (SS) prediction; automatic feature selection was performed using patients' features. The prediction efficacy of the models was compared to SS. RESULTS During the study period, 2482 patients underwent bariatric surgery. All underwent preoperative SS, considered the baseline diagnostic modality, which identified 236 (9.5%) patients with presumed HH, achieving 38.5% sensitivity and 92.9% specificity. ML models increased sensitivity up to 60.2%, creating three optional models utilizing data and patient selection process for this purpose. CONCLUSION Implementing machine learning derived prediction models enabled an increase of up to 1.5 times the baseline diagnostic sensitivity. By harnessing this ability, we can improve traditional medical diagnosis, increasing the sensitivity of the preoperative diagnostic workup.
Collapse
Affiliation(s)
- Dan Assaf
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.,Department of Surgery C, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - Shlomi Rayman
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.,Department of Surgery C, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - Lior Segev
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.,Department of Surgery C, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - Yair Neuman
- The Department of Cognitive and Brain Sciences, Ben-Gurion University of the Negev, Beer Sheva, Israel
| | - Douglas Zippel
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.,Department of Surgery C, Chaim Sheba Medical Center, Tel Hashomer, Israel
| | - David Goitein
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.,Department of Surgery C, Chaim Sheba Medical Center, Tel Hashomer, Israel
|
16
|
Abstract
Explainable artificial intelligence is an emerging research direction helping the user or developer of machine learning models understand why models behave the way they do. The most popular explanation technique is feature importance. However, there are several different approaches to measuring feature importance, most notably global and local. In this study we compare different feature importance measures using both linear (logistic regression with L1 penalization) and non-linear (random forest) methods and local interpretable model-agnostic explanations on top of them. These methods are applied to two datasets from the medical domain, the openly available breast cancer data from the UCI Archive and a recently collected running injury dataset. Our results show that the most important features differ depending on the technique. We argue that a combination of several explanation techniques could provide more reliable and trustworthy results. In particular, local explanations should be used in the most critical cases such as false negatives.
|
17
|
Marazzi F, Tagliaferri L, Masiello V, Moschella F, Colloca GF, Corvari B, Sanchez AM, Capocchiano ND, Pastorino R, Iacomini C, Lenkowicz J, Masciocchi C, Patarnello S, Franceschini G, Gambacorta MA, Masetti R, Valentini V. GENERATOR Breast DataMart-The Novel Breast Cancer Data Discovery System for Research and Monitoring: Preliminary Results and Future Perspectives. J Pers Med 2021; 11:jpm11020065. [PMID: 33498985 PMCID: PMC7911086 DOI: 10.3390/jpm11020065] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/30/2020] [Revised: 01/18/2021] [Accepted: 01/20/2021] [Indexed: 02/07/2023] Open
Abstract
Background: Artificial Intelligence (AI) is increasingly used for process management in daily life. In the medical field, AI is becoming part of computerized systems to manage information and encourage the generation of evidence. Here we present the application of AI to the IT systems present in the hospital, for the creation of a DataMart for the management of clinical and research processes in the field of breast cancer. Materials and methods: A multidisciplinary team of radiation oncologists, epidemiologists, medical oncologists, breast surgeons, data scientists, and data management experts worked together to identify relevant data and sources located inside the hospital system. Combinations of open-source data science packages and industry solutions were used to design the target framework. To validate the DataMart directly on real-life cases, the working team defined the tumoral pathology and clinical purposes of the proofs of concept (PoCs). Results: Data were classified into “Not organized, not ‘ontologized’ data”, “Organized, not ‘ontologized’ data”, and “Organized and ‘ontologized’ data”. The archives of real-world data (RWD) identified were an ontology-based platform, the hospital data warehouse, PDF documents, and electronic reports. Data extraction was performed by direct connection with structured data or by text-mining technology. Two PoCs were performed, in which the waiting time interval for radiotherapy and the performance index of the breast unit were tested and found to be available. Conclusions: GENERATOR Breast DataMart was created to support breast cancer pathways of care. An AI-based process automatically extracts data from different sources and uses them to generate trend studies and clinical evidence. Further studies and more proofs of concept are needed to exploit the full potential of this system.
Affiliation(s)
- Fabio Marazzi
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, UOC di Radioterapia Oncologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Rome, Italy; (F.M.); (L.T.); (G.F.C.); (B.C.); (M.A.G.); (V.V.)
| | - Luca Tagliaferri
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, UOC di Radioterapia Oncologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Rome, Italy; (F.M.); (L.T.); (G.F.C.); (B.C.); (M.A.G.); (V.V.)
| | - Valeria Masiello
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, UOC di Radioterapia Oncologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Rome, Italy; (F.M.); (L.T.); (G.F.C.); (B.C.); (M.A.G.); (V.V.)
- Correspondence:
| | - Francesca Moschella
- Dipartimento di Scienze della Salute della Donna e del Bambino e di Sanità Pubblica, UOC di Chirurgia Senologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (F.M.); (A.M.S.); (G.F.); (R.M.)
| | - Giuseppe Ferdinando Colloca
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, UOC di Radioterapia Oncologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Rome, Italy; (F.M.); (L.T.); (G.F.C.); (B.C.); (M.A.G.); (V.V.)
| | - Barbara Corvari
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, UOC di Radioterapia Oncologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Rome, Italy; (F.M.); (L.T.); (G.F.C.); (B.C.); (M.A.G.); (V.V.)
| | - Alejandro Martin Sanchez
- Dipartimento di Scienze della Salute della Donna e del Bambino e di Sanità Pubblica, UOC di Chirurgia Senologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (F.M.); (A.M.S.); (G.F.); (R.M.)
| | - Nikola Dino Capocchiano
- Istituto di Radiologia, Università Cattolica del Sacro Cuore, 00186 Rome, Italy; (N.D.C.); (J.L.)
| | - Roberta Pastorino
- Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (R.P.); (C.I.); (C.M.); (S.P.)
| | - Chiara Iacomini
- Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (R.P.); (C.I.); (C.M.); (S.P.)
| | - Jacopo Lenkowicz
- Istituto di Radiologia, Università Cattolica del Sacro Cuore, 00186 Rome, Italy; (N.D.C.); (J.L.)
| | - Carlotta Masciocchi
- Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (R.P.); (C.I.); (C.M.); (S.P.)
| | - Stefano Patarnello
- Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (R.P.); (C.I.); (C.M.); (S.P.)
| | - Gianluca Franceschini
- Dipartimento di Scienze della Salute della Donna e del Bambino e di Sanità Pubblica, UOC di Chirurgia Senologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (F.M.); (A.M.S.); (G.F.); (R.M.)
- Istituto di Semeiotica Chirurgica, Università Cattolica del Sacro Cuore, 00186 Rome, Italy
| | - Maria Antonietta Gambacorta
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, UOC di Radioterapia Oncologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Rome, Italy; (F.M.); (L.T.); (G.F.C.); (B.C.); (M.A.G.); (V.V.)
- Istituto di Radiologia, Università Cattolica del Sacro Cuore, 00186 Rome, Italy; (N.D.C.); (J.L.)
| | - Riccardo Masetti
- Dipartimento di Scienze della Salute della Donna e del Bambino e di Sanità Pubblica, UOC di Chirurgia Senologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Roma, Italy; (F.M.); (A.M.S.); (G.F.); (R.M.)
- Istituto di Semeiotica Chirurgica, Università Cattolica del Sacro Cuore, 00186 Rome, Italy
| | - Vincenzo Valentini
- Dipartimento di Diagnostica per Immagini, Radioterapia Oncologica ed Ematologia, UOC di Radioterapia Oncologica, Fondazione Policlinico Universitario “A. Gemelli” IRCCS, 00186 Rome, Italy; (F.M.); (L.T.); (G.F.C.); (B.C.); (M.A.G.); (V.V.)
- Istituto di Radiologia, Università Cattolica del Sacro Cuore, 00186 Rome, Italy; (N.D.C.); (J.L.)
|
18
|
|
19
|
Yedjou CG, Tchounwou SS, Aló RA, Elhag R, Mochona B, Latinwo L. Application of Machine Learning Algorithms in Breast Cancer Diagnosis and Classification. INTERNATIONAL JOURNAL OF SCIENCE ACADEMIC RESEARCH 2021; 2:3081-3086. [PMID: 34825131 PMCID: PMC8612371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Breast cancer continues to be the most frequent cancer in females, affecting about one in 8 women and causing the highest number of cancer-related deaths in females worldwide despite remarkable progress in early diagnosis, screening, and patient management. Not all breast lesions are malignant, and not all benign lesions progress to cancer. However, the accuracy of diagnosis can be increased by a combination of preoperative tests such as physical examination, mammography, fine-needle aspiration cytology, and core needle biopsy. Despite some limitations, these procedures are more accurate, reliable, and acceptable when compared with a single adopted diagnostic procedure. Recent studies have shown that breast cancer can be accurately predicted and diagnosed using machine learning (ML) technology. The objective of this study was to explore the application of ML approaches to classify breast cancer based on feature values generated from a digitized image of a fine-needle aspiration (FNA) of a breast mass. To achieve this objective, we used ML algorithms, collected a scientific dataset of 569 breast cancer patients from Kaggle (https://www.kaggle.com/uciml/breast-cancer-wisconsin-data), and analyzed and interpreted the data based on ten real-valued features of a breast mass FNA including the radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. Among the 569 patients tested, 63% were diagnosed with benign breast cancer and 37% were diagnosed with malignant breast cancer. Benign tumors grow slowly and do not spread while malignant tumors grow rapidly and spread to other parts of the body.
Affiliation(s)
- Clement G Yedjou
- Department of Biological Sciences, College of Science and Technology, Florida Agricultural and Mechanical University, 1610 S. Martin Luther King Blvd, Tallahassee, FL 32307, United States
| | - Solange S Tchounwou
- Department of Pathology and Laboratory Medicine. School of Medicine, Tulane University, 1430 Tulane Avenue, New Orleans, LA, 70112, United States
| | - Richard A Aló
- Department of Computer and Information Science, College of Science and Technology, Florida Agricultural & Mechanical University, 1610 S. Martin Luther King Blvd, Tallahassee, FL 3230, United States
| | - Rashid Elhag
- Department of Biological Sciences, College of Science and Technology, Florida Agricultural and Mechanical University, 1610 S. Martin Luther King Blvd, Tallahassee, FL 32307, United States
| | - BereKet Mochona
- Department of Chemistry, College of Science and Technology, Florida Agricultural and Mechanical University, 1610 S. Martin Luther King Blvd, Tallahassee, FL 32307, United States
| | - Lekan Latinwo
- Department of Biological Sciences, College of Science and Technology, Florida Agricultural and Mechanical University, 1610 S. Martin Luther King Blvd, Tallahassee, FL 32307, United States
|
20
|
Singh R, Ahmed T, Kumar A, Singh AK, Pandey AK, Singh SK. Imbalanced Breast Cancer Classification Using Transfer Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:83-93. [PMID: 32175873 DOI: 10.1109/tcbb.2020.2980831] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/10/2023]
Abstract
Accurate breast cancer detection using automated algorithms remains a problem within the literature. Although a plethora of work has tried to address this issue, an exact solution is yet to be found. This problem is further exacerbated by the fact that most of the existing datasets are imbalanced, i.e., the number of instances of a particular class far exceeds that of the others. In this paper, we propose a framework based on the notion of transfer learning to address this issue and focus our efforts on histopathological and imbalanced image classification. We use the popular VGG-19 as the base model and complement it with several state-of-the-art techniques to improve the overall performance of the system. With the ImageNet dataset taken as the source domain, we apply the learned knowledge in the target domain consisting of histopathological images. With experimentation performed on a large-scale dataset consisting of 277,524 images, we show that the framework proposed in this paper outperforms those available in the existing literature. Through numerical simulations conducted on a supercomputer, we also present guidelines for work in transfer learning and imbalanced image classification.
|
21
|
Abstract
Medical data usually have missing values; hence, imputation methods have become an important issue. In previous studies, many imputation methods, such as expectation-maximization and regression-based imputation, assumed that the variables follow a multivariate normal distribution. These assumptions may lead to deviations in the results, which sometimes create a bottleneck. In addition, directly deleting instances with missing values may cause several problems, such as losing important data, producing invalid research samples, and leading to research deviations. Therefore, this study proposed a safe-region imputation method for handling medical data with missing values; we also built a medical prediction model and compared deleting instances with missing values against imputation methods in terms of the generated rules, accuracy, and AUC. First, this study used kNN imputation, multiple imputation, and the proposed imputation to impute the missing data and then applied four attribute selection methods to select the important attributes. Then, we used the decision tree (C4.5), random forest, REP tree, and LMT classifiers to generate the rules, accuracy, and AUC for comparison. Because there were four datasets with imbalanced classes (asymmetric classes), the AUC was an important criterion. In the experiment, we collected four open medical datasets from UCI and one international stroke trial dataset. The results show that the proposed safe-region imputation is better than the listed imputation methods and that imputation offers better results than directly deleting instances with missing values in the number of rules, accuracy, and AUC. These results will provide a reference for medical stakeholders.
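Of the imputation methods named in this abstract, kNN imputation is the simplest to illustrate. The following is a minimal sketch under stated assumptions, not the authors' implementation and not their safe-region method: the function name `knn_impute`, the Euclidean distance over observed columns, the restriction of neighbours to complete rows, and the toy data are all hypothetical. It also assumes at least one complete row exists.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest complete rows.

    Distance is Euclidean over the columns observed in the incomplete row.
    Assumes at least one row has no missing values.
    """
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other, row=row, observed=observed):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled = [
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed


# Toy data for illustration only; the last row has a missing second value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 9.0],
    [1.05, None, 3.1],
]
result = knn_impute(data, k=2)
print(result[3])
```

In the toy data the incomplete row sits close to the first two rows, so its missing value is filled with the mean of their second column rather than being pulled toward the distant third row.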
|
22
|
Gerber S, Pospisil L, Navandar M, Horenko I. Low-cost scalable discretization, prediction, and feature selection for complex systems. SCIENCE ADVANCES 2020; 6:eaaw0961. [PMID: 32064328 PMCID: PMC6989146 DOI: 10.1126/sciadv.aaw0961] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/16/2018] [Accepted: 11/22/2019] [Indexed: 06/10/2023]
Abstract
Finding reliable discrete approximations of complex systems is a key prerequisite when applying many of the most popular modeling tools. Common discretization approaches (e.g., the very popular K-means clustering) are crucially limited in terms of quality, parallelizability, and cost. We introduce a low-cost improved quality scalable probabilistic approximation (SPA) algorithm, allowing for simultaneous data-driven optimal discretization, feature selection, and prediction. We prove its optimality, parallel efficiency, and a linear scalability of iteration cost. Cross-validated applications of SPA to a range of large realistic data classification and prediction problems reveal marked cost and performance improvements. For example, SPA allows the data-driven next-day predictions of resimulated surface temperatures for Europe with the mean prediction error of 0.75°C on a common PC (being around 40% better in terms of errors and five to six orders of magnitude cheaper than with common computational instruments used by the weather services).
Affiliation(s)
- S. Gerber
- Center of Computational Sciences, Johannes-Gutenberg-University of Mainz, PhysMat/Staudingerweg 9, 55128 Mainz, Germany
| | - L. Pospisil
- Faculty of Informatics, Universita della Svizzera Italiana, Via G. Buffi 13, 6900 Lugano Switzerland
| | - M. Navandar
- Center of Computational Sciences, Johannes-Gutenberg-University of Mainz, PhysMat/Staudingerweg 9, 55128 Mainz, Germany
| | - I. Horenko
- Faculty of Informatics, Universita della Svizzera Italiana, Via G. Buffi 13, 6900 Lugano Switzerland
|
23
|
Alizadehsani R, Roshanzamir M, Abdar M, Beykikhoshk A, Khosravi A, Panahiazar M, Koohestani A, Khozeimeh F, Nahavandi S, Sarrafzadegan N. A database for using machine learning and data mining techniques for coronary artery disease diagnosis. Sci Data 2019; 6:227. [PMID: 31645559 PMCID: PMC6811630 DOI: 10.1038/s41597-019-0206-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 03/21/2019] [Accepted: 08/16/2019] [Indexed: 12/28/2022] Open
Abstract
We present the coronary artery disease (CAD) database, a comprehensive resource comprising 126 papers and 68 datasets relevant to CAD diagnosis, extracted from the scientific literature between 1992 and 2018. These data were collected to help advance research on CAD-related machine learning and data mining algorithms, and hopefully to ultimately advance clinical diagnosis and early treatment. To aid users, we have also built a web application that presents the database through various reports.
Affiliation(s)
- R Alizadehsani
- Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
| | - M Roshanzamir
- Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran
| | - M Abdar
- Département d'informatique, Université du Québec à Montréal, Montréal, Québec, Canada
| | - A Beykikhoshk
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
| | - A Khosravi
- Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
| | - M Panahiazar
- University of California San Francisco, San Francisco, CA, USA.
| | - A Koohestani
- Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
| | - F Khozeimeh
- Mashhad University of Medical Science, Mashhad, Iran
| | - S Nahavandi
- Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
| | - N Sarrafzadegan
- Isfahan Cardiovascular Research Center, Cardiovascular Research Institute, Isfahan University of Medical Sciences, Isfahan, Iran
- School of Population and Public Health, Faculty of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
|
24
|
Kadam VJ, Jadhav SM, Vijayakumar K. Breast Cancer Diagnosis Using Feature Ensemble Learning Based on Stacked Sparse Autoencoders and Softmax Regression. J Med Syst 2019; 43:263. [PMID: 31270634 DOI: 10.1007/s10916-019-1397-z] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 04/02/2019] [Accepted: 06/19/2019] [Indexed: 11/30/2022]
Abstract
Nowadays, the most frequent cancer in women is breast cancer (malignant tumor). If breast cancer is detected at an early stage, it can often be cured. Many researchers have proposed methods for early prediction of this cancer. In this paper, we propose feature ensemble learning based on Sparse Autoencoders and Softmax Regression for classification of breast cancer into benign (non-cancerous) and malignant (cancerous). We used the Breast Cancer Wisconsin (Diagnostic) medical data sets from the UCI machine learning repository. The proposed method is assessed using various performance indices such as true classification accuracy, specificity, sensitivity, recall, precision, F-measure, and MCC. Simulations and results showed that the proposed approach gives better results in terms of different parameters. The prediction results obtained by the proposed approach were very promising (98.60% true accuracy). In addition, the proposed method outperforms the Stacked Sparse Autoencoders and Softmax Regression based (SSAE-SM) model and other state-of-the-art classifiers in terms of various performance indices. Experimental simulations, empirical results, and statistical analyses also show that the proposed model is an efficient and beneficial model for classification of breast cancer. It is also comparable with the existing machine learning and soft computing approaches present in the related literature.
Affiliation(s)
- Vinod Jagannath Kadam
- Department of Information Technology, Dr. Babasaheb Ambedkar Technological University, Lonere, India.
| | | | - K Vijayakumar
- Department of Computer Science & Engineering, St. Joseph's Institute of Technology, Chennai, India
|
25
|
Enhanced Monarchy Butterfly Optimization Technique for effective breast cancer diagnosis. J Med Syst 2019; 43:206. [PMID: 31144128 DOI: 10.1007/s10916-019-1348-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/08/2019] [Accepted: 05/20/2019] [Indexed: 10/26/2022]
Abstract
Breast cancer is a major threat to women worldwide, since the survival of affected patients is ensured only when it is detected at the early localized stage. The majority of the intelligent schemes proposed for detecting breast cancer rely on human skill to determine reliably the essential patterns that confirm the existence of cancer cells and so decide the course of treatment. Further, most of the research works in the literature on detecting breast cancer require considerable time and labor, which increases the time to diagnosis. An Intelligent Artificial Bee Colony and Enhanced Monarchy Butterfly Optimization Technique (IABC-EMBOT) is proposed for effective breast cancer diagnosis. The core idea behind the formulation of IABC-EMBOT relies on two significant improvements: i) a modification of Monarchy Butterfly Optimization that enhances the degree of exploration based on the rate of exploitation of the search space, and ii) the elimination of limitations of the ABC scheme by increasing search diversification through an update facilitated by a dynamic and adaptive butterfly operator that improves the global search. The proposed IABC-EMBOT scheme, investigated using the Wisconsin data set, is shown to achieve an improved average classification accuracy of 97.53%.
|
26
|
Sidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: a practical introduction. BMC Med Res Methodol 2019; 19:64. [PMID: 30890124 PMCID: PMC6425557 DOI: 10.1186/s12874-019-0681-4] [Citation(s) in RCA: 424] [Impact Index Per Article: 84.8] [Received: 06/11/2018] [Accepted: 02/14/2019] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely-available open source software and public domain data. METHODS We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized General Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly-available dataset describing the breast mass samples (N=683) was randomly split into evaluation (n=456) and validation (n=227) samples. We trained algorithms on data from the evaluation sample before they were used to predict the diagnostic outcome in the validation dataset. We compared the predictions made on the validation datasets with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment. RESULTS The trained algorithms were able to classify cell nuclei with high accuracy (.94 -.96), sensitivity (.97 -.99), and specificity (.85 -.94). Maximum accuracy (.96) and area under the curve (.97) was achieved using the SVM algorithm. Prediction performance increased marginally (accuracy =.97, sensitivity =.99, specificity =.95) when algorithms were arranged into a voting ensemble. 
CONCLUSIONS We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles which we demonstrate here can be readily applied to other complex tasks including natural language processing and image recognition.
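The voting ensemble mentioned in the results can be sketched as a per-sample majority vote over the labels predicted by each model. This is an illustrative sketch only: the paper's own guide uses R, whereas this example is in Python; the function names are assumptions; and the labels below are hypothetical, not the study's data.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    for sample_preds in zip(*predictions):
        label, _ = Counter(sample_preds).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for six samples (1 = malignant, 0 = benign).
y_true = [1, 0, 1, 1, 0, 0]
glm_pred = [1, 0, 0, 1, 0, 0]  # one error, on sample 3
svm_pred = [1, 0, 1, 1, 1, 0]  # one error, on sample 5
ann_pred = [0, 0, 1, 1, 0, 0]  # one error, on sample 1

ensemble_pred = majority_vote([glm_pred, svm_pred, ann_pred])
print(accuracy(y_true, glm_pred), accuracy(y_true, ensemble_pred))
```

With three models and binary labels there are no ties. In this toy case each model errs on a different sample, so the vote corrects every individual error; real gains are usually smaller, as the marginal improvement reported in the abstract suggests.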
Affiliation(s)
- Jenni A M Sidey-Gibbons
- Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK
| | - Chris J Sidey-Gibbons
- Department of Surgery, Harvard Medical School, 25 Shattuck Street, Boston, 01225, Massachusetts, USA.
- Department of Surgery, Brigham and Women's Hospital, 75 Francis Street, Boston, 01225, Massachusetts, USA.
- University of Cambridge Psychometrics Centre, Trumpington Street, Cambridge, CB2 1AG, UK.
|
27
|
Crowson MG, Ranisau J, Eskander A, Babier A, Xu B, Kahmke RR, Chen JM, Chan TCY. A contemporary review of machine learning in otolaryngology-head and neck surgery. Laryngoscope 2019; 130:45-51. [PMID: 30706465 DOI: 10.1002/lary.27850] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 12/07/2018] [Accepted: 01/11/2019] [Indexed: 11/07/2022]
Abstract
One of the key challenges with big data is leveraging the complex network of information to yield useful clinical insights. The confluence of massive amounts of health data and a desire to make inferences and insights on these data has produced a substantial amount of interest in machine-learning analytic methods. There has been a drastic increase in the otolaryngology literature volume describing novel applications of machine learning within the past 5 years. In this timely contemporary review, we provide an overview of popular machine-learning techniques, and review recent machine-learning applications in otolaryngology-head and neck surgery including neurotology, head and neck oncology, laryngology, and rhinology. Investigators have realized significant success in validated models with model sensitivities and specificities approaching 100%. Challenges remain in the implementation of machine-learning algorithms. This may be in part the unfamiliarity of these techniques to clinician leaders on the front lines of patient care. Spreading awareness and confidence in machine learning will follow with further validation and proof-of-value analyses that demonstrate model performance superiority over established methods. We are poised to see a greater influx of machine-learning applications to clinical problems in otolaryngology-head and neck surgery, and it is prudent for providers to understand the potential benefits and limitations of these technologies. Laryngoscope, 130:45-51, 2020.
Affiliation(s)
- Matthew G Crowson
- Department of Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Center, Toronto, Ontario, Canada
| | - Jonathan Ranisau
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario, Canada
| | - Antoine Eskander
- Department of Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Center, Toronto, Ontario, Canada
- Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
| | - Aaron Babier
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario, Canada
| | - Bin Xu
- Department of Pathology, Sunnybrook Health Sciences Center, Toronto, Ontario, Canada
| | - Russel R Kahmke
- Division of Otolaryngology-Head and Neck Surgery, Duke University Medical Center, Durham, North Carolina, U.S.A
| | - Joseph M Chen
- Department of Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Center, Toronto, Ontario, Canada
| | - Timothy C Y Chan
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario, Canada
|
28
|
Xu Y, Liu T, Daniels MJ, Kantor R, Mwangi A, Hogan JW. Classification using ensemble learning under weighted misclassification loss. Stat Med 2019; 38:2002-2012. [PMID: 30609090 DOI: 10.1002/sim.8082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/07/2017] [Revised: 10/09/2018] [Accepted: 12/07/2018] [Indexed: 11/07/2022]
Abstract
Binary classification rules based on covariates typically depend on simple loss functions such as zero-one misclassification. Some cases may require more complex loss functions. For example, individual-level monitoring of HIV-infected individuals on antiretroviral therapy requires periodic assessment of treatment failure, defined as having a viral load (VL) value above a certain threshold. In some resource-limited settings, VL tests may be limited by cost or technology, and diagnoses are based on other clinical markers. Depending on the scenario, a higher premium may be placed on avoiding false positives, which bring greater cost and reduced treatment options. Here, the optimal rule is determined by minimizing a weighted misclassification loss/risk. We propose a method for finding and cross-validating optimal binary classification rules under weighted misclassification loss. We focus on rules comprising a prediction score and an associated threshold, where the score is derived using an ensemble learner. Simulations and examples show that our method, which derives the score and threshold jointly, more accurately estimates overall risk and has better operating characteristics compared with methods that derive the score first and the cutoff conditionally on the score, especially for finite samples.
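To make the weighted misclassification loss concrete, the sketch below picks a threshold for a fixed score by minimizing a loss that charges false positives and false negatives differently. Note that this is the simpler conditional approach (score first, cutoff second), not the joint derivation the authors propose; the function names, weights, scores, and labels are all hypothetical.

```python
def weighted_loss(scores, labels, threshold, w_fp, w_fn):
    """Average weighted loss of the rule 'predict 1 if score >= threshold'."""
    total = 0.0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 0:
            total += w_fp  # false positive
        elif pred == 0 and y == 1:
            total += w_fn  # false negative
    return total / len(labels)


def best_threshold(scores, labels, w_fp, w_fn):
    """Search observed scores (plus 'never predict 1') for the minimum-loss cutoff."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: weighted_loss(scores, labels, t, w_fp, w_fn))


# Hypothetical scores and true labels for six individuals.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Penalizing false positives more heavily pushes the cutoff up...
t_fp_costly = best_threshold(scores, labels, w_fp=3, w_fn=1)
# ...while penalizing false negatives more heavily pulls it down.
t_fn_costly = best_threshold(scores, labels, w_fp=1, w_fn=3)
print(t_fp_costly, t_fn_costly)
```

The same scores yield different cutoffs under the two weightings, which is why the choice of loss has to be fixed before the rule is evaluated.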
Affiliation(s)
- Yizhen Xu
- Department of Biostatistics, Brown University, Providence, RI
| | - Tao Liu
- Department of Biostatistics, Brown University, Providence, RI
| | - Michael J Daniels
- Department of Statistics and Data Sciences, University of Texas at Austin, Austin, TX
| | - Rami Kantor
- Division of Infectious Diseases, Brown University, Providence, RI
| | - Ann Mwangi
- Academic Model Providing Access to Healthcare (AMPATH), Eldoret, Kenya.,College of Health Sciences, School of Medicine, Eldoret, Kenya
| | - Joseph W Hogan
- Department of Biostatistics, Brown University, Providence, RI.,Academic Model Providing Access to Healthcare (AMPATH), Eldoret, Kenya
| |
Collapse
|
29
|
Litjens G, Bandi P, Ehteshami Bejnordi B, Geessink O, Balkenhol M, Bult P, Halilovic A, Hermsen M, van de Loo R, Vogels R, Manson QF, Stathonikos N, Baidoshvili A, van Diest P, Wauters C, van Dijk M, van der Laak J. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. Gigascience 2018; 7:5026175. [PMID: 29860392 PMCID: PMC6007545 DOI: 10.1093/gigascience/giy065] [Citation(s) in RCA: 135] [Impact Index Per Article: 22.5] [Received: 12/18/2017] [Accepted: 05/22/2018] [Indexed: 12/27/2022] Open
Abstract
Background The presence of lymph node metastases is one of the most important factors in breast cancer prognosis. The most common way to assess regional lymph node status is the sentinel lymph node procedure. The sentinel lymph node is the most likely lymph node to contain metastasized cancer cells and is excised, histopathologically processed, and examined by a pathologist. This tedious examination process is time-consuming and can lead to small metastases being missed. However, recent advances in whole-slide imaging and machine learning have opened an avenue for analysis of digitized lymph node sections with computer algorithms. For example, convolutional neural networks, a type of machine-learning algorithm, can be used to automatically detect cancer metastases in lymph nodes with high accuracy. To train machine-learning models, large, well-curated datasets are needed. Results We released a dataset of 1,399 annotated whole-slide images (WSIs) of lymph nodes, both with and without metastases, in 3 terabytes of data in the context of the CAMELYON16 and CAMELYON17 Grand Challenges. Slides were collected from five medical centers to cover a broad range of image appearance and staining variations. Each WSI has a slide-level label indicating whether it contains no metastases, macro-metastases, micro-metastases, or isolated tumor cells. Furthermore, for 209 WSIs, detailed hand-drawn contours for all metastases are provided. Last, open-source software tools to visualize and interact with the data have been made available. Conclusions A unique dataset of annotated, whole-slide digital histopathology images has been provided with high potential for re-use.
Affiliation(s)
- Geert Litjens
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Peter Bandi
  - Department of Pathology, University Medical Center Huispost H04.312, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
- Babak Ehteshami Bejnordi
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Oscar Geessink
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Maschenka Balkenhol
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Peter Bult
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Altuna Halilovic
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Meyke Hermsen
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Rob van de Loo
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Rob Vogels
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
- Quirine F Manson
  - Department of Pathology, University Medical Center Huispost H04.312, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
- Nikolas Stathonikos
  - Department of Pathology, University Medical Center Huispost H04.312, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
- Alexi Baidoshvili
  - Laboratory for Pathology East Netherlands (LabPON), Postbus 516, 7550AM Hengelo, The Netherlands
- Paul van Diest
  - Department of Pathology, University Medical Center Huispost H04.312, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
- Carla Wauters
  - Department of Pathology, Canisius-Wilhelmina Hospital, Postbus 9015, 6500GS Nijmegen, The Netherlands
- Marcory van Dijk
  - Department of Pathology, Rijnstate Hospital, Pathology-DNA, Postbus 9555, 6800TA Arnhem, The Netherlands
- Jeroen van der Laak
  - Diagnostic Image Analysis Group, Department of Pathology, Radboud University Medical Center, Huispost 824, Geert Grootteplein-Zuid 10, 6525GA Nijmegen, The Netherlands
|
30
|
Wang Y, Li Y, Qiao C, Liu X, Hao M, Shugart YY, Xiong M, Jin L. Nuclear Norm Clustering: a promising alternative method for clustering tasks. Sci Rep 2018; 8:10873. [PMID: 30022093 PMCID: PMC6052164 DOI: 10.1038/s41598-018-29246-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/26/2017] [Accepted: 07/02/2018] [Indexed: 11/09/2022] Open
Abstract
Clustering techniques are widely used in many applications. The goal of clustering is to identify patterns or groups of similar objects within a dataset of interest. However, many cluster methods are neither robust nor sensitive to noises and outliers in real data. In this paper, we present Nuclear Norm Clustering (NNC, available at https://sourceforge.net/projects/nnc/), an algorithm that can be used in various fields as a promising alternative to the k-means clustering method. The NNC algorithm requires users to provide a data matrix M and a desired number of cluster K. We employed simulated annealing techniques to choose an optimal label vector that minimizes nuclear norm of the pooled within cluster residual matrix. To evaluate the performance of the NNC algorithm, we compared the performance of both 15 public datasets and 2 genome-wide association studies (GWAS) on psoriasis, comparing our method with other classic methods. The results indicate that NNC method has a competitive performance in terms of F-score on 15 benchmarked public datasets and 2 psoriasis GWAS datasets. So NNC is a promising alternative method for clustering tasks.
Affiliation(s)
- Yi Wang
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China; Human Phenome Institute, Fudan University, Shanghai, China
- Yi Li
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China; Six Industrial Research Institute, Fudan University, Shanghai, China; Human Phenome Institute, Fudan University, Shanghai, China
- Chunhong Qiao
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China; Human Phenome Institute, Fudan University, Shanghai, China
- Xiaoyu Liu
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China; Human Phenome Institute, Fudan University, Shanghai, China
- Meng Hao
  - Ministry of Education Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, China; Human Phenome Institute, Fudan University, Shanghai, China
- Yin Yao Shugart
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China; Unit on Statistical Genomics, Division of Intramural Division Programs, National Institute of Mental Health, National Institutes of Health, Bethesda, MD, USA; Six Industrial Research Institute, Fudan University, Shanghai, China
- Momiao Xiong
  - Human Genetics Center, School of Public Health, University of Texas Houston Health Sciences Center, Houston, Texas, USA
- Li Jin
  - State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China; Six Industrial Research Institute, Fudan University, Shanghai, China; Human Phenome Institute, Fudan University, Shanghai, China
|
31
|
Hamouda SKM, Wahed ME, Abo Alez RH, Riad K. Robust breast cancer prediction system based on rough set theory at National Cancer Institute of Egypt. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 153:259-268. [PMID: 29157458 DOI: 10.1016/j.cmpb.2017.10.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/03/2017] [Revised: 09/10/2017] [Accepted: 10/12/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND Breast cancer is one of the major death causing diseases of the women in the world. Every year more than million women are diagnosed with breast cancer more than half of them will die because of inaccuracies and delays in diagnosis of the disease. High accuracy in cancer prediction is important to improve the treatment quality and the survivability rate of patients. OBJECTIVES In this paper, we are going to propose a new and robust breast cancer prediction and diagnosis system based on the Rough Set (RS). Also, introducing the robust classification process based on some new and most effective attributes. Comparing and evaluating the performance of our proposed approach with the clinical, Radial Basis Function, and Artificial Neural Networks classification schemes. METHODS The dataset used in our experiments consists of 60 samples obtained from the National Cancer Institute (NCI) of Egypt. We have used the RS theory to robustly find dependence relationships among data, and evaluate the importance of attributes through: Results: Conclusion: We have introduced the robustness of the RS theory in early predicting and diagnosing the breast cancer. This lay more importance to the contribution and efficiency of RS theory in the field of computational biology.
Affiliation(s)
- Mohammed E Wahed
  - Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
- Khaled Riad
  - Mathematics Department, Faculty of Science, Zagazig University, Zagazig 44519, Egypt
|
32
|
ST-ONCODIAG: A semantic rule-base approach to diagnosing breast cancer base on Wisconsin datasets. INFORMATICS IN MEDICINE UNLOCKED 2018. [DOI: 10.1016/j.imu.2017.12.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/23/2022] Open
|
33
|
Paul A, Sil J, Mukhopadhyay CD. Gene selection for designing optimal fuzzy rule base classifier by estimating missing value. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2017.01.046] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 01/15/2023]
|
34
|
Sengupta D, Bandyopadhyay S, Sinha D. A Scoring Scheme for Online Feature Selection: Simulating Model Performance Without Retraining. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:405-414. [PMID: 26812738 DOI: 10.1109/tnnls.2016.2514270] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Increasing the number of features increases the complexity of a model even if the additional feature does not improve its decision-making capacity. Irrelevant features may also cause overfitting and reduce interpretability of the concerned model. It is, therefore, important that the features are optimally selected before a model is built. In the case of online learning, new instances are periodically discovered, and the respective model is tactically retrained as required. Similarly, there are many real-life situations where hundreds of new features are discovered periodically, and the existing model needs to be retrained or tested for its performance improvement. Supervised selection of feature subset usually requires creation of multiple suboptimal models, thus incurring time-intensive computations. Unsupervised selections, although faster, largely rely on some subjective definition of feature relevance. In this paper, we introduce a score that accurately determines the importance of the features. The proposed score is appropriate for online feature selection scenarios for its low time complexity and ability to interpret performance improvement of the current model after the addition of a new feature, without invoking a retraining.
|
35
|
An Enhanced Grey Wolf Optimization Based Feature Selection Wrapped Kernel Extreme Learning Machine for Medical Diagnosis. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2017; 2017:9512741. [PMID: 28246543 PMCID: PMC5299219 DOI: 10.1155/2017/9512741] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 10/08/2016] [Revised: 12/03/2016] [Accepted: 12/21/2016] [Indexed: 11/18/2022]
Abstract
In this study, a new predictive framework is proposed by integrating an improved grey wolf optimization (IGWO) and kernel extreme learning machine (KELM), termed as IGWO-KELM, for medical diagnosis. The proposed IGWO feature selection approach is used for the purpose of finding the optimal feature subset for medical data. In the proposed approach, genetic algorithm (GA) was firstly adopted to generate the diversified initial positions, and then grey wolf optimization (GWO) was used to update the current positions of population in the discrete searching space, thus getting the optimal feature subset for the better classification purpose based on KELM. The proposed approach is compared against the original GA and GWO on the two common disease diagnosis problems in terms of a set of performance metrics, including classification accuracy, sensitivity, specificity, precision, G-mean, F-measure, and the size of selected features. The simulation results have proven the superiority of the proposed method over the other two competitive counterparts.
|
36
|
Saha M, Mukherjee R, Chakraborty C. Computer-aided diagnosis of breast cancer using cytological images: A systematic review. Tissue Cell 2016; 48:461-74. [DOI: 10.1016/j.tice.2016.07.006] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 04/21/2016] [Revised: 06/16/2016] [Accepted: 07/27/2016] [Indexed: 12/13/2022]
|
37
|
Breast Cancer Detection with Reduced Feature Set. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:265138. [PMID: 26078774 PMCID: PMC4452509 DOI: 10.1155/2015/265138] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Received: 09/12/2014] [Revised: 12/14/2014] [Accepted: 12/25/2014] [Indexed: 11/23/2022]
Abstract
This paper explores feature reduction properties of independent component analysis (ICA) on breast cancer decision support system. Wisconsin diagnostic breast cancer (WDBC) dataset is reduced to one-dimensional feature vector computing an independent component (IC). The original data with 30 features and reduced one feature (IC) are used to evaluate diagnostic accuracy of the classifiers such as k-nearest neighbor (k-NN), artificial neural network (ANN), radial basis function neural network (RBFNN), and support vector machine (SVM). The comparison of the proposed classification using the IC with original feature set is also tested on different validation (5/10-fold cross-validations) and partitioning (20%–40%) methods. These classifiers are evaluated how to effectively categorize tumors as benign and malignant in terms of specificity, sensitivity, accuracy, F-score, Youden's index, discriminant power, and the receiver operating characteristic (ROC) curve with its criterion values including area under curve (AUC) and 95% confidential interval (CI). This represents an improvement in diagnostic decision support system, while reducing computational complexity.
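Several of the evaluation criteria named in this abstract are simple functions of a binary confusion matrix. The following Python sketch is illustrative only: the function name `diagnostic_metrics` and the counts in the usage line are hypothetical and are not taken from the paper. It covers sensitivity, specificity, accuracy, F-score (taken here to mean F1) and Youden's index, and leaves out discriminant power and the ROC analysis.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common diagnostic accuracy measures from confusion-matrix counts.

    Malignant is treated as the positive class. Assumes at least one actual
    positive and one actual negative, so that no denominator is zero.
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # overall fraction correct
    f_score = 2 * tp / (2 * tp + fp + fn)        # F1: harmonic mean of precision and recall
    youden = sensitivity + specificity - 1       # Youden's index J
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_score": f_score,
        "youden": youden,
    }


# Hypothetical counts, chosen only to exercise the function.
example = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
```

With these made-up counts, sensitivity is 40/50 and specificity is 45/50, so Youden's index is about 0.7.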
|
38
|
Cao J, Zhang L, Wang B, Li F, Yang J. A fast gene selection method for multi-cancer classification using multiple support vector data description. J Biomed Inform 2014; 53:381-9. [PMID: 25549938 DOI: 10.1016/j.jbi.2014.12.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 07/04/2014] [Revised: 12/14/2014] [Accepted: 12/18/2014] [Indexed: 01/31/2023]
Abstract
For cancer classification problems based on gene expression, the data usually has only a few dozen sizes but has thousands to tens of thousands of genes which could contain a large number of irrelevant genes. A robust feature selection algorithm is required to remove irrelevant genes and choose the informative ones. Support vector data description (SVDD) has been applied to gene selection for many years. However, SVDD cannot address the problems with multiple classes since it only considers the target class. In addition, it is time-consuming when applying SVDD to gene selection. This paper proposes a novel fast feature selection method based on multiple SVDD and applies it to multi-class microarray data. A recursive feature elimination (RFE) scheme is introduced to iteratively remove irrelevant features, so the proposed method is called multiple SVDD-RFE (MSVDD-RFE). To make full use of all classes for a given task, MSVDD-RFE independently selects a relevant gene subset for each class. The final selected gene subset is the union of these relevant gene subsets. The effectiveness and accuracy of MSVDD-RFE are validated by experiments on five publicly available microarray datasets. Our proposed method is faster and more effective than other methods.
Affiliation(s)
- Jin Cao
  - School of Computer Science and Technology & Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, Jiangsu, China
- Li Zhang
  - School of Computer Science and Technology & Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, Jiangsu, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, Jiangsu, China
- Bangjun Wang
  - School of Computer Science and Technology & Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, Jiangsu, China
- Fanzhang Li
  - School of Computer Science and Technology & Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, Jiangsu, China
- Jiwen Yang
  - School of Computer Science and Technology & Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, Jiangsu, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, Jiangsu, China
|
39
|
Gorunescu F, Belciug S. Evolutionary strategy to develop learning-based decision systems. Application to breast cancer and liver fibrosis stadialization. J Biomed Inform 2014; 49:112-8. [DOI: 10.1016/j.jbi.2014.02.001] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 10/23/2013] [Revised: 12/09/2013] [Accepted: 02/03/2014] [Indexed: 11/26/2022]
|
40
|
Loukas C, Kostopoulos S, Tanoglidi A, Glotsos D, Sfikas C, Cavouras D. Breast cancer characterization based on image classification of tissue sections visualized under low magnification. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:829461. [PMID: 24069067 PMCID: PMC3773385 DOI: 10.1155/2013/829461] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 05/14/2013] [Revised: 07/24/2013] [Accepted: 08/01/2013] [Indexed: 12/02/2022]
Abstract
Rapid assessment of tissue biopsies is a critical issue in modern histopathology. For breast cancer diagnosis, the shape of the nuclei and the architectural pattern of the tissue are evaluated under high and low magnifications, respectively. In this study, we focus on the development of a pattern classification system for the assessment of breast cancer images captured under low magnification (×10). Sixty-five regions of interest were selected from 60 images of breast cancer tissue sections. Texture analysis provided 30 textural features per image. Three different pattern recognition algorithms were employed (kNN, SVM, and PNN) for classifying the images into three malignancy grades: I-III. The classifiers were validated with leave-one-out (training) and cross-validation (testing) modes. The average discrimination efficiency of the kNN, SVM, and PNN classifiers in the training mode was close to 97%, 95%, and 97%, respectively, whereas in the test mode, the average classification accuracy achieved was 86%, 85%, and 90%, respectively. Assessment of breast cancer tissue sections could be applied in complex large-scale images using textural features and pattern classifiers. The proposed technique provides several benefits, such as speed of analysis and automation, and could potentially replace the laborious task of visual examination.
Affiliation(s)
- C. Loukas
  - Department of Medical Physics, Medical School, University of Athens, 75 Mikras Asias Street, 115 27 Athens, Greece
- S. Kostopoulos
  - Medical Image and Signal Processing Laboratory, Department of Medical Instruments Technology, Technological Educational Institute of Athens, 12210 Athens, Greece
- A. Tanoglidi
  - Department of Histopathology, Elena Venizelos Hospital, 106 72 Athens, Greece
- D. Glotsos
  - Medical Image and Signal Processing Laboratory, Department of Medical Instruments Technology, Technological Educational Institute of Athens, 12210 Athens, Greece
- C. Sfikas
  - Department of Histopathology, Elena Venizelos Hospital, 106 72 Athens, Greece
- D. Cavouras
  - Medical Image and Signal Processing Laboratory, Department of Medical Instruments Technology, Technological Educational Institute of Athens, 12210 Athens, Greece
|
41
|
Longford NT. Screening as an application of decision theory. Stat Med 2013; 32:849-63. [PMID: 22899278 DOI: 10.1002/sim.5554] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/26/2011] [Accepted: 07/14/2012] [Indexed: 11/10/2022]
Abstract
We develop a decision-theoretical approach to setting the threshold for a screening procedure that declares each examined subject as a positive or a negative. It is fundamentally different from maximising the Youden index. The method incorporates the consequences of the two kinds of bad decisions (false positives and false negatives) by means of a set of plausible loss functions elicited from a subject-matter expert or committee. We present details for several classes of loss functions and within-group distributions of the outcomes. We outline extensions related to mixture distributions and compositions of loss functions. We illustrate the method on simulated examples and apply it to real datasets.
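The contrast drawn in this abstract, between choosing a threshold by maximising the Youden index and choosing it by the consequences of false positives and false negatives, can be illustrated with a small Python sketch. This is a deliberate simplification and not the paper's method: it uses a single pair of fixed costs (`loss_fp`, `loss_fn`) and empirical error counts instead of elicited sets of plausible loss functions and distributional models. All function names, scores and labels below are hypothetical.

```python
from typing import Sequence, Tuple


def error_counts(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> Tuple[int, int]:
    """Count (false positives, false negatives) when score >= threshold means 'positive'."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn


def average_loss(scores: Sequence[float], labels: Sequence[int], threshold: float,
                 loss_fp: float, loss_fn: float) -> float:
    """Average loss per subject for fixed costs of the two kinds of error."""
    fp, fn = error_counts(scores, labels, threshold)
    return (loss_fp * fp + loss_fn * fn) / len(labels)


def youden_index(scores: Sequence[float], labels: Sequence[int],
                 threshold: float) -> float:
    """Sensitivity + specificity - 1 at the given threshold."""
    fp, fn = error_counts(scores, labels, threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return (1 - fn / positives) + (1 - fp / negatives) - 1


# Hypothetical screening scores: six true negatives followed by four true positives.
scores = [0.1, 0.2, 0.3, 0.35, 0.45, 0.6, 0.4, 0.5, 0.7, 0.8]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
candidates = sorted(set(scores))

# Threshold that maximises the Youden index (ignores the costs of errors).
youden_cut = max(candidates, key=lambda t: youden_index(scores, labels, t))
# Threshold when a false positive is five times as costly as a false negative.
cautious_cut = min(candidates,
                   key=lambda t: average_loss(scores, labels, t, loss_fp=5.0, loss_fn=1.0))
# Threshold when a false negative is five times as costly as a false positive.
sensitive_cut = min(candidates,
                    key=lambda t: average_loss(scores, labels, t, loss_fp=1.0, loss_fn=5.0))
```

On this toy data the Youden-optimal threshold is 0.4, and it coincides with the loss-based choice when false negatives are the costlier error. When false positives are the costlier error, the loss-based threshold moves up to 0.7. This shows how the chosen cutoff depends on the assumed losses.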
|
43
|
Lee KM, Street WN. An adaptive resource-allocating network for automated detection, segmentation, and classification of breast cancer nuclei topic area: image processing and recognition. ACTA ACUST UNITED AC 2012; 14:680-7. [PMID: 18238048 DOI: 10.1109/tnn.2003.810615] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 11/05/2022]
Abstract
This paper presents a unified image analysis approach for automated detection, segmentation, and classification of breast cancer nuclei using a neural network, which learns to cluster shapes and to classify nuclei. The proposed neural network is incrementally grown by creating a new cluster whenever a previously unseen shape is presented. Each hidden node represents a cluster used as a template to provide faster and more accurate nuclei detection and segmentation. Online learning gives the system improved performance with continued use. The effectiveness of the resulting system is demonstrated on a task of cytological image analysis, with classification of individual nuclei used to diagnose the sample. This demonstrates the potential effectiveness of such a system on diagnostic tasks that require the classification of individual cells.
Affiliation(s)
- Kyoung-Mi Lee
  - Dept. of Comput. Sci., Duksung Women's Univ., Seoul, South Korea
|
44
|
Application of Pattern Recognition Techniques for the Analysis of Histopathological Images. ACTA ACUST UNITED AC 2011. [DOI: 10.1007/978-3-642-20320-6_65] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/18/2023]
|
45
|
Peng Y, Wu Z, Jiang J. A novel feature selection approach for biomedical data classification. J Biomed Inform 2010; 43:15-23. [PMID: 19647098 DOI: 10.1016/j.jbi.2009.07.008] [Citation(s) in RCA: 153] [Impact Index Per Article: 10.9] [Received: 08/11/2008] [Revised: 04/08/2009] [Accepted: 07/27/2009] [Indexed: 11/28/2022]
|
46
|
Mu T, Nandi AK, Rangayyan RM. Strict 2-Surface Proximal Classification of Knee-joint Vibroarthrographic Signals. ACTA ACUST UNITED AC 2008; 2007:4911-4. [PMID: 18003107 DOI: 10.1109/iembs.2007.4353441] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
Externally detected vibroarthrographic (VAG) signals contain information that can be used to characterize certain pathological aspects of the knee joint. To classify VAG signals as normal or abnormal, we propose to apply both the linear and nonlinear strict 2-surface proximal (S2SP) classifiers based on statistical parameters derived from VAG signals and selected by using a genetic algorithm (GA). A database of VAG signals of 89 human knee joints (51 normal and 38 abnormal) was studied. The classification performance of the linear S2SP classifier reached 0.82 in terms of the area under the receiver operating characteristics curve (Az) and 74.2% in average classification accuracy with the leave-one-out (LOO) procedure. The classification performance of the nonlinear S2SP classifier reached 0.95 in Az value and 91.0% in average classification accuracy using the Gaussian kernel with the LOO procedure, and possessed good robustness around the selected kernel parameter.
Affiliation(s)
- Tingting Mu
  - Department of Electrical Engineering and Electronics, the University of Liverpool, Brownlow Hill, Liverpool, UK, L69 3GJ
|
47
|
Maglogiannis I, Zafiropoulos E, Anagnostopoulos I. An intelligent system for automated breast cancer diagnosis and prognosis using SVM based classifiers. APPL INTELL 2007. [DOI: 10.1007/s10489-007-0073-z] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.8] [Indexed: 11/24/2022]
|
49
|
Anagnostopoulos I, Maglogiannis I. Neural network-based diagnostic and prognostic estimations in breast cancer microscopic instances. Med Biol Eng Comput 2006; 44:773-84. [PMID: 16960744 DOI: 10.1007/s11517-006-0079-4] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 10/21/2005] [Accepted: 06/02/2006] [Indexed: 11/25/2022]
Abstract
This paper deals with breast cancer diagnostic and prognostic estimations employing neural networks over the Wisconsin Breast Cancer datasets, which consist of measurements taken from breast cancer microscopic instances. A probabilistic approach is dedicated to solve the diagnosis problem, detecting malignancy among instances derived from the Fine Needle Aspirate test, while regression algorithms estimate the time interval that possibly correspond to the right end-point of the patients' disease-free survival time or the time where the tumour recurs (time-to-recur). For the diagnosis problem, the accuracy of the neural network in terms of sensitivity and specificity was measured at 98.6 and 97.5% respectively, using the leave-one-out test method. As far as the prognosis problem is concerned, the accuracy of the neural network was measured through a stratified tenfold cross-validation approach. Sensitivity ranged between 80.5 and 91.8%, while specificity ranged between 91.9 and 97.9%, depending on the tested fold and the partition of the predicted period. The prognostic recurrence predictions were then further evaluated using survival analysis and compared with other techniques found in literature.
Affiliation(s)
- Ioannis Anagnostopoulos
  - Department of Information and Communication Systems Engineering, University of the Aegean, Karlovassi, 83200, Samos, Greece
|
50
|
Tsakonas A. A comparison of classification accuracy of four genetic programming-evolved intelligent structures. Inf Sci (N Y) 2006. [DOI: 10.1016/j.ins.2005.03.012] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Indexed: 11/16/2022]
|