1
Enhanced Preprocessing Approach Using Ensemble Machine Learning Algorithms for Detecting Liver Disease. Biomedicines 2023; 11:biomedicines11020581. [PMID: 36831118 PMCID: PMC9953600 DOI: 10.3390/biomedicines11020581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/18/2023] Open
Abstract
Liver disease has increased sharply worldwide, and many people die without knowing that they have it. Because it produces few symptoms, liver disease is extremely difficult to detect until its final stage. Early detection allows patients to begin treatment sooner, which can save lives. Ensemble learning algorithms have become increasingly popular because they perform better than traditional machine learning algorithms. In this context, this paper proposes a novel architecture based on ensemble learning and enhanced preprocessing to predict liver disease using the Indian Liver Patient Dataset (ILPD). Six ensemble learning algorithms are applied to the ILPD, and their results are compared with those of existing studies. The proposed model uses several data preprocessing methods, such as data balancing, feature scaling, and feature selection, to improve the accuracy with appropriate imputations. Multivariate imputation is applied to fill in missing values. A log1p transformation is applied to skewed columns, along with standardization, min-max scaling, maximum absolute scaling, and robust scaling. Features are selected using several methods, including univariate selection, feature importance, and a correlation matrix. The preprocessed data are used to train Gradient Boosting, XGBoost, Bagging, Random Forest, Extra Tree, and Stacking ensemble learning algorithms. The results of the six models are compared with each other and with the models used in other research works. The proposed models using the extra tree classifier and random forest outperformed the other methods, with the highest testing accuracies of 91.82% and 86.06%, respectively, portraying our method as a real-world solution for detecting liver disease.
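The transformation and scaling steps named in this abstract can be illustrated with a minimal pure-Python sketch. It follows the usual textbook definitions of each scaler and is not the authors' code; the sample column and all function names are hypothetical.

```python
import math
import statistics


def log1p_transform(values):
    """Compress a right-skewed, non-negative column with log(1 + x)."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Zero mean, unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Map the column linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def max_abs_scale(values):
    """Divide by the largest absolute value, so results lie in [-1, 1]."""
    peak = max(abs(v) for v in values)
    return [v / peak for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


# Hypothetical skewed column (not taken from the ILPD).
column = [0.0, 1.0, 3.0, 7.0, 15.0]
logged = log1p_transform(column)   # evenly spaced after the log transform
scaled = min_max_scale(logged)
```

Robust scaling uses the median and interquartile range, so a single extreme value shifts it far less than it shifts standardization or min-max scaling.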
2
Research on Disease Prediction Method Based on R-Lookahead-LSTM. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:8431912. [PMID: 35463275 PMCID: PMC9020897 DOI: 10.1155/2022/8431912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Revised: 03/10/2022] [Accepted: 03/14/2022] [Indexed: 11/24/2022]
Abstract
Cardiovascular disease is one of the most serious threats to human health in the world today. Establishing a high-quality disease prediction model is therefore of great significance for the prevention and treatment of cardiovascular disease. In the feature selection stage, three new strong feature vectors are constructed based on the background of disease prediction and added to the original data set, and the relationships between the feature vectors are analyzed using a correlation coefficient map. A random forest algorithm is also introduced for feature selection, yielding an importance ranking of the features. To further improve prediction performance, a cardiovascular disease prediction model based on R-Lookahead-LSTM is proposed. In this model, the stochastic gradient descent algorithm used for the fast-weight part of the Lookahead algorithm is replaced with the Rectified Adam algorithm; the Tanh activation function is replaced with the Softsign activation function to promote model convergence; and the resulting R-Lookahead algorithm is used to optimize the long short-term memory (LSTM) network model. As a result, the LSTM network model stabilizes sooner, and it is applied to cardiovascular disease prediction.
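Two ingredients mentioned in this abstract, the Softsign activation and the Lookahead fast/slow weight scheme, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: plain gradient descent stands in for Rectified Adam as the inner (fast-weight) optimizer, and the toy objective and all hyperparameters are hypothetical.

```python
def softsign(x):
    """Softsign activation: x / (1 + |x|), bounded in (-1, 1) like tanh."""
    return x / (1.0 + abs(x))


def lookahead_minimize(grad, start, inner_lr=0.1, k=5, alpha=0.5, outer_steps=50):
    """Lookahead on a scalar parameter.

    The fast weight takes k inner steps from the current slow weight,
    then the slow weight moves a fraction alpha toward the fast weight.
    Assumption: plain gradient descent is the inner optimizer here,
    whereas the abstract describes Rectified Adam in that role.
    """
    slow = start
    for _ in range(outer_steps):
        fast = slow
        for _ in range(k):
            fast -= inner_lr * grad(fast)   # inner (fast-weight) update
        slow += alpha * (fast - slow)       # slow-weight interpolation
    return slow


# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = lookahead_minimize(lambda w: 2.0 * (w - 3.0), start=0.0)
```

Softsign approaches its bounds polynomially rather than exponentially, so it saturates more slowly than tanh, which is one common argument for using it.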
3
Das A, Das P, Panda SS, Sabut S. Detection of Liver Cancer Using Modified Fuzzy Clustering and Decision Tree Classifier in CT Images. PATTERN RECOGNITION AND IMAGE ANALYSIS 2019. [DOI: 10.1134/s1054661819020056] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/14/2022]
4
Abstract
Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to “learn” from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on “older” technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15–25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.
Affiliation(s)
- Joseph A. Cruz, Departments of Biological Science and Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8
- David S. Wishart, Departments of Biological Science and Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8
5
Abstract
We aimed to develop a warfarin dosing algorithm for African-Americans. We explored demographic, clinical, and genetic data from a previously collected cohort of 163 African-American patients with a stable warfarin dose. We explored 2 approaches to develop the algorithm: multiple linear regression and an artificial neural network (ANN). The clinical significance of the 2 dosing algorithms was evaluated by calculating the percentage of patients whose predicted dose of warfarin was within 20% of the actual dose. The linear regression model and the ANN model predicted the ideal dose in 52% and 48% of the patients, respectively. The mean absolute error using the linear regression model was estimated to be 10.8 mg compared with 10.9 mg using the ANN. The linear regression and ANN models identified several predictors of warfarin dose, including age, weight, CYP2C9 genotype *1/*1, VKORC1 genotype, rs12777823 genotype, rs2108622 genotype, congestive heart failure, and amiodarone use. In conclusion, we developed a warfarin dosing algorithm for African-Americans. The proposed dosing algorithm has the potential to recommend warfarin doses that are close to the appropriate doses. The use of the more sophisticated ANN approach did not improve the predictive performance of the dosing algorithm, except for patients requiring a dose of ≥49 mg/wk.
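The two evaluation measures described in this abstract (the share of patients whose predicted dose falls within 20% of the actual dose, and the mean absolute error) can be computed as in the sketch below. The dose values are invented for illustration and are not taken from the study.

```python
def fraction_within(predicted, actual, tolerance=0.20):
    """Share of patients whose predicted dose is within +/- tolerance of the actual dose."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance * a)
    return hits / len(actual)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual doses."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical weekly doses in mg/wk (illustrative only).
actual = [35.0, 50.0, 28.0, 70.0]
predicted = [40.0, 38.0, 30.0, 60.0]

share_ok = fraction_within(predicted, actual)   # 3 of 4 patients -> 0.75
mae = mean_absolute_error(predicted, actual)    # (5 + 12 + 2 + 10) / 4 -> 7.25
```

The within-20% measure is relative to each patient's dose, while the mean absolute error is in dose units, so the two can rank models differently.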
6
Adams LJ, Bello G, Dumancas GG. Development and Application of a Genetic Algorithm for Variable Optimization and Predictive Modeling of Five-Year Mortality Using Questionnaire Data. Bioinform Biol Insights 2015; 9:31-41. [PMID: 26604716 PMCID: PMC4639510 DOI: 10.4137/bbi.s29469] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 07/30/2015] [Revised: 09/22/2015] [Indexed: 12/31/2022] Open
Abstract
The problem of selecting important variables for predictive modeling of a specific outcome of interest using questionnaire data has rarely been addressed in clinical settings. In this study, we implemented a genetic algorithm (GA) technique to select optimal variables from questionnaire data for predicting five-year mortality. We examined 123 questions (variables) answered by 5,444 individuals in the National Health and Nutrition Examination Survey. The GA iterations selected the top 24 variables, including questions related to stroke, emphysema, and general health problems requiring the use of special equipment, for use in predictive modeling by various parametric and nonparametric machine learning techniques. Using these top 24 variables, gradient boosting yielded the nominally highest performance (area under curve [AUC] = 0.7654), although other techniques had lower but not significantly different AUCs. This study shows how a GA, in conjunction with various machine learning techniques, could be used to examine questionnaire data to predict a binary outcome.
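As a rough illustration of GA-based variable selection, the sketch below evolves boolean masks over a set of variables. It is a generic GA under stated assumptions, not the procedure used in the study: the operators (truncation selection, one-point crossover, bit-flip mutation, elitism), the hyperparameters, and the toy fitness function are all hypothetical. In a real analysis the fitness would typically be a cross-validated model score such as AUC.

```python
import random


def genetic_select(n_vars, fitness, pop_size=30, generations=40,
                   mutation_rate=0.05, seed=0):
    """Evolve a boolean mask over n_vars variables that maximizes fitness(mask)."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_vars)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]      # truncation selection
        children = [ranked[0][:]]             # elitism: carry over the best mask
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)    # one-point crossover
            child = mother[:cut] + father[cut:]
            # Bit-flip mutation: toggle each gene with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy fitness: reward three hypothetical "informative" variables and
# penalize mask size, so smaller masks that keep them score higher.
INFORMATIVE = {1, 4, 7}


def toy_fitness(mask):
    chosen = {i for i, keep in enumerate(mask) if keep}
    return len(chosen & INFORMATIVE) - 0.1 * len(chosen)


best = genetic_select(10, toy_fitness)
```

Because the best mask of each generation is copied unchanged into the next, the best fitness in the population never decreases from one generation to the next.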
Affiliation(s)
- Lucas J Adams, Department of Chemistry, Oklahoma Baptist University, Shawnee, OK, USA
- Ghalib Bello, Arthritis and Clinical Immunology Research Program, Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
- Gerard G Dumancas, Department of Chemistry, Oklahoma Baptist University, Shawnee, OK, USA
7
Abstract
Data mining, also known as Knowledge-Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. For instance, a clinical pattern might indicate that women who have diabetes or hypertension are more likely to suffer a stroke within the next 5 years. A physician can then learn valuable knowledge from the data mining process. Here, we present a study of the application of artificial intelligence and data mining techniques to prediction models of breast cancer. An artificial neural network, decision tree, logistic regression, and genetic algorithm were compared, with the accuracy and positive predictive value of each algorithm used as the evaluation indicators. The data analysis incorporated 699 records acquired from breast cancer patients at the University of Wisconsin, with nine predictor variables and one outcome variable, followed by tenfold cross-validation. The results revealed that the accuracy of the logistic regression model was 0.9434 (sensitivity 0.9716, specificity 0.9482), that of the decision tree model 0.9434 (sensitivity 0.9615, specificity 0.9105), that of the neural network model 0.9502 (sensitivity 0.9628, specificity 0.9273), and that of the genetic algorithm model 0.9878 (sensitivity 1, specificity 0.9802). The accuracy of the genetic algorithm was significantly higher than the average predicted accuracy of 0.9612. The predicted outcome of the logistic regression model was higher than that of the neural network model but no significant difference was observed. The average predicted accuracy of the decision tree model was 0.9435, which was the lowest of all four predictive models. The standard deviation of the tenfold cross-validation was rather unreliable.
This study indicated that the genetic algorithm model yielded better results than the other data mining models for the analysis of breast cancer patient data in terms of the overall accuracy of patient classification and the expression and complexity of the classification rule. The results showed that the genetic algorithm described in the present study was able to produce accurate results in the classification of breast cancer data, and that the classification rule it identified was more acceptable and comprehensible.
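The evaluation indicators named above follow directly from a confusion matrix, and tenfold cross-validation partitions the records into ten held-out folds. The sketch below shows both under simple assumptions: the confusion-matrix counts are invented, and the fold assignment is an unshuffled round-robin split chosen for brevity, not the split used in the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and positive predictive value."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
    }


def k_fold_indices(n_samples, k=10):
    """Assign sample indices to k folds round-robin (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


# Hypothetical confusion-matrix counts (illustrative only).
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)

# 699 records, as in the abstract, split into ten test folds.
folds = k_fold_indices(699, k=10)
```

Each fold serves once as the test set while the other nine are used for training, and the reported accuracy is typically the mean over the ten runs.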
Affiliation(s)
- Der-Ming Liou, Yang Ming University, No 155, Sec. 2, Li-Nong St., Taipei, 112, Taiwan R.O.C.
8
Zhou X, Zhang Y, Shi M, Shi H, Zheng Z. Early detection of liver disease using data visualisation and classification method. Biomed Signal Process Control 2014. [DOI: 10.1016/j.bspc.2014.02.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
9
Saleh MI, Alzubiedi S. Dosage Individualization of Warfarin Using Artificial Neural Networks. Mol Diagn Ther 2014; 18:371-9. [DOI: 10.1007/s40291-014-0090-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 10/25/2022]
10
Lin RH. An intelligent model for liver disease diagnosis. Artif Intell Med 2009; 47:53-62. [PMID: 19540738 DOI: 10.1016/j.artmed.2009.05.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Received: 03/06/2008] [Revised: 04/29/2009] [Accepted: 05/10/2009] [Indexed: 11/24/2022]
Abstract
OBJECTIVES: Liver disease, the most common disease in Taiwan, is not easily discovered in its initial stage; early diagnosis of this leading cause of mortality is therefore highly important, and the design of an effective diagnosis model is an important issue in liver disease treatment. This study accordingly employs classification and regression tree (CART) and case-based reasoning (CBR) techniques to structure an intelligent diagnosis model that provides a comprehensive analytic framework for raising the accuracy of liver disease diagnosis.
METHODS: Based on the advice and assistance of doctors and medical specialists in liver conditions, 510 outpatient visitors using ICD-9 (International Classification of Diseases, 9th Revision) codes at a medical center in Taiwan from 2005 to 2006 were selected as the cases in the data set for liver disease diagnosis. Data on 340 patients were used to develop the model, and data on 170 patients were used for comparative analysis of the models. This paper accordingly suggests an intelligent model for the diagnosis of liver diseases that integrates CART and CBR. The major steps in applying the model are: (1) adopting CART to diagnose whether a patient suffers from liver disease; (2) for patients diagnosed with liver disease in the first step, employing CBR to diagnose the type of liver disease.
RESULTS: In the first phase, CART is used to extract rules from health examination data to show whether the patient suffers from liver disease. The results indicate that the CART accuracy rate is 92.94%. In the second phase, CBR is developed to diagnose the type of liver disease, and the new case triggers the CBR system to retrieve the most similar case from the case base in order to support the treatment of liver disease. The new case is supported by a similarity ratio, and the CBR diagnostic accuracy rate is 90.00%.
CONCLUSION: Actual implementation shows that the intelligent diagnosis model is capable of integrating CART and CBR techniques to examine liver diseases with considerable accuracy. The model can be used as a supporting system in making decisions regarding liver disease diagnosis and treatment. The rules extracted from CART are helpful to physicians in diagnosing liver diseases. CBR can retrieve the most similar case from the case base in order to solve a new liver disease problem and can be of great assistance to physicians in identifying the type of liver disease, reducing diagnostic errors and improving the quality and effectiveness of medical treatment.
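The two-phase flow described here (a rule decides whether liver disease is present, then case-based reasoning retrieves the most similar stored case) can be sketched as follows. Everything in it is an assumption made for illustration: the feature names, the threshold rule standing in for the CART-extracted rules, the range-normalized similarity ratio, and the tiny case base.

```python
def similarity(case_a, case_b, ranges):
    """Mean per-feature similarity in [0, 1]; 1.0 means identical cases."""
    scores = [1.0 - abs(case_a[f] - case_b[f]) / ranges[f] for f in ranges]
    return sum(scores) / len(scores)


def retrieve_most_similar(new_case, case_base, ranges):
    """Phase 2 (CBR): return the diagnosis of the closest stored case and its similarity."""
    best = max(case_base, key=lambda c: similarity(new_case, c["features"], ranges))
    return best["diagnosis"], similarity(new_case, best["features"], ranges)


def diagnose(new_case, case_base, ranges, stage_one_rule):
    """Phase 1 applies a yes/no rule; only positive cases go on to CBR retrieval."""
    if not stage_one_rule(new_case):
        return None
    return retrieve_most_similar(new_case, case_base, ranges)


# Hypothetical features, value ranges, rule and case base (illustrative only).
ranges = {"alt": 200.0, "ast": 200.0}
case_base = [
    {"features": {"alt": 120.0, "ast": 90.0}, "diagnosis": "type A"},
    {"features": {"alt": 60.0, "ast": 150.0}, "diagnosis": "type B"},
]


def rule(case):
    """Stand-in for a CART-extracted rule."""
    return case["alt"] > 40.0


result = diagnose({"alt": 110.0, "ast": 100.0}, case_base, ranges, rule)
```

Returning the similarity alongside the retrieved diagnosis lets a reader judge how closely the stored case matches the new one.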
Affiliation(s)
- Rong-Ho Lin, Department of Industrial Engineering and Management, National Taipei University of Technology, Taiwan, ROC.
11
Kim YS, Yoon CN. Methodology of the thyroid gland disease decision-making using profiling in steroid hormone pathway. J Pharm Biomed Anal 2007; 43:1100-5. [PMID: 17081718 DOI: 10.1016/j.jpba.2006.09.032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2006] [Revised: 09/18/2006] [Accepted: 09/20/2006] [Indexed: 11/25/2022]
Abstract
To identify the genetic factors behind the onset of thyroid gland disease, we developed a thyroid gland decision-making system that processes the metabolic profile in the steroid hormone map using a statistical method. A metabolic profile is measured data on a mixture of many materials, including not only known metabolites but also unknown ones that are presumed to influence thyroid gland disease. Analyzing a metabolic profile containing multiple materials is therefore useful both for diagnosing thyroid gland disease and for developing the decision-making system. Because the experimental values used to construct the system are area values for the retention time, the observations are preprocessed through variable transition and t-test so that the area values can be used concurrently, and the highly correlated materials are estimated by principal component analysis. The thyroid gland decision-making system developed through logistic regression performs very well, demonstrating 98.7% accuracy in the classification table.
Affiliation(s)
- Young Sun Kim, Bioanalysis and Biotransformation Research Center, Korea Institute of Science and Technology, P.O. Box 131, Cheongryang, Seoul 130-650, Republic of Korea
12
Yamamura S, Kawada K, Takehira R, Nishizawa K, Katayama S, Hirano M, Momose Y. Artificial neural network modeling to predict the plasma concentration of aminoglycosides in burn patients. Biomed Pharmacother 2004; 58:239-44. [PMID: 15183849 DOI: 10.1016/j.biopha.2003.12.012] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Received: 11/26/2003] [Accepted: 12/23/2003] [Indexed: 10/26/2022] Open
Abstract
The goal was to use an artificial neural network model to predict the plasma concentration of aminoglycosides in burn patients and identify patients whose plasma antibiotic concentration would be sub-therapeutic based on the patients' physiological data and taking into account burn severity. Physiological data and some indicators of burn severity were collected from 30 burn patients who received arbekacin. A three-layer artificial neural network with five neurons in the hidden layer was used to predict the plasma concentration of arbekacin. Linear modeling for prediction of plasma concentration and logistic regression modeling for the classification of patients were also used and the predictive performance was compared to results from the artificial neural network model. Dose, body mass index, serum creatinine concentration and amount of parenteral fluid were selected as covariates for the plasma concentration of arbekacin. Area of burn after skin graft was a good covariate for indicating burn severity. Predictive performance of the artificial neural network model including burn severity was much better than linear modeling and logistic regression analysis. An artificial neural network model should be helpful for the prediction of plasma concentration using patients' physiological data, and burn severity should be included for improved prediction in burn patients. Because the relationship between burn severity and plasma concentration of aminoglycosides is thought to be nonlinear, it is not surprising that the artificial neural network model showed better predictive performance compared to the linear or logistic regression models.
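The network shape described here (an input layer, one hidden layer of five neurons, and a single output) can be sketched as a forward pass. This is only a structural illustration under stated assumptions: the tanh hidden activation, the linear output, the choice of five scaled inputs, and every weight value are hypothetical, and the parameters are untrained.

```python
import math


def predict_concentration(inputs, hidden_weights, hidden_biases,
                          output_weights, output_bias):
    """Forward pass of a three-layer network: inputs -> hidden (tanh) -> linear output."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(hidden_weights, hidden_biases)]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical inputs, assumed to be already scaled: dose, body mass index,
# serum creatinine, parenteral fluid amount, burn area after skin graft.
inputs = [0.5, 0.2, 0.1, 0.4, 0.3]

# Hypothetical, untrained parameters: five hidden neurons, one output.
hidden_weights = [[0.1 * (i + 1)] * 5 for i in range(5)]
hidden_biases = [0.0] * 5
output_weights = [0.2] * 5
output_bias = 1.0

estimate = predict_concentration(inputs, hidden_weights, hidden_biases,
                                 output_weights, output_bias)
```

The nonlinear hidden layer is what lets such a model represent the nonlinear relationship between burn severity and plasma concentration that the abstract hypothesizes, which a linear model cannot capture.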
Affiliation(s)
- Shigeo Yamamura, School of Pharmaceutical Sciences, Toho University, Miyama 2-2-1, Funabashi, Chiba 274-8510, Japan.