1
Lilhore UK, Manoharan P, Sandhu JK, Simaiya S, Dalal S, Baqasah AM, Alsafyani M, Alroobaea R, Keshta I, Raahemifar K. Hybrid model for precise hepatitis-C classification using improved random forest and SVM method. Sci Rep 2023; 13:12473. [PMID: 37528148 PMCID: PMC10394001 DOI: 10.1038/s41598-023-36605-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 06/07/2023] [Indexed: 08/03/2023]
Abstract
Hepatitis C Virus (HCV) is a viral infection that causes liver inflammation. Annually, approximately 3.4 million cases of HCV are reported worldwide. Diagnosing HCV at an early stage helps to save lives. A review of current HCV research shows that authors have used a single ML-based prediction model, which encounters several issues, namely poor accuracy, data imbalance, and overfitting. This research proposes a Hybrid Predictive Model (HPM) based on an improved random forest (IRF) and a support vector machine (SVM) to overcome these limitations. The existing RF method is enhanced by adding a bootstrapping process, which helps eliminate the tree's minor features iteratively to build a strong forest and improves the performance of the HPM. The proposed HPM uses a 'Ranker method' to rank the dataset features and applies the IRF with SVM, selecting the higher-ranked features to build the prediction model. This research uses the online HCV dataset from UCI to measure the proposed model's performance. The dataset is highly imbalanced; to deal with this issue, we used the synthetic minority over-sampling technique (SMOTE). This research performs two experiments. The first experiment is based on data splitting methods: K-fold cross-validation and training/testing-based splitting. The proposed method achieved an accuracy of 95.89% for k = 5 and 96.29% for k = 10; for the training/testing-based split, it achieved 91.24% for 80:20 and 92.39% for 70:30, which is the best compared to the existing SVM, MARS, RF, DT, and BGLM methods. In experiment 2, the analysis is performed using feature selection (with SMOTE and without SMOTE). The proposed method achieves an accuracy of 41.541% without SMOTE and 96.82% with SMOTE-based feature selection, which is better than existing ML methods. The experimental results demonstrate the importance of feature selection for achieving higher accuracy in HCV research.
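The abstract relies on SMOTE to rebalance the dataset. As a rough illustration of the idea only (not the authors' implementation), the sketch below creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are all assumptions made for this example.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to `base`
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 6 majority samples versus 3 minority samples
majority = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.3], [0.2, 0.4], [0.1, 0.0]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Generate just enough synthetic points to match the majority class size
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the minority data already occupies. In a cross-validation setting, oversampling is usually applied to the training folds only, so that synthetic points do not leak into the evaluation data.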
Affiliation(s)
- Umesh Kumar Lilhore
  - Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, Punjab, 140413, India
- Poongodi Manoharan
  - College of Science and Engineering, Qatar Foundation, Hamad Bin Khalifa University, Doha, Qatar
- Jasminder Kaur Sandhu
  - Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, Punjab, 140413, India
- Sarita Simaiya
  - Apex Institute of Technology (CSE), Chandigarh University, Gharuan, Mohali, Punjab, 140413, India
- Surjeet Dalal
  - Amity School of Engineering and Technology, Amity University Haryana, Gurugram, India
- Abdullah M Baqasah
  - Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, 21974, Saudi Arabia
- Majed Alsafyani
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944, Saudi Arabia
- Roobaea Alroobaea
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944, Saudi Arabia
- Ismail Keshta
  - Computer Science and Information Systems Department, College of Applied Sciences, AlMaarefa University, Riyadh, Saudi Arabia
- Kaamran Raahemifar
  - College of Information Sciences and Technology, Data Science and Artificial Intelligence Program, Penn State University, State College, PA, 16801, USA
  - School of Optometry and Vision Science, Faculty of Science, University of Waterloo, 200 University, Waterloo, ON, N2L3G1, Canada
  - Faculty of Engineering, University of Waterloo, 200 University Ave W, Waterloo, Canada
2
Reinharz V, Churkin A, Dahari H, Barash D. Advances in Parameter Estimation and Learning from Data for Mathematical Models of Hepatitis C Viral Kinetics. Mathematics 2022; 10:2136. [DOI: 10.3390/math10122136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Mathematical models, some of which incorporate both intracellular and extracellular hepatitis C viral kinetics, have been advanced in recent years for studying HCV–host dynamics, antivirals' mode of action, and their efficacy. The standard ordinary differential equation (ODE) hepatitis C virus (HCV) kinetic model keeps track of uninfected cells, infected cells, and free virus. In multiscale models, a fourth equation, a partial differential equation (PDE), accounts for the intracellular viral RNA (vRNA) kinetics in an infected cell. The PDE multiscale model is substantially more difficult to solve than the standard ODE model, and its governing differential equations are stiff. In previous contributions, we developed and implemented stable and efficient numerical methods for the multiscale model, both for solving the model equations and for parameter estimation. In this contribution, we perform sensitivity analysis on model parameters to gain insight into important properties and to ensure our numerical methods can be safely used for HCV viral dynamic simulations. Furthermore, we generate in-silico patients using the multiscale models to perform machine learning from the data, which enables us to remove HCV measurements on certain days and still estimate meaningful observations with a sufficiently small error.
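To make the standard three-compartment ODE model concrete, the following is a minimal sketch that integrates uninfected cells T, infected cells I, and free virus V with a hand-written RK4 step. The symbol names, the drug efficacy term `eps`, and all parameter values are assumptions chosen for illustration; they are not taken from the article, and the multiscale PDE model it discusses is not covered here.

```python
def hcv_rhs(state, s, d, beta, delta, p, c, eps):
    """Right-hand side of a standard three-compartment HCV kinetic model."""
    T, I, V = state  # uninfected cells, infected cells, free virus
    dT = s - d * T - beta * V * T          # production, death, infection
    dI = beta * V * T - delta * I          # infection, infected-cell loss
    dV = (1.0 - eps) * p * I - c * V       # (partly blocked) production, clearance
    return (dT, dI, dV)


def rk4_step(state, dt, params):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shift(k, h):
        return tuple(x + h * kx for x, kx in zip(state, k))

    k1 = hcv_rhs(state, **params)
    k2 = hcv_rhs(shift(k1, dt / 2), **params)
    k3 = hcv_rhs(shift(k2, dt / 2), **params)
    k4 = hcv_rhs(shift(k3, dt), **params)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * g + h)
        for x, a, b, g, h in zip(state, k1, k2, k3, k4)
    )


# Illustrative (assumed) parameter values, not taken from the article
params = dict(s=1e4, d=0.01, beta=2e-7, delta=0.2, p=10.0, c=6.0, eps=0.0)

# Pre-treatment steady state, obtained by setting all derivatives to zero
T0 = params["delta"] * params["c"] / (params["beta"] * params["p"])
I0 = (params["s"] - params["d"] * T0) / params["delta"]
V0 = params["p"] * I0 / params["c"]

# Start therapy: assume the drug blocks 99% of virus production
params["eps"] = 0.99
state, dt = (T0, I0, V0), 0.01
trajectory = [state]
for _ in range(1400):  # 1400 steps of 0.01 day = 14 days
    state = rk4_step(state, dt, params)
    trajectory.append(state)

print("viral load at day 0 and day 14:", trajectory[0][2], trajectory[-1][2])
```

A fixed-step explicit scheme is adequate here only because the assumed rates are mild relative to the step size; for stiff systems such as the multiscale model described above, an implicit or adaptive solver would be the more usual choice.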
3
Goyal A, Churkin A, Barash D, Cotler SJ, Shlomai A, Etzion O, Dahari H. Modeling-Based Response-Guided DAA Therapy for Chronic Hepatitis C to Identify Individuals for Shortening Treatment Duration. Open Forum Infect Dis 2022; 9:ofac157. [PMID: 35493122 PMCID: PMC9045946 DOI: 10.1093/ofid/ofac157] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/13/2022] [Accepted: 03/18/2022] [Indexed: 01/04/2023]
Abstract
Shortening duration of direct-acting antiviral therapy for chronic hepatitis C could provide cost savings, reduce medication exposure, and foster adherence and treatment completion in special populations. The current analysis indicates that measuring hepatitis C virus at baseline and on days 7 and 14 of therapy can identify patients for shortening therapy duration.
Affiliation(s)
- Ashish Goyal
  - The Program for Experimental and Theoretical Modeling, Division of Hepatology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
  - Current affiliation: Medicine Design, Pharmacokinetics, Dynamics, & Metabolism, Pfizer Worldwide R&D, Cambridge, Massachusetts, USA
- Alex Churkin
  - Department of Software Engineering, Sami Shamoon College of Engineering, Beer-Sheba, Israel
- Danny Barash
  - Department of Computer Science, Ben-Gurion University, Beer-Sheba, Israel
- Scott J Cotler
  - The Program for Experimental and Theoretical Modeling, Division of Hepatology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
- Amir Shlomai
  - Department of Medicine D and The Liver Institute, Rabin Medical Center, Beilinson Hospital, Petah-Tikva and the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ohad Etzion
  - Soroka University Medical Center, Beer-Sheba, Israel
  - The Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheba, Israel
- Harel Dahari
  - The Program for Experimental and Theoretical Modeling, Division of Hepatology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
  - Correspondence: Harel Dahari, PhD, Division of Hepatology, Stritch School of Medicine, Loyola University Chicago, 2160 S. First Ave, Maywood, IL 60153
4
Lederman D, Patel R, Itani O, Rotstein HG. Parameter Estimation in the Age of Degeneracy and Unidentifiability. Mathematics 2022; 10:170. [DOI: 10.3390/math10020170] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
Parameter estimation from observable or experimental data is a crucial stage in any modeling study. Identifiability refers to one’s ability to uniquely estimate the model parameters from the available data. Structural unidentifiability in dynamic models, the opposite of identifiability, is associated with the notion of degeneracy, where multiple parameter sets produce the same pattern. Therefore, the inverse function of determining the model parameters from the data is not well defined. Degeneracy is not only a mathematical property of models; it has also been reported in biological experiments. Classical studies on structural unidentifiability focused on the notion that one can at most identify combinations of unidentifiable model parameters. We have identified a different type of structural degeneracy/unidentifiability present in a family of models, which we refer to as the Lambda-Omega (Λ-Ω) models. These are an extension of the classical lambda-omega (λ-ω) models that have been used to model biological systems, and they display richer dynamic behavior, with waveforms that range from sinusoidal to square-wave to spike-like. We show that the Λ-Ω models feature infinitely many parameter sets that produce identical stable oscillations, except possibly for a phase shift (reflecting the initial phase). These degenerate parameters are not identifiable combinations of unidentifiable parameters, as is the case in structural degeneracy. In fact, the number of model parameters in the Λ-Ω models is minimal in the sense that each one controls a different aspect of the model dynamics, and the dynamic complexity of the system would be reduced by reducing the number of parameters. We argue that the family of Λ-Ω models serves as a framework for the systematic investigation of degeneracy and identifiability in dynamic models and for the investigation of the interplay between structural and other forms of unidentifiability resulting from the lack of information in the experimental/observational data.
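The general notion that several parameter sets can produce the same oscillation can be illustrated with the classical λ-ω system in polar form, dr/dt = r·λ(r), dθ/dt = ω(r). The sketch below assumes the common choice λ(r) = a − b·r² and ω(r) = c + d·r²; the parameter names `a`, `b`, `c`, `d` and the values used are assumptions for this example. It is not the Λ-Ω family introduced in the article, and it does not reproduce the specific type of degeneracy the authors analyze.

```python
import math


def limit_cycle(a, b, c, d):
    """Amplitude and angular frequency of the stable limit cycle of
    dr/dt = r*(a - b*r**2), dtheta/dt = c + d*r**2, for a > 0 and b > 0."""
    r_star = math.sqrt(a / b)          # radius where lambda(r) vanishes
    omega_star = c + d * r_star ** 2   # frequency evaluated on the cycle
    return r_star, omega_star


def simulate_radius(a, b, r0=0.1, dt=1e-3, t_end=20.0):
    """Forward-Euler integration of the radial equation, returning r(t_end)."""
    r = r0
    for _ in range(int(t_end / dt)):
        r += dt * r * (a - b * r * r)
    return r


# Two different (assumed) parameter sets
set_1 = dict(a=1.0, b=1.0, c=1.0, d=1.0)
set_2 = dict(a=2.0, b=2.0, c=0.5, d=1.5)

cycle_1 = limit_cycle(**set_1)
cycle_2 = limit_cycle(**set_2)
print(cycle_1, cycle_2)

# Both radial equations settle onto the same amplitude
r_end_1 = simulate_radius(set_1["a"], set_1["b"])
r_end_2 = simulate_radius(set_2["a"], set_2["b"])
print(r_end_1, r_end_2)
```

Both parameter sets yield the same steady oscillation (same amplitude and frequency), so observations of the stable oscillation alone cannot tell them apart; only the transient approach to the cycle differs between the two sets.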