1
Gao P, Chen Z, Liu X, Chen P, Matsubara Y, Sakurai Y. Antimicrobial resistance recommendations via electronic health records with graph representation and patient population modeling. Computer Methods and Programs in Biomedicine 2025; 261:108616. [PMID: 39913994 DOI: 10.1016/j.cmpb.2025.108616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Revised: 11/24/2024] [Accepted: 01/22/2025] [Indexed: 02/21/2025]
Abstract
BACKGROUND Antimicrobial resistance (AMR), which refers to the ability of pathogenic bacteria to withstand the effects of antibiotics, is a critical global health issue. Traditional methods for identifying AMRs in clinical settings rely on in-lab testing, which hampers timely medical decision-making. Moreover, there is a notable delay in updating empirical treatment guidelines in response to the rapid evolution of pathogens. Recent advances in AMR research have illuminated the potential of machine learning-based patient information analysis using electronic health records (EHRs). METHODS Against this backdrop, our study introduces a novel deep learning framework designed to leverage EHR data for generating AMR recommendations. This framework is anchored in three critical innovations. Firstly, we employ a deep graph neural network to model the correlations between various medical events, using structural information to enhance the representation of binary medical events. Secondly, in acknowledgment of the commonalities in pathogen evolution among populations, we incorporate population-level observation by modeling patient graphical structures. This strategy also addresses the issue of imbalance in rare AMR labels. Finally, we adopt a multi-task learning strategy, enabling simultaneous recommendations on multiple AMRs. Extensive experimental evaluations on a large dataset of over 110,000 patients with urinary tract infections validate the superiority of our approach. RESULTS It achieves notable improvements in areas under receiver operating characteristic curves (AUROCs) for four distinct AMR labels, with increments of 0.04, 0.02, 0.06, and 0.10 surpassing the baselines. CONCLUSIONS Further medical analysis underscores the efficacy of our approach, demonstrating the potential of EHR-based systems in AMR recommendation.
Affiliation(s)
- Pei Gao
  - Nara Institute of Science and Technology (NAIST), Ikoma, 630-0101, Nara, Japan
- Zheng Chen
  - ISIR, Osaka University, Suita, 567-0047, Osaka, Japan
- Xin Liu
  - National Institute of Advanced Industrial Science and Technology, 135-0064, Tokyo, Japan
- Peng Chen
  - RIKEN Center for Computational Science, Kobe, 650-0047, Hyogo, Japan
2
Elshewey AM, Shams MY, Tawfeek SM, Alharbi AH, Ibrahim A, Abdelhamid AA, Eid MM, Khodadadi N, Abualigah L, Khafaga DS, Tarek Z. Optimizing HCV Disease Prediction in Egypt: The hyOPTGB Framework. Diagnostics (Basel) 2023; 13:3439. [PMID: 37998575 PMCID: PMC10670002 DOI: 10.3390/diagnostics13223439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 11/04/2023] [Accepted: 11/08/2023] [Indexed: 11/25/2023] Open
Abstract
The paper focuses on the hepatitis C virus (HCV) infection in Egypt, which has one of the highest rates of HCV in the world. The high prevalence is linked to several factors, including the use of injection drugs, poor sterilization practices in medical facilities, and low public awareness. This paper introduces a hyOPTGB model, which employs an optimized gradient boosting (GB) classifier to predict HCV disease in Egypt. The model's accuracy is enhanced by optimizing hyperparameters with the OPTUNA framework. Min-Max normalization is used as a preprocessing step to scale the dataset values, and the forward selection (FS) wrapper method is used to identify essential features. The dataset used in the study contains 1385 instances and 29 features and is available at the UCI machine learning repository. The authors compare the performance of five machine learning models, including decision tree (DT), support vector machine (SVM), dummy classifier (DC), ridge classifier (RC), and bagging classifier (BC), with the hyOPTGB model. The system's efficacy is assessed using various metrics, including accuracy, recall, precision, and F1-score. The hyOPTGB model outperformed the other machine learning models, achieving a 95.3% accuracy rate. The authors also compared the hyOPTGB model against other models proposed by authors who used the same dataset.
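As a rough illustration of the Min-Max normalization step mentioned in this abstract, the sketch below rescales one feature column to the range [0, 1]. It is not the authors' code: the function name `min_max_scale` and the sample values are hypothetical, and mapping a constant column to 0.0 is an assumed convention.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumed convention: a constant feature has no spread, so map it to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical feature values, not taken from the HCV dataset.
ages = [32, 45, 58, 61]
scaled = min_max_scale(ages)
print(scaled)
```

In practice the minimum and maximum would be estimated on the training split only and then reused for the test split, so that no information leaks from the held-out data.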
Affiliation(s)
- Ahmed M. Elshewey
  - Computer Science Department, Faculty of Computers and Information, Suez University, Suez 43533, Egypt
- Mahmoud Y. Shams
  - Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
- Sayed M. Tawfeek
  - Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura 35111, Egypt
- Amal H. Alharbi
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Abdelhameed Ibrahim
  - Computer Engineering and Control Systems Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Abdelaziz A. Abdelhamid
  - Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
  - Department of Computer Science, College of Computing and Information Technology, Shaqra University, Shaqra 11961, Saudi Arabia
- Marwa M. Eid
  - Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura 35111, Egypt
  - Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura 35712, Egypt
- Nima Khodadadi
  - Department of Civil and Architectural Engineering, University of Miami, Coral Gables, FL 33146, USA
- Laith Abualigah
  - Computer Science Department, Prince Hussein Bin Abdullah Faculty for Information Technology, Al al-Bayt University, Mafraq 25113, Jordan
  - Department of Electrical and Computer Engineering, Lebanese American University, Byblos 13-5053, Lebanon
  - Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman 19328, Jordan
  - MEU Research Unit, Middle East University, Amman 11831, Jordan
  - Applied Science Research Center, Applied Science Private University, Amman 11931, Jordan
  - School of Computer Sciences, Universiti Sains Malaysia, Gelugor 11800, Malaysia
  - School of Engineering and Technology, Sunway University Malaysia, Petaling Jaya 27500, Malaysia
- Doaa Sami Khafaga
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Zahraa Tarek
  - Computer Science Department, Faculty of Computers and Information, Mansoura University, Mansoura 35561, Egypt
3
Moulaei K, Sharifi H, Bahaadinbeigy K, Haghdoost AA, Nasiri N. Machine learning for prediction of viral hepatitis: A systematic review and meta-analysis. Int J Med Inform 2023; 179:105243. [PMID: 37806178 DOI: 10.1016/j.ijmedinf.2023.105243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2023] [Revised: 09/21/2023] [Accepted: 10/01/2023] [Indexed: 10/10/2023]
Abstract
BACKGROUND Lack of accurate and timely diagnosis of hepatitis poses obstacles to effective treatment, disease progression prevention, complication reduction, and life-saving interventions of patients. Utilizing machine learning can greatly enhance the achievement of timely and precise disease diagnosis. Therefore, we carried out this systematic review and meta-analysis to explore the performance of machine learning algorithms in predicting viral hepatitis. METHODS Using an extensive literature search in PubMed, Scopus, and Web of Science databases until June 15, 2023, English publications on hepatitis prediction using machine learning algorithms were included. Two authors independently extracted pertinent information from the selected studies. The PRISMA 2020 checklist was followed for study selection and result reporting. The risk of bias was checked using the International Journal of Medical Informatics (IJMEDI) checklist. Data were analyzed using the 'metandi' command in Stata 17. RESULTS Twenty-one original studies were included, covering 82 algorithms. Sixteen studies utilized five algorithms to predict hepatitis B. Ten studies used five algorithms for hepatitis C prediction. For hepatitis B prediction, the SVM algorithms demonstrated the highest sensitivity (90.0%; 95% confidence interval (CI): 77.0%-96.0%), specificity (94.0%; 95% CI: 90.0%-97.0%), and a diagnostic odds ratio (DOR) of 145 (95% CI: 37.0-559.0). In the case of hepatitis C, the KNN algorithms exhibited the highest sensitivity (80.0%; 95% CI: 30.0%-97.0%), specificity (95.0%; 95% CI: 58.0%-99.0%), and DOR (72; 95% CI: 3.0-1644.0) for prediction. CONCLUSION SVM and KNN demonstrated superior performance in predicting hepatitis. The proper algorithm along with clinical practice could improve hepatitis prediction and management.
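For readers unfamiliar with the diagnostic odds ratio reported in this abstract, the sketch below applies the standard definition, DOR = (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity), to the rounded point estimates quoted above. The function name is hypothetical, and because the inputs are rounded summary values rather than the fitted meta-analysis model, the results only approximate the pooled DORs of 145 and 72.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Return the DOR as the ratio of the positive to the negative likelihood ratio."""
    positive_lr = sensitivity / (1.0 - specificity)   # LR+ = sens / (1 - spec)
    negative_lr = (1.0 - sensitivity) / specificity   # LR- = (1 - sens) / spec
    return positive_lr / negative_lr


# Rounded point estimates quoted in the abstract.
dor_svm_hbv = diagnostic_odds_ratio(0.90, 0.94)  # SVM, hepatitis B
dor_knn_hcv = diagnostic_odds_ratio(0.80, 0.95)  # KNN, hepatitis C
print(round(dor_svm_hbv), round(dor_knn_hcv))
```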
Affiliation(s)
- Khadijeh Moulaei
  - Department of Health Information Technology, Faculty of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Hamid Sharifi
  - HIV/STI Surveillance Research Center, and WHO Collaborating Center for HIV Surveillance, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Ali Akbar Haghdoost
  - Modeling in Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Naser Nasiri
  - School of Public Health, Jiroft University of Medical Sciences, Jiroft, Kerman, Iran
4
Lilhore UK, Manoharan P, Sandhu JK, Simaiya S, Dalal S, Baqasah AM, Alsafyani M, Alroobaea R, Keshta I, Raahemifar K. Hybrid model for precise hepatitis-C classification using improved random forest and SVM method. Sci Rep 2023; 13:12473. [PMID: 37528148 PMCID: PMC10394001 DOI: 10.1038/s41598-023-36605-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 06/07/2023] [Indexed: 08/03/2023] Open
Abstract
Hepatitis C Virus (HCV) is a viral infection that causes liver inflammation. Annually, approximately 3.4 million cases of HCV are reported worldwide. A diagnosis of HCV in earlier stages helps to save lives. Existing HCV research has typically relied on a single ML-based prediction model, which encounters several issues, i.e., poor accuracy, data imbalance, and overfitting. This research proposed a Hybrid Predictive Model (HPM) based on an improved random forest (IRF) and support vector machine (SVM) to overcome existing research limitations. The proposed model enhances the existing random forest (RF) method by adding a bootstrapping process, which helps eliminate the tree's minor features iteratively to build a strong forest and improves the performance of the HPM model. The proposed HPM model utilizes a 'Ranker method' to rank the dataset features and applies the IRF with SVM, selecting higher-ranked feature elements to build the prediction model. This research uses the online HCV dataset from UCI to measure the proposed model's performance. The dataset is highly imbalanced; to deal with this issue, we utilized the synthetic minority over-sampling technique (SMOTE). This research performs two experiments. The first experiment is based on data splitting methods: K-fold cross-validation and training-testing splits. The proposed method achieved an accuracy of 95.89% for k = 5 and 96.29% for k = 10; for the training and testing-based split, the proposed method achieved 91.24% for 80:20 and 92.39% for 70:30, which is the best compared to the existing SVM, MARS, RF, DT, and BGLM methods. In experiment 2, the analysis is performed using feature selection (with SMOTE and without SMOTE). The proposed method achieves an accuracy of 41.541% without SMOTE and 96.82% with SMOTE-based feature selection, which is better than existing ML methods. The experimental results prove the importance of feature selection to achieve higher accuracy in HCV research.
Affiliation(s)
- Umesh Kumar Lilhore
  - Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, Punjab, 140413, India
- Poongodi Manoharan
  - College of Science and Engineering, Qatar Foundation, Hamad Bin Khalifa University, Doha, Qatar
- Jasminder Kaur Sandhu
  - Department of Computer Science and Engineering, Chandigarh University, Gharuan, Mohali, Punjab, 140413, India
- Sarita Simaiya
  - Apex Institute of Technology (CSE), Chandigarh University, Gharuan, Mohali, Punjab, 140413, India
- Surjeet Dalal
  - Amity School of Engineering and Technology, Amity University Haryana, Gurugram, India
- Abdullah M Baqasah
  - Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, 21974, Saudi Arabia
- Majed Alsafyani
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944, Saudi Arabia
- Roobaea Alroobaea
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif, 21944, Saudi Arabia
- Ismail Keshta
  - Computer Science and Information Systems Department, College of Applied Sciences, AlMaarefa University, Riyadh, Saudi Arabia
- Kaamran Raahemifar
  - College of Information Sciences and Technology, Data Science and Artificial Intelligence Program, Penn State University, State College, PA, 16801, USA
  - School of Optometry and Vision Science, Faculty of Science, University of Waterloo, 200 University, Waterloo, ON, N2L3G1, Canada
  - Faculty of Engineering, University of Waterloo, 200 University Ave W, Waterloo, Canada
5
Mamdouh Farghaly H, Shams MY, Abd El-Hafeez T. Hepatitis C Virus prediction based on machine learning framework: a real-world case study in Egypt. Knowl Inf Syst 2023; 65:2595-2617. [DOI: 10.1007/s10115-023-01851-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/17/2022] [Revised: 01/26/2023] [Accepted: 02/13/2023] [Indexed: 03/06/2023]
Abstract
Prediction and classification of diseases are essential in medical science, as they help limit the spread of disease and identify infected regions at an early stage. Machine learning (ML) approaches are commonly used for predicting and classifying diseases and serve as an efficient tool for doctors and specialists. This paper proposes a prediction framework based on ML approaches to predict Hepatitis C Virus among healthcare workers in Egypt. We utilized real-world data from the National Liver Institute, founded at Menoufiya University (Menoufiya, Egypt). The collected dataset consists of 859 patients with 12 different features. To ensure the robustness and reliability of the proposed framework, we performed two scenarios: the first without feature selection and the second after the features are selected based on sequential forward selection (SFS). Furthermore, the feature subset selected based on the generated features from SFS is evaluated. Naïve Bayes, random forest (RF), K-nearest neighbor, and logistic regression are utilized as induction algorithms and classifiers for model evaluation. Then, the effect of parameter tuning on learning techniques is measured. The experimental results indicated that the proposed framework achieved higher accuracies after SFS selection than without feature selection. Moreover, the RF classifier achieved 94.06% accuracy with a minimum learning elapsed time of 0.54 s. Finally, after adjusting the hyperparameter values of the RF classifier, the classification accuracy is improved to 94.88% using only four features.
6
Dhivya P, Bazilabanu A. Deep hyper optimization approach for disease classification using artificial intelligence. Data Knowl Eng 2023. [DOI: 10.1016/j.datak.2023.102147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
7
Iraji MS, Tanha J, Habibinejad M. Druggable protein prediction using a multi-canal deep convolutional neural network based on autocovariance method. Comput Biol Med 2022; 151:106276. [PMID: 36410099 DOI: 10.1016/j.compbiomed.2022.106276] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/05/2022] [Revised: 10/18/2022] [Accepted: 10/30/2022] [Indexed: 11/09/2022]
Abstract
Drug targets must be identified and positioned correctly to research and manufacture new drugs. In this study, rather than using traditional methods for drug expansion, the drug target is determined using machine learning. Machine learning has generated significant interest and desire in recent years and extensive research due to its low cost and speed of operation. As a result, it is critical to develop an intelligent classification system for drug proteins. This study proposes two distinct models for the prediction of druggable protein classes based on the deep learning method. The translation of drug-protein sequences is based on six physicochemical properties of amino acids. Following the application of the autocovariance method, converted sequences are used as fixed-length input vectors in deep stacked sparse auto-encoders (DSSAEs) network. The coded protein sequences are also considered and utilized as a six-channel input vector for the deep convolutional neural network model. The experimental results contributing to the deep convolution model are more efficient than previous studies for classifying druggable proteins. The proposed approach achieved a sensitivity of 96.92%, a specificity of 99.51%, and an accuracy of 98.29%.
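The autocovariance step described in this abstract turns a variable-length protein sequence into a fixed-length vector. The sketch below shows one common formulation of that idea, not the authors' implementation: each residue is mapped to a physicochemical value, and the lagged covariance of the centered values is computed for lags 1 to `max_lag`. The two property tables and their values are made up for illustration (the paper uses six properties), and all function names are hypothetical.

```python
def autocovariance(values, max_lag):
    """Return the lagged autocovariance of a numeric series for lags 1..max_lag."""
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_sequence(sequence, property_tables, max_lag):
    """Concatenate autocovariance features over every property table."""
    features = []
    for table in property_tables:
        values = [table[residue] for residue in sequence]
        features.extend(autocovariance(values, max_lag))
    return features


# Toy property tables with invented values (assumption, for illustration only).
toy_property_a = {"A": 1.0, "C": 2.0, "G": 0.0}
toy_property_b = {"A": 0.5, "C": 0.1, "G": 0.9}

short_vec = encode_sequence("ACGA", [toy_property_a, toy_property_b], max_lag=2)
long_vec = encode_sequence("ACGACCGAAG", [toy_property_a, toy_property_b], max_lag=2)
print(len(short_vec), len(long_vec))
```

The output length is the number of properties times `max_lag`, independent of sequence length, which is what allows sequences of different lengths to feed a fixed-size network input.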
Affiliation(s)
- Mohammad Saber Iraji
  - Department of Computer Engineering and Information Technology, Payame Noor University, Tehran, Iran; Department of Computer Engineering, University of Tabriz, Tabriz, Iran
- Jafar Tanha
  - Department of Computer Engineering, University of Tabriz, Tabriz, Iran
- Mahboobeh Habibinejad
  - Department of Computer Engineering and Information Technology, Payame Noor University, Tehran, Iran
8
Martínez JA, Alonso-Bernáldez M, Martínez-Urbistondo D, Vargas-Nuñez JA, Ramírez de Molina A, Dávalos A, Ramos-Lopez O. Machine learning insights concerning inflammatory and liver-related risk comorbidities in non-communicable and viral diseases. World J Gastroenterol 2022; 28:6230-6248. [PMID: 36504554 PMCID: PMC9730439 DOI: 10.3748/wjg.v28.i44.6230] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/11/2022] [Revised: 10/07/2022] [Accepted: 11/16/2022] [Indexed: 11/25/2022] Open
Abstract
The liver is a key organ involved in a wide range of functions, whose damage can lead to chronic liver disease (CLD). CLD accounts for more than two million deaths worldwide, becoming a social and economic burden for most countries. Among the different factors that can cause CLD, alcohol abuse, viruses, drug treatments, and unhealthy dietary patterns top the list. These conditions prompt and perpetuate an inflammatory environment and oxidative stress imbalance that favor the development of hepatic fibrogenesis. High stages of fibrosis can eventually lead to cirrhosis or hepatocellular carcinoma (HCC). Despite the advances achieved in this field, new approaches are needed for the prevention, diagnosis, treatment, and prognosis of CLD. In this context, the scientific community is using machine learning (ML) algorithms to integrate and process vast amounts of data with unprecedented performance. ML techniques allow the integration of anthropometric, genetic, clinical, biochemical, dietary, lifestyle and omics data, giving new insights to tackle CLD and bringing personalized medicine a step closer. This review summarizes the investigations where ML techniques have been applied to study new approaches that could be used in inflammatory-related, hepatitis viruses-induced, and coronavirus disease 2019-induced liver damage and enlighten the factors involved in CLD development.
Affiliation(s)
- J Alfredo Martínez
  - Precision Nutrition and Cardiometabolic Health, Madrid Institute of Advanced Studies-Food Institute, Madrid 28049, Spain
- Marta Alonso-Bernáldez
  - Precision Nutrition and Cardiometabolic Health, Madrid Institute of Advanced Studies-Food Institute, Madrid 28049, Spain
- Juan A Vargas-Nuñez
  - Servicio de Medicina Interna, Hospital Universitario Puerta de Hierro Majadahonda, Madrid 28222, Majadahonda, Spain
- Ana Ramírez de Molina
  - Molecular Oncology and Nutritional Genomics of Cancer, Madrid Institute of Advanced Studies-Food Institute, Madrid 28049, Spain
- Alberto Dávalos
  - Laboratory of Epigenetics of Lipid Metabolism, Madrid Institute of Advanced Studies-Food Institute, Madrid 28049, Spain
- Omar Ramos-Lopez
  - Medicine and Psychology School, Autonomous University of Baja California, Tijuana 22390, Baja California, Mexico
9
Majzoobi MM, Namdar S, Najafi-Vosough R, Hajilooi AA, Mahjub H. Prediction of Hepatitis disease using ensemble learning methods. Journal of Preventive Medicine and Hygiene 2022; 63:E424-E428. [PMID: 36415304 PMCID: PMC9648545 DOI: 10.15167/2421-4248/jpmh2022.63.3.2515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 09/01/2022] [Indexed: 01/25/2023]
Abstract
OBJECTIVE Hepatitis is one of the chronic diseases that can lead to liver cirrhosis and hepatocellular carcinoma, which cause deaths around the world. Hence, early diagnosis is needed to control, treat, and reduce the effects of this disease. This study's main goal was to compare the performance of traditional and ensemble learning methods for predicting hepatitis B virus (HBV) and hepatitis C virus (HCV). Also, important variables related to HBV and HCV were identified. METHODS This case-control study was conducted in Hamadan Province, in the west of Iran, between 2014 and 2019. It included 534 subjects (267 cases and 267 controls). Bagging, random forest, AdaBoost, and logistic regression were used for predicting HBV and HCV. These methods' performance was evaluated using accuracy. RESULTS According to the results, the accuracy of bagging, random forest, AdaBoost, and logistic regression was 0.65 ± 0.03, 0.66 ± 0.03, 0.62 ± 0.04, and 0.64 ± 0.03, respectively, with random forest showing the best performance for predicting HBV. This method showed that ALT was the most important variable for predicting HBV. The accuracy of random forest was 0.77 ± 0.03 for predicting HCV. Also, the random forest showed that the most important variables for predicting HCV were, in order, AST, ALT, and age. CONCLUSION This study showed that random forest performed better than other methods for predicting HBV and HCV.
Affiliation(s)
- Mohammad Mahdi Majzoobi
  - Department of Infectious Diseases, Hamadan University of Medical Sciences, Hamadan, Iran
  - Brucellosis Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Sepideh Namdar
  - Department of Infectious Diseases, Hamadan University of Medical Sciences, Hamadan, Iran
- Roya Najafi-Vosough
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
- Hossein Mahjub
  - Research Center for Health Sciences, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Correspondence: Hossein Mahjub, Center for Health Sciences, Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran. PO BOX: 65175-4171 - Tel.: +98 81 38380025 - Fax: +98 81 38380509 - E-mail:
10
Sharifi S, Lotfi Shahreza M, Pakdel A, Reecy JM, Ghadiri N, Atashi H, Motamedi M, Ebrahimie E. Systems Biology–Derived Genetic Signatures of Mastitis in Dairy Cattle: A New Avenue for Drug Repurposing. Animals (Basel) 2021; 12:ani12010029. [PMID: 35011134 PMCID: PMC8749881 DOI: 10.3390/ani12010029] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/18/2021] [Revised: 11/21/2021] [Accepted: 12/17/2021] [Indexed: 02/07/2023] Open
Abstract
Simple Summary Therapeutic success of bovine mastitis depends mainly on accurately diagnosing the type of pathogen involved. Despite the development prospects for bovine mastitis diagnosis, including new biomarker discovery to target specific pathogens with high sensitivity and specificity, treatment studies have shown controversial results, and the most efficient, safe, and economical treatments for mastitis are still topics of scientific debate. The goal of this research is the integration of different levels of systems biology data to predict candidate drugs for the control and management of E. coli mastitis. We propose that the novel drugs could be used by pharmaceutical scientists or veterinarians to find commercially efficacious medicines. Abstract Mastitis, a disease with high incidence worldwide, is the most prevalent and costly disease in the dairy industry. Gram-negative bacteria such as Escherichia coli (E. coli) are assumed to be among the leading agents causing acute severe infection with clinical signs. E. coli, environmental mastitis pathogens, are the primary etiological agents of bovine mastitis in well-managed dairy farms. Response to E. coli infection has a complex pattern affected by genetic and environmental parameters. On the other hand, the efficacy of antibiotics and/or anti-inflammatory treatment in E. coli mastitis is still a topic of scientific debate, and studies on the treatment of clinical cases show conflicting results. Unraveling the bio-signature of mastitis in dairy cattle can open new avenues for drug repurposing. In the current research, a novel, semi-supervised heterogeneous label propagation algorithm named Heter-LP, which applies both local and global network features for data integration, was used to potentially identify novel therapeutic avenues for the treatment of E. coli mastitis. Online data repositories relevant to known diseases, drugs, and gene targets, along with other specialized biological information for E. coli mastitis, including critical genes with robust bio-signatures, drugs, and related disorders, were used as input data for analysis with the Heter-LP algorithm. Our research identified novel drugs such as Glibenclamide, Ipratropium, Salbutamol, and Carbidopa as possible therapeutics that could be used against E. coli mastitis. Predicted relationships can be used by pharmaceutical scientists or veterinarians to find commercially efficacious medicines or a combination of two or more active compounds to treat this infectious disease.
Affiliation(s)
- Somayeh Sharifi
  - Department of Animal Sciences, College of Agriculture, Isfahan University of Technology, Isfahan 84156-83111, Iran
  - Department of Animal Science, Iowa State University, Ames, IA 50011, USA
  - Correspondence: (S.S.); (E.E.)
- Maryam Lotfi Shahreza
  - Department of Computer Engineering, Shahreza Campus, University of Isfahan, Isfahan 86149-56841, Iran
- Abbas Pakdel
  - Department of Animal Sciences, College of Agriculture, Isfahan University of Technology, Isfahan 84156-83111, Iran
- James M. Reecy
  - Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- Nasser Ghadiri
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Hadi Atashi
  - Department of Animal Science, Shiraz University, Shiraz 71946-84334, Iran
- Mahmood Motamedi
  - Department of Animal Sciences, University of Tehran, Tehran 1417935840, Iran
- Esmaeil Ebrahimie
  - Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, VIC 3086, Australia
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, SA 5371, Australia
  - School of BioSciences, The University of Melbourne, Melbourne, VIC 3010, Australia
  - Correspondence: (S.S.); (E.E.)
11
He S, Leanse LG, Feng Y. Artificial intelligence and machine learning assisted drug delivery for effective treatment of infectious diseases. Adv Drug Deliv Rev 2021; 178:113922. [PMID: 34461198 DOI: 10.1016/j.addr.2021.113922] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/01/2021] [Revised: 07/14/2021] [Accepted: 08/09/2021] [Indexed: 12/23/2022]
Abstract
In the era of antimicrobial resistance, the prevalence of multidrug-resistant microorganisms that resist conventional antibiotic treatment has steadily increased. Thus, it is now unquestionable that infectious diseases are significant global burdens that urgently require innovative treatment strategies. Emerging studies have demonstrated that artificial intelligence (AI) can transform drug delivery to promote effective treatment of infectious diseases. In this review, we propose to evaluate the significance, essential principles, and popular tools of AI in drug delivery for infectious disease treatment. Specifically, we will focus on the achievements and key findings of current research, as well as the applications of AI on drug delivery throughout the whole antimicrobial treatment process, with an emphasis on drug development, treatment regimen optimization, drug delivery system and administration route design, and drug delivery outcome prediction. To that end, the challenges of AI in drug delivery for infectious disease treatments and their current solutions and future perspective will be presented and discussed.
Affiliation(s)
- Sheng He
  - Boston Children's Hospital, Harvard Medical School, Harvard University, Boston, MA, USA
- Leon G Leanse
  - Massachusetts General Hospital, Harvard Medical School, Harvard University, Boston, MA, USA
- Yanfang Feng
  - Massachusetts General Hospital, Harvard Medical School, Harvard University, Boston, MA, USA
12
Feldman TC, Dienstag JL, Mandl KD, Tseng YJ. Machine-learning-based predictions of direct-acting antiviral therapy duration for patients with hepatitis C. Int J Med Inform 2021; 154:104562. [PMID: 34482150 DOI: 10.1016/j.ijmedinf.2021.104562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/17/2021] [Revised: 08/15/2021] [Accepted: 08/16/2021] [Indexed: 02/09/2023]
Abstract
INTRODUCTION Hepatitis C, which affects 71 million persons worldwide, is the most common blood-borne pathogen in the United States. Chronic infections can be treated effectively thanks to the availability of modern direct-acting antiviral (DAA) therapies. Real-world data on the duration of DAA therapy, which can be used to optimize and guide the course of therapy, may also be useful in determining quality of life enhancements based upon total required supply of medication and long-term improvements to quality of life. We developed a machine learning model to identify patient characteristics associated with prolonged DAA treatment duration. METHODS Claims data from a nationwide U.S. commercial managed care plan covering about 60 million beneficiaries from 2009 to 2019 were used in this retrospective study. We examined differences in age, gender, and multiple comorbidities among patients treated with different durations of DAA treatment. We also examined the performance of machine learning models for predicting a prolonged course of DAA based on the area under the receiver operating characteristic curve (AUC). RESULTS We identified 3943 cases with hepatitis C who received sofosbuvir/ledipasvir as the first course of DAA and were eligible for the study. Patients receiving prolonged treatment (n = 240, 6.1%) were more likely to have compensated cirrhosis, decompensated cirrhosis, and other comorbidities (P < 0.001). For distinguishing patients who received prolonged DAA treatment for hepatitis C from patients who received standard treatment, the optimal predictive model, constructed with XGBoost, had an AUC of 0.745 ± 0.031 (P < 0.001). CONCLUSIONS The risk of antiviral resistance and the cost of DAA are strong motivators to ensure that first-round DAA therapy is effective. For the dominant DAA treatment during the course of this analysis, we present a model that identifies factors already captured in established guidelines and adds to those age, comorbidity burden, and type 2 diabetes status, patient characteristics that are predictive of extended treatment.
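As a reading aid for the AUC figures quoted in this and other entries, the sketch below computes AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are illustrative assumptions and are not taken from the study.

```python
def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Both lists must be non-empty.
    This is an illustrative sketch, not the evaluation code of any cited study.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy scores (hypothetical): 3 prolonged-treatment cases, 2 standard-treatment cases.
example_auc = pairwise_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which a value such as 0.745 should be read.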
Affiliation(s)
- Theodore C Feldman
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; VA Boston Healthcare System, Boston, MA, USA
- Jules L Dienstag
  - Gastrointestinal Unit, Massachusetts Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Kenneth D Mandl
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Yi-Ju Tseng
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA; Department of Information Management, National Central University, Taoyuan, Taiwan.
|
13
|
Karami K, Akbari M, Moradi MT, Soleymani B, Fallahi H. Survival prognostic factors in patients with acute myeloid leukemia using machine learning techniques. PLoS One 2021; 16:e0254976. [PMID: 34288963 PMCID: PMC8294525 DOI: 10.1371/journal.pone.0254976] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Accepted: 07/07/2021] [Indexed: 12/26/2022] Open
Abstract
This paper identifies prognostic factors for survival in patients with acute myeloid leukemia (AML) using machine learning techniques. We have integrated machine learning with feature selection methods and have compared their performances to identify the most suitable factors in assessing the survival of AML patients. Here, six data mining algorithms including Decision Tree, Random Forest, Logistic Regression, Naive Bayes, W-Bayes Net, and Gradient Boosted Tree (GBT) are employed for the detection model and implemented using the common data mining tool RapidMiner and an open-source R package. To improve the predictive ability of our model, a set of features was selected by employing multiple feature selection methods. The accuracy of classification was obtained using 10-fold cross-validation for the various combinations of the feature selection methods and machine learning algorithms. The performance of the models was assessed by various measurement indexes including accuracy, kappa, sensitivity, specificity, positive predictive value, negative predictive value, and area under the ROC curve (AUC). Our results showed that GBT, combined with feature selection via the Relief algorithm, has the best performance in predicting the survival rate of AML patients, with an accuracy of 85.17% and an AUC of 0.930.
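The measurement indexes named in this abstract all derive from a 2x2 confusion matrix. The following is a minimal sketch of those definitions, assuming a binary outcome; the function name and the counts are hypothetical and do not reproduce the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard summary metrics for a binary confusion matrix.

    tp/fp/tn/fn are counts of true/false positives/negatives.
    No guard against empty rows or columns: a zero denominator raises.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # recall, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance_pos = ((tp + fp) / total) * ((tp + fn) / total)
    chance_neg = ((tn + fn) / total) * ((tn + fp) / total)
    expected = chance_pos + chance_neg
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Under 10-fold cross-validation, such metrics are typically computed per fold or on the pooled out-of-fold predictions; which of the two the study used is not stated in the abstract.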
Affiliation(s)
- Keyvan Karami
  - Medical Biology Research Center, Kermanshah University of Medical Sciences, Kermanshah, Iran
  - Department of Animal Science, Ferdowsi University of Mashhad, Mashhad, Iran
- Mahboubeh Akbari
  - Department of Statistics, Ferdowsi University of Mashhad, Mashhad, Iran
- Mohammad-Taher Moradi
  - Medical Biology Research Center, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Bijan Soleymani
  - Medical Biology Research Center, Kermanshah University of Medical Sciences, Kermanshah, Iran
- Hossein Fallahi
  - Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
|
14
|
Ahsan R, Tahsili MR, Ebrahimi F, Ebrahimie E, Ebrahimi M. Image processing unravels the evolutionary pattern of SARS-CoV-2 against SARS and MERS through position-based pattern recognition. Comput Biol Med 2021; 134:104471. [PMID: 34004573 PMCID: PMC8106241 DOI: 10.1016/j.compbiomed.2021.104471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2021] [Revised: 04/27/2021] [Accepted: 05/02/2021] [Indexed: 12/16/2022]
Abstract
SARS-CoV-2, Severe Acute Respiratory Syndrome (SARS), and the Middle East respiratory syndrome-related coronavirus (MERS) viruses are from the Coronaviridae family; the former became a global pandemic (with a low mortality rate) while the latter two were confined to limited regions (with high mortality rates). To investigate the possible structural differences at basic levels for the three viruses, genomic and proteomic sequences were downloaded and converted to polynomial datasets. Seven attribute weighting (feature selection) models were employed to find the key differences in their genomes' nucleotide sequences. Most attribute weighting models selected the final nucleotide sequences (from the 29,000th nucleotide position to the end of the genome) as significantly different among the three virus classes. The genome and proteome sequences of this hot zone area (which corresponds to the 3'UTR region and encodes for nucleoprotein (N)) and Spike (S) protein sequences (as the most important viral protein) were converted into binary images and were analyzed by image processing techniques and a deep convolutional neural network (CNN). Although the predictive accuracy of the CNN for Spike (S) proteins was low (0.48%), the machine learning-based algorithms were able to classify the three members of the Coronaviridae with 100% accuracy based on the 3'UTR region. For the first time, the relationship between the possible structural differences of coronaviruses at the sequence level and their pathogenesis is reported, which paves the road to deciphering the high pathogenicity of the SARS-CoV-2 virus.
Affiliation(s)
- Reza Ahsan
  - Department of Computer Engineering, Qom Branch, Islamic Azad University, Qom, Iran
- Faezeh Ebrahimi
  - Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran
- Esmaeil Ebrahimie
  - Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia
- Mansour Ebrahimi
  - Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia; Corresponding author. Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
|
15
|
Nayak J, Naik B, Dinesh P, Vakula K, Rao BK, Ding W, Pelusi D. Intelligent system for COVID-19 prognosis: a state-of-the-art survey. APPL INTELL 2021; 51:2908-2938. [PMID: 34764577 PMCID: PMC7786871 DOI: 10.1007/s10489-020-02102-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/27/2020] [Indexed: 01/31/2023]
Abstract
The 21st century has seen many disturbances at the economic, social, cultural, and political levels throughout the world. The outbreak of the novel coronavirus disease 2019 (COVID-19) has been treated as a public health crisis of global concern by the World Health Organization (WHO). Researchers throughout the world are using various outbreak models for COVID-19 to make well-informed decisions and impose effective control measures. Among the standard methods for predicting the worldwide COVID-19 epidemic, simple statistical and epidemiological methods have received the most attention from researchers and authorities. One main difficulty in controlling the spread of COVID-19 is the inadequacy and lack of medical tests for detecting the disease and identifying a solution. To address this problem, several statistics-based approaches have been developed, and they offer a partial solution. To deal with the challenges of the medical field, a broad range of intelligent methods, frameworks, and equipment based on machine learning (ML) and deep learning (DL) have been proposed. Because ML and DL can identify and predict patterns in large, complex datasets, they are recognized as suitable for producing effective solutions for the diagnosis of COVID-19. In this paper, a perspective study has been conducted on the applicability of intelligent systems such as ML, DL, and others to COVID-19-related outbreak issues. The main aims of this study are (i) to understand the importance of intelligent approaches such as ML and DL for the COVID-19 pandemic, (ii) to discuss the efficiency and impact of these methods in the prognosis of COVID-19, (iii) to trace the growth in the development of ML and advanced ML methods for COVID-19 prognosis, (iv) to analyze the impact of data types and the nature of the data, along with the challenges in processing data for COVID-19, and (v) to highlight some future challenges in COVID-19 prognosis, to inspire researchers to innovate and extend their knowledge and research to other sectors affected by COVID-19.
Affiliation(s)
- Janmenjoy Nayak
  - Department of Computer Science and Engineering, Aditya Institute of Technology and Management (AITAM), K Kotturu, Tekkali, AP 532201 India
- Bighnaraj Naik
  - Department of Computer Application, Veer Surendra Sai University of Technology, Burla, Odisha 768018 India
- Paidi Dinesh
  - Department of Computer Science and Engineering, Sri Sivani College of Engineering, Srikakulam, AP 532402 India
- Kanithi Vakula
  - Department of Computer Science and Engineering, Sri Sivani College of Engineering, Srikakulam, AP 532402 India
- B. Kameswara Rao
  - Department of Computer Science and Engineering, Aditya Institute of Technology and Management (AITAM), K Kotturu, Tekkali, AP 532201 India
- Weiping Ding
  - School of Information Science and Technology, Nantong University, Nantong, China
- Danilo Pelusi
  - Faculty of Communication Sciences, University of Teramo, Coste Sant', Agostino Campus, Teramo, Italy
|
16
|
Hu Y, Zhao T, Zhang N, Zhang Y, Cheng L. A Review of Recent Advances and Research on Drug Target Identification Methods. Curr Drug Metab 2019; 20:209-216. [PMID: 30251599 DOI: 10.2174/1389200219666180925091851] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2017] [Revised: 01/01/2018] [Accepted: 08/02/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND From a therapeutic viewpoint, understanding how drugs bind and regulate the functions of their target proteins to protect against disease is crucial. The identification of drug targets plays a significant role in drug discovery and in studying the mechanisms of diseases. Therefore, the development of methods to identify drug targets has become a popular topic. METHODS We systematically review recent work on identifying drug targets from the perspectives of data and method. We compiled several databases that collect data more comprehensively and introduced several commonly used databases. We then divided the methods into two categories, biological experiments and machine learning, each of which is subdivided into different subclasses and described in detail. RESULTS Machine learning algorithms make up the majority of new methods. Generally, an optimal set of features is chosen to predict successful new drug targets with similar properties. The most widely used features include sequence properties, network topological features, structural properties, and subcellular locations. Since various machine learning methods exist, improving their performance requires combining a better subset of features and choosing the appropriate model for the various datasets involved. CONCLUSION The application of experimental and computational methods in protein drug target identification has become increasingly popular in recent years. Current biological and computational methods still have many limitations due to unbalanced and incomplete datasets or imperfect feature selection methods.
Affiliation(s)
- Yang Hu
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Tianyi Zhao
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ningyi Zhang
  - School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Ying Zhang
  - Department of Pharmacy, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150088, China
- Liang Cheng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
17
|
Kargarfard F, Sami A, Hemmatzadeh F, Ebrahimie E. Identifying mutation positions in all segments of influenza genome enables better differentiation between pandemic and seasonal strains. Gene 2019; 697:78-85. [PMID: 30769139 DOI: 10.1016/j.gene.2019.01.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2018] [Revised: 12/29/2018] [Accepted: 01/17/2019] [Indexed: 01/08/2023]
Abstract
Influenza virus has a negative-sense, single-stranded, segmented RNA genome. In the context of pandemic influenza research, most studies have focused on variations in the surface proteins (Hemagglutinin and Neuraminidase). However, new findings suggest that all internal and external proteins of influenza viruses can contribute to pandemic emergence, pathogenicity, and increased host range. The occurrence of the 2009 influenza pandemic and the availability of many external and internal segments of pandemic and non-pandemic sequences offer a unique opportunity to evaluate the performance of machine learning models in discriminating pandemic from seasonal sequences using mutation positions in all segments. In this study, we hypothesized that identifying mutation positions in all segments (proteins) encoded by the influenza genome would enable pandemic and seasonal strains to be more reliably distinguished. In a large-scale study, we applied a range of data mining techniques to all segments of influenza for rule discovery and discrimination of pandemic from seasonal strains. CBA (classification based on association rule mining), Ripper, and Decision tree algorithms were utilized to extract association rules among mutations. CBA outperformed the other models. Our approach could discriminate pandemic sequences from seasonal ones with more than 95% accuracy for PA and NP, 99.33% accuracy for NA, and 100% accuracy, precision, specificity, and sensitivity (recall) for M1, M2, PB1, NS1, and NS2. The values of precision, specificity, and sensitivity were more than 90% for other segments except PB2. If sequences of all segments of one strain were available, the accuracy of discrimination of pandemic strains was 100%. General rules extracted by rule-based classification approaches, such as M1-V147I, NP-N334H, NS1-V112I, and PB1-L364I, were able to detect pandemic sequences with high accuracy. We observed that mutations in internal proteins of influenza can contribute to distinguishing the pandemic viruses, similar to those in external proteins.
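To make the rule notation concrete, the sketch below reads a rule such as M1-V147I as "protein M1 carries residue I at position 147 (V in the reference)" and checks it against a protein sequence. This interpretation of the notation, the regular expression, the function names, and the toy sequence are all assumptions for illustration; the abstract does not describe how the rules are encoded or applied.

```python
import re

# Assumed rule format: <PROTEIN>-<reference residue><1-based position><variant residue>
RULE_PATTERN = re.compile(
    r"^(?P<protein>[A-Z0-9]+)-(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$"
)


def parse_rule(rule):
    """Split a rule like 'M1-V147I' into (protein, ref, position, alt)."""
    match = RULE_PATTERN.match(rule)
    if match is None:
        raise ValueError(f"unrecognised rule format: {rule!r}")
    return match["protein"], match["ref"], int(match["pos"]), match["alt"]


def rule_fires(rule, proteins):
    """Return True if the variant residue is present at the rule's position.

    `proteins` maps protein names to amino-acid strings (hypothetical input).
    A missing protein or a sequence shorter than the position yields False.
    """
    protein, _ref, pos, alt = parse_rule(rule)
    sequence = proteins.get(protein, "")
    return len(sequence) >= pos and sequence[pos - 1] == alt


# Toy sequence (not real data): 146 placeholder residues followed by I at position 147.
toy_strain = {"M1": "A" * 146 + "I"}
fires = rule_fires("M1-V147I", toy_strain)
```

A classifier such as CBA combines many such conditions with support and confidence thresholds; this sketch covers only the evaluation of a single condition.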
Affiliation(s)
- Fatemeh Kargarfard
  - Faculty of Engineering and IT, University of Technology Sydney, New South Wales, Australia; Department of Computer Science and Engineering, School of Electrical Engineering and Computer, Shiraz University, Shiraz, Iran
- Ashkan Sami
  - Department of Computer Science and Engineering, School of Electrical Engineering and Computer, Shiraz University, Shiraz, Iran
- Farhid Hemmatzadeh
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia
- Esmaeil Ebrahimie
  - School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia; Genomics Research Platform, La Trobe University, Melbourne, Victoria 3086, Australia; School of Information Technology and Mathematical Sciences, Division of Information Technology Engineering & Environment, University of South Australia, Adelaide, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia.
|
18
|
Alanazi IO, Al Shehri ZS, Ebrahimie E, Giahi H, Mohammadi-Dehcheshmeh M. Non-coding and coding genomic variants distinguish prostate cancer, castration-resistant prostate cancer, familial prostate cancer, and metastatic castration-resistant prostate cancer from each other. Mol Carcinog 2019; 58:862-874. [PMID: 30644608 DOI: 10.1002/mc.22975] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 12/11/2022]
Abstract
A considerable number of deposited variants has provided new possibilities for knowledge discovery in different types of prostate cancer. Here, we analyzed variants located on 3'UTR, 5'UTR, CDs, Intergenic, and Intronic regions in castration-resistant prostate cancer (8496 variants), familial prostate cancer (3241 variants), metastatic castration-resistant prostate cancer (3693 variants), and prostate cancer (16599 variants). Chromosome regions 10p15-p14 and 2p13 were highly enriched (P < 0.00001) for variants located in 3'UTR, 5'UTR, CDs, intergenic, and intronic regions in castration-resistant prostate cancer. In contrast, 10p15-p14, 10q23.3, 12q13.11, 13q12.3, 1q25, and 8p22 regions were enriched (P < 0.001) in familial prostate cancer. In metastatic castration-resistant prostate cancer, 10p15-p14, 10q23.3, 11q22-q23, 14q21.1, and 14q32.13 were highly variant regions (P < 0.001). Chromosome 2 and chromosome 1 hosted many enriched variant regions. AKR1C3, BRCA1, BRCA2, CHGA, CYP19A1, HOXB13, KLK3, and PTEN contained the highest number of 3'UTR, 5'UTR, CDs, Intergenic, and Intronic variants. Network analysis showed that these genes are upstream of important functions including prostate gland development, tumor recurrence, prostate cancer-specific survival, tumor progression, cancer mortality, long-term survival, cancer recurrence, angiogenesis, and AR. Interestingly, all of EGFR, JAK2, NR3C1, PDZD2, and SEMA3C genes had single nucleotide polymorphisms (SNP) in castration-resistant prostate cancer, consistent with high selection pressure on these genes during drug treatment and consequent resistance. High occurrence of variants in 3'UTRs suggests the importance of regulatory variants in different types of prostate cancer; an area that has been neglected compared with coding variants. This study provides a comprehensive overview of genomic regions contributing to different types of prostate cancer.
Affiliation(s)
- Ibrahim O Alanazi
  - National Center for Biotechnology, Life Science and Environment Research Institute, King Abdulaziz City for Science and Technology (KACST), Riyadh, Saudi Arabia
- Zafer S Al Shehri
  - Clinical Laboratory Department, College of Applied Medical Sciences, Shaqra University, KSA, Al dawadmi, Saudi Arabia
- Esmaeil Ebrahimie
  - Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, South Australia, Australia; School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran; Faculty of Science and Engineering, School of Biological Sciences, Flinders University, Adelaide, SA, Australia
- Hassan Giahi
  - Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Manijeh Mohammadi-Dehcheshmeh
  - Australian Centre for Antimicrobial Resistance Ecology, School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, Australia
|
19
|
A large-scale study of indicators of sub-clinical mastitis in dairy cattle by attribute weighting analysis of milk composition features: highlighting the predictive power of lactose and electrical conductivity. J DAIRY RES 2018; 85:193-200. [PMID: 29785910 DOI: 10.1017/s0022029918000249] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
Abstract
Sub-clinical mastitis (SCM) affects milk composition. In this study, we hypothesise that large-scale mining of milk composition features by pattern recognition models can identify the best predictors of SCM within the milk composition features. To this end, using data mining algorithms, we conducted a large-scale and longitudinal study to evaluate the ability of various milk production parameters as indicators of SCM. SCM is the most prevalent disease of dairy cattle, causing substantial economic loss for the dairy industry. Developing new techniques to diagnose SCM in its early stages improves herd health and is of great importance. Test-day Somatic Cell Count (SCC) is the most common indicator of SCM and the primary mastitis surveillance approach worldwide. However, test-day SCC fluctuates widely between days, causing major concerns for its reliability. Consequently, there would be great benefit to identifying additional efficient indicators from large-scale and longitudinal studies. With this intent, data was collected at every milking (twice per day) for a period of 2 months from a single farm using in-line electronic equipment (346 248 records in total). The following data were analysed: milk volume, protein concentration, lactose concentration, electrical conductivity (EC), milking time and peak flow. Three SCC cut-offs were used to estimate the prevalence of SCM: Australian ≥ 250 000 cells/ml, European ≥200 000 cells/ml and New Zealand ≥ 150 000 cells/ml. At first, 10 different Attribute Weighting Algorithms (AWM) were applied to the data. In the absence of SCC, lactose concentration featured as the most important variable, followed by EC. For the first time, using attribute weighted modelling, we showed that the concentration of lactose in milk can be used as a strong indicator of SCM. 
The development of machine-learning expert systems using two or more milk variables (such as lactose concentration and EC) may produce a predictive pattern for early SCM detection.
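The three somatic cell count cut-offs listed in the abstract can be applied as simple thresholds. The sketch below, with hypothetical function names and made-up SCC readings, flags a single record under each cut-off and estimates prevalence over a set of records; it does not reproduce the attribute weighting analysis itself.

```python
# Cut-offs in cells/ml, as listed in the abstract.
SCC_CUTOFFS = {
    "Australian": 250_000,
    "European": 200_000,
    "New Zealand": 150_000,
}


def scm_flags(scc_cells_per_ml):
    """Flag one SCC reading as sub-clinical mastitis under each cut-off."""
    return {name: scc_cells_per_ml >= cutoff for name, cutoff in SCC_CUTOFFS.items()}


def scm_prevalence(scc_values, cutoff):
    """Fraction of readings at or above the cut-off (scc_values must be non-empty)."""
    flagged = sum(1 for value in scc_values if value >= cutoff)
    return flagged / len(scc_values)


# Made-up readings for illustration only.
readings = [100_000, 160_000, 210_000, 300_000]
prevalence_by_standard = {
    name: scm_prevalence(readings, cutoff) for name, cutoff in SCC_CUTOFFS.items()
}
```

Because the cut-offs are nested, a lower threshold can only flag the same or more records, so prevalence estimates rise from the Australian to the New Zealand definition.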
|
20
|
Bioinformatics Techniques used in Hepatitis C Virus Research. JOURNAL OF PURE AND APPLIED MICROBIOLOGY 2017. [DOI: 10.22207/jpam.11.2.32] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
|
21
|
Cuypers L, Libin P, Schrooten Y, Theys K, Di Maio VC, Cento V, Lunar MM, Nevens F, Poljak M, Ceccherini-Silberstein F, Nowé A, Van Laethem K, Vandamme AM. Exploring resistance pathways for first-generation NS3/4A protease inhibitors boceprevir and telaprevir using Bayesian network learning. INFECTION GENETICS AND EVOLUTION 2017; 53:15-23. [PMID: 28499845 DOI: 10.1016/j.meegid.2017.05.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/27/2017] [Revised: 04/25/2017] [Accepted: 05/08/2017] [Indexed: 12/19/2022]
Abstract
Resistance-associated variants (RAVs) have been shown to influence treatment response to direct-acting antivirals (DAAs) and first generation NS3/4A protease inhibitors (PIs) in particular. Interpretation of hepatitis C virus (HCV) genotypic drug resistance remains a challenge, especially in patients who previously failed DAA therapy and need to be retreated with a second DAA based regimen. Bayesian network (BN) learning on HCV sequence data from patients treated with DAAs could provide insight in resistance pathways against PIs for HCV subtypes 1a and 1b, in a similar way as applied before for HIV. The publicly available 'Rega-BN' tool chain was developed to study associative analyses for various pathogens. Our first analysis, comparing sequences from PI-naïve and PI-experienced patients, determined that NS3 substitutions R155K and V36M arise with PI-exposure in HCV1a infected patients, and were defined as major and minor resistance-associated variants respectively. NS3 variant 174H was newly identified as potentially related to PI resistance. In a second analysis, NS3 sequences from PI-naïve patients who cleared the virus during PI therapy and from PI-naïve patients who failed PI therapy were compared, showing that NS3 baseline variant 67S predisposes to treatment-failure and variant 72I to treatment success. This approach has the potential to better characterize the role of more RAVs, if sufficient therapy annotated sequence data becomes available in curated public databases. In addition, polymorphisms present in baseline sequences that predispose patients to therapy failure can be identified using this approach.
Affiliation(s)
- Lize Cuypers
  - KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Herestraat 49, box 1040, 3000 Leuven, Belgium.
- Pieter Libin
  - KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Herestraat 49, box 1040, 3000 Leuven, Belgium; Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium.
- Yoeri Schrooten
  - KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Herestraat 49, box 1040, 3000 Leuven, Belgium.
- Kristof Theys
  - KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Herestraat 49, box 1040, 3000 Leuven, Belgium.
- Velia Chiara Di Maio
  - Department of Experimental Medicine and Surgery, University of Rome "Tor Vergata", Rome, Italy.
- Valeria Cento
  - Department of Experimental Medicine and Surgery, University of Rome "Tor Vergata", Rome, Italy.
- Maja M Lunar
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia.
- Frederik Nevens
  - University Hospitals Leuven, Department of Hepatology, Herestraat 49, 3000 Leuven, Belgium.
- Mario Poljak
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia.
- Ann Nowé
  - Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium.
- Kristel Van Laethem
  - KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Herestraat 49, box 1040, 3000 Leuven, Belgium.
- Anne-Mieke Vandamme
  - KU Leuven, University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Herestraat 49, box 1040, 3000 Leuven, Belgium; Center for Global Health and Tropical Medicine, Microbiology Unit, Institute for Hygiene and Tropical Medicine, University Nova de Lisboa, Rua da Junqueira 100, 1349-008 Lisbon, Portugal.
|
22
|
Kargarfard F, Sami A, Mohammadi-Dehcheshmeh M, Ebrahimie E. Novel approach for identification of influenza virus host range and zoonotic transmissible sequences by determination of host-related associative positions in viral genome segments. BMC Genomics 2016; 17:925. [PMID: 27852224 PMCID: PMC5112743 DOI: 10.1186/s12864-016-3250-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2016] [Accepted: 11/02/2016] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Recent (2013 and 2009) zoonotic transmission of avian or porcine influenza to humans highlights an increase in host range by evading species barriers. Gene reassortment or antigenic shift between viruses from two or more hosts can generate a new life-threatening virus when the newly shuffled virus is no longer recognized by antibodies existing within human populations. There is no large-scale study to help understand the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how different segments of the influenza genome contribute to the final determination of host range. METHODS To obtain insight into the rules underpinning host range determination, various supervised machine learning algorithms were employed to mine reassortment changes in different viral segments in a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Some of the sequences were assigned to multiple hosts. The dataset is therefore multi-labeled, and we utilized a multi-label learning method to identify discriminative sequence sites. Then algorithms such as CBA, Ripper, and decision tree were applied to extract informative and descriptive association rules for each viral protein segment. RESULT We found informative rules in all segments that are common within the same host class but vary between different hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions. CONCLUSION Host range identification is facilitated by the high-support combined rules found in this study. Our major goal was to detect discriminative genomic positions that were able to identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.
Affiliation(s)
- Fatemeh Kargarfard: Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Ashkan Sami: Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Manijeh Mohammadi-Dehcheshmeh: School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran
- Esmaeil Ebrahimie: School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia; School of Medicine, Faculty of Health Sciences, The University of Adelaide, Adelaide, Australia; Institute of Biotechnology, Shiraz University, Shiraz, Iran; School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, Australia
|
23
|
Jamali AA, Ferdousi R, Razzaghi S, Li J, Safdari R, Ebrahimie E. DrugMiner: comparative analysis of machine learning algorithms for prediction of potential druggable proteins. Drug Discov Today 2016; 21:718-24. [PMID: 26821132 DOI: 10.1016/j.drudis.2016.01.007] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Received: 11/15/2015] [Revised: 12/05/2015] [Accepted: 01/19/2016] [Indexed: 12/14/2022]
Abstract
The application of computational methods in drug discovery has received increased attention in recent years as a way to accelerate drug target prediction. Based on 443 sequence-derived protein features, we applied the most commonly used machine learning methods to predict whether a protein is druggable and to identify the best-performing algorithm for this task. In addition, feature selection procedures were used to obtain the best performance of each classifier at its optimum number of features. When run on all features, the Neural Network was the best classifier, with 89.98% accuracy in a k-fold cross-validation test. Among all the algorithms applied, the optimum number of most-relevant features was 130, according to the Support Vector Machine-Feature Selection (SVM-FS) algorithm. This study resulted in the discovery of new drug targets that can potentially be employed in cell signaling pathways, gene expression, and signal transduction. The DrugMiner web tool was developed based on the findings of this study to enable researchers to predict druggable proteins. DrugMiner is freely available at www.DrugMiner.org.
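The k-fold cross-validation accuracy mentioned in this abstract can be sketched as follows. This is an illustrative outline only, not the DrugMiner code: the nearest-centroid classifier is a simple stand-in for the classifiers compared in the study, the value of k and the two-feature toy data are hypothetical, and feature selection is omitted.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def fit_centroids(X, y):
    """Stand-in classifier: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_val_accuracy(X, y, k=5):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k):
        model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Hypothetical toy data: two well-separated classes with two features each.
X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10

accuracy = cross_val_accuracy(X, y, k=5)
print(f"5-fold cross-validation accuracy: {accuracy:.2%}")
```

Each sample is used for testing exactly once, so the reported accuracy is an average over held-out predictions rather than a fit to the training data.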
Affiliation(s)
- Ali Akbar Jamali: Research Center for Pharmaceutical Nanotechnology (RCPN), Tabriz University of Medical Sciences, Tabriz, Iran
- Reza Ferdousi: Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Saeed Razzaghi: Information Technology Center, The University of Zanjan, Zanjan, Iran
- Jiuyong Li: School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia
- Reza Safdari: Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Esmaeil Ebrahimie: School of Information Technology and Mathematical Sciences, Division of Information Technology, Engineering and the Environment, The University of South Australia, Adelaide, SA, Australia; Department of Genetics & Evolution, School of Biological Sciences, The University of Adelaide, Adelaide, SA, Australia; School of Biological Sciences, Faculty of Science and Engineering, Flinders University, Adelaide, SA, Australia
|