1
Elshawi R, Sakr S, Al-Mallah MH, Keteyian SJ, Brawner CA, Ehrman JK. FIT calculator: a multi-risk prediction framework for medical outcomes using cardiorespiratory fitness data. Sci Rep 2024; 14:8745. [PMID: 38627439 PMCID: PMC11021455 DOI: 10.1038/s41598-024-59401-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 04/10/2024] [Indexed: 04/19/2024] Open
Abstract
Accurately predicting patients' risk for specific medical outcomes is paramount for effective healthcare management and personalized medicine. While a substantial body of literature addresses the prediction of diverse medical conditions, existing models predominantly focus on singular outcomes, limiting their scope to one disease at a time. However, clinical reality often entails patients concurrently facing multiple health risks across various medical domains. In response to this gap, our study proposes a novel multi-risk framework adept at simultaneous risk prediction for multiple clinical outcomes, including diabetes, mortality, and hypertension. Leveraging a concise set of features extracted from patients' cardiorespiratory fitness data, our framework minimizes computational complexity while maximizing predictive accuracy. Moreover, we integrate a state-of-the-art instance-based interpretability technique into our framework, providing users with comprehensive explanations for each prediction. These explanations afford medical practitioners invaluable insights into the primary health factors influencing individual predictions, fostering greater trust and utility in the underlying prediction models. Our approach thus stands to significantly enhance healthcare decision-making processes, facilitating more targeted interventions and improving patient outcomes in clinical practice. Our prediction framework utilizes an automated machine learning framework, Auto-Weka, to optimize machine learning models and hyper-parameter configurations for the simultaneous prediction of three medical outcomes: diabetes, mortality, and hypertension. Additionally, we employ a local interpretability technique to elucidate predictions generated by our framework. These explanations manifest visually, highlighting key attributes contributing to each instance's prediction for enhanced interpretability. 
Using automated machine learning techniques, the models simultaneously predict hypertension, mortality, and diabetes risks, utilizing only nine patient features. They achieved an average AUC of 0.90 ± 0.001 on the hypertension dataset, 0.90 ± 0.002 on the mortality dataset, and 0.89 ± 0.001 on the diabetes dataset through tenfold cross-validation. Additionally, the models demonstrated strong performance with an average AUC of 0.89 ± 0.001 on the hypertension dataset, 0.90 ± 0.001 on the mortality dataset, and 0.89 ± 0.001 on the diabetes dataset using bootstrap evaluation with 1000 resamples.
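The bootstrap evaluation mentioned in this abstract (an AUC averaged over 1000 resamples) can be illustrated with a minimal Python sketch. It is not the authors' Auto-Weka pipeline: the function names `auc` and `bootstrap_auc`, the toy labels and scores, and the choice to summarize the resamples by their mean and standard deviation are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_resamples=1000, seed=0):
    """Mean and standard deviation of AUC over bootstrap resamples (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]  # sample n cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # a resample with a single class has no defined AUC
        values.append(auc(ys, [scores[i] for i in idx]))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var ** 0.5


# Toy data, not taken from the FIT cohort.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.55, 0.70]
mean_auc, sd_auc = bootstrap_auc(toy_labels, toy_scores, n_resamples=200)
```

The pairwise AUC is quadratic in the number of cases, which is fine for a sketch; a cohort of tens of thousands of patients would call for a sort-based implementation.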
Affiliation(s)
- Radwa Elshawi
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Sherif Sakr
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Steven J Keteyian
  - Division of Cardiovascular Medicine, Henry Ford Hospital, 6525 Second Ave., Detroit, MI, 48202, USA
- Clinton A Brawner
  - Division of Cardiovascular Medicine, Henry Ford Hospital, 6525 Second Ave., Detroit, MI, 48202, USA
- Jonathan K Ehrman
  - Division of Cardiovascular Medicine, Henry Ford Hospital, 6525 Second Ave., Detroit, MI, 48202, USA
2
Komisarenko V, Voormansik K, Elshawi R, Sakr S. Exploiting time series of Sentinel-1 and Sentinel-2 to detect grassland mowing events using deep learning with reject region. Sci Rep 2022; 12:983. [PMID: 35046488 PMCID: PMC8770799 DOI: 10.1038/s41598-022-04932-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/11/2021] [Accepted: 01/04/2022] [Indexed: 12/05/2022] Open
Abstract
Governments pay agencies to control the activities of farmers who receive governmental support. Field visits are costly and highly time-consuming; hence remote sensing is widely used for monitoring farmers’ activities. Nowadays, a vast amount of available data from the Sentinel mission significantly boosted research in agriculture. Estonia is among the first countries to take advantage of this data source to automate mowing and ploughing events detection across the country. Although techniques that rely on optical data for monitoring agriculture events are favourable, the availability of such data during the growing season is limited. Thus, alternative data sources have to be evaluated. In this paper, we developed a deep learning model with an integrated reject option for detecting grassland mowing events using time series of Sentinel-1 and Sentinel-2 optical images acquired from 2000 fields in Estonia in 2018 during the vegetative season. The rejection mechanism is based on a threshold over the prediction confidence of the proposed model. The proposed model significantly outperforms the state-of-the-art technique and achieves event accuracy of 73.3% and end of season accuracy of 94.8%.
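The rejection mechanism described in this abstract, a threshold over the model's prediction confidence, can be sketched in a few lines of Python. The threshold value of 0.8, the function names, and the toy probabilities are hypothetical; the abstract does not state how the authors chose or tuned the threshold.

```python
def predict_with_reject(prob_mowing, threshold=0.8):
    """Return 1 (mowing), 0 (no mowing), or None when the model abstains.

    Confidence is taken as the probability of the more likely class; predictions
    whose confidence falls below the threshold are rejected.
    """
    confidence = max(prob_mowing, 1.0 - prob_mowing)
    if confidence < threshold:
        return None
    return 1 if prob_mowing >= 0.5 else 0


def accuracy_and_coverage(probs, labels, threshold=0.8):
    """Accuracy on accepted predictions and the fraction of cases accepted."""
    decisions = [predict_with_reject(p, threshold) for p in probs]
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(accepted) / len(probs)
    accuracy = (
        sum(d == y for d, y in accepted) / len(accepted) if accepted else float("nan")
    )
    return accuracy, coverage


# Toy probabilities and labels, not taken from the Estonian field data.
toy_probs = [0.95, 0.10, 0.60, 0.85]
toy_labels = [1, 0, 1, 0]
toy_accuracy, toy_coverage = accuracy_and_coverage(toy_probs, toy_labels)
```

Raising the threshold usually trades coverage for accuracy on the accepted cases, so the two numbers are best reported together.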
Affiliation(s)
- Kaupo Voormansik
  - Tartu Observatory, University of Tartu, Tartu, Estonia; KappaZeta Ltd., Tartu, Estonia
- Radwa Elshawi
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
- Sherif Sakr
  - Institute of Computer Science, University of Tartu, Tartu, Estonia
3
Abd Elrahman A, El Helw M, Elshawi R, Sakr S. D-SmartML: A Distributed Automated Machine Learning Framework. 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) 2020. [DOI: 10.1109/icdcs47774.2020.00115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/01/2023]
4
Elshawi R, Al-Mallah MH, Sakr S. On the interpretability of machine learning-based model for predicting hypertension. BMC Med Inform Decis Mak 2019; 19:146. [PMID: 31357998 PMCID: PMC6664803 DOI: 10.1186/s12911-019-0874-0] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 01/25/2019] [Accepted: 07/18/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Although complex machine learning models commonly outperform traditional, simple interpretable models, clinicians find it hard to understand and trust these complex models because their predictions lack intuition and explanation. The aim of this study is to demonstrate the utility of various model-agnostic explanation techniques for machine learning models, with a case study analyzing the outcomes of a random forest model that predicts which individuals are at risk of developing hypertension based on cardiorespiratory fitness data. METHODS The dataset used in this study contains information on 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. Five global interpretability techniques (Feature Importance, Partial Dependence Plot, Individual Conditional Expectation, Feature Interaction, Global Surrogate Models) and two local interpretability techniques (Local Surrogate Models, Shapley Value) were applied to show how interpretability techniques can help clinical staff better understand and trust the outcomes of machine learning-based predictions. RESULTS Several experiments were conducted and reported. The results show that different interpretability techniques shed light on different aspects of model behavior: global interpretations enable clinicians to understand the entire conditional distribution modeled by the trained response function, whereas local interpretations promote the understanding of small parts of the conditional distribution for specific instances. CONCLUSIONS Interpretability techniques can vary in their explanations of the behavior of a machine learning model. Global interpretability techniques have the advantage that they generalize over the entire population, while local interpretability techniques focus on giving explanations at the level of individual instances. Both kinds of method can be equally valid, depending on the application need, and both are effective in assisting clinicians in the medical decision process. However, clinicians will always have the final say on accepting or rejecting the outcome of the machine learning models and their explanations, based on their domain expertise.
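Of the local techniques named in this abstract, the Shapley value has a compact definition that a short Python sketch can make concrete: a feature's attribution is its marginal contribution to the prediction, averaged over every order in which features could be added. The sketch below computes exact Shapley values by enumerating all orderings, which is only feasible for a handful of features. The value function, the feature names, and the numbers are hypothetical and are not taken from the study, whose own tooling is not described here.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    value_fn maps a frozenset of feature names to the model output obtained when
    only those features are "present". Cost grows factorially with len(features).
    """
    totals = {f: 0.0 for f in features}
    n_orderings = 0
    for order in permutations(features):
        n_orderings += 1
        coalition = frozenset()
        previous = value_fn(coalition)
        for f in order:
            coalition = coalition | {f}
            current = value_fn(coalition)
            totals[f] += current - previous  # marginal contribution of f
            previous = current
    return {f: total / n_orderings for f, total in totals.items()}


# Hypothetical additive "risk score": each present feature shifts the output.
toy_effects = {"age": 2.0, "mets": -1.0, "resting_hr": 0.5}


def toy_value(coalition):
    return sum(toy_effects[f] for f in coalition)


toy_attributions = shapley_values(toy_value, list(toy_effects))
```

For an additive value function each feature's Shapley value equals its own effect, and in general the attributions sum to the difference between the full prediction and the empty-coalition baseline. Practical tools approximate this average by sampling rather than enumerating.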
Affiliation(s)
- Radwa Elshawi
  - Data Systems Group, Institute of Computer Science, University of Tartu, 2 J. Liivi St., 50409 Tartu, Estonia
- Sherif Sakr
  - Data Systems Group, Institute of Computer Science, University of Tartu, 2 J. Liivi St., 50409 Tartu, Estonia
5
Daghistani TA, Elshawi R, Sakr S, Ahmed AM, Al-Thwayee A, Al-Mallah MH. Predictors of in-hospital length of stay among cardiac patients: A machine learning approach. Int J Cardiol 2019; 288:140-147. [PMID: 30685103 DOI: 10.1016/j.ijcard.2019.01.046] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 07/24/2018] [Revised: 12/20/2018] [Accepted: 01/14/2019] [Indexed: 11/18/2022]
Abstract
OBJECTIVE In-hospital length of stay (LOS) is expected to increase as the complexity of cardiovascular disease increases and the population ages. This will affect healthcare systems, especially given the current decrease in bed capacity and increase in costs. Therefore, accurately predicting LOS would have a positive impact on healthcare metrics. The aim of this study is to develop a machine learning-based approach for predicting in-hospital LOS for cardiac patients. DESIGN Using electronic medical records, we retrospectively extracted all records of visits by patients who were admitted under the adult cardiology service. Admission diagnosis and primary treating physician were reviewed to verify selection criteria. A machine learning-based predictive approach was applied that uses simple baseline health data at admission time to predict LOS. Patients were divided into three groups based on their LOS: short (<3 days), intermediate (3-5 days) and long (>5 days). An information gain algorithm was used to select the most relevant attributes; only attributes with an information gain greater than zero were used in model building. Four different machine learning techniques were evaluated and their diagnostic accuracy measures were compared. SETTING The dataset of this study included adult patients who were admitted between 2008 and 2016 to King Abdulaziz Cardiac Center (KACC). The center is located in the King Abdulaziz Medical City Complex in Riyadh, the capital of Saudi Arabia. PARTICIPANTS (DATASET) A total of 16,414 consecutive inpatient visits for 12,769 unique patients (mean age 58.8 ± 16 years; 68.2% male) between 2008 and 2016 were included. The study cohort had a high prevalence of cardiovascular risk factors (hypertension 56%, diabetes 56%, dyslipidemia 52%, obesity 33% and smoking 24%). The most common admitting diagnosis was acute coronary syndrome (36%). RESULTS The variables with the highest impact on the prediction of in-hospital LOS were heart rate on admission, systolic and diastolic blood pressure on admission, age and insurance status (eligibility). Among the machine learning models, the Random Forest (RF) model outperformed all others (sensitivity 0.80, accuracy 0.80, AUROC 0.94). CONCLUSION We showed that machine learning methods provide accurate prediction of LOS for cardiac patients. This can be used in clinical bed management and resource allocation.
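The two preprocessing steps described in this abstract, binning LOS into three groups and keeping only attributes with information gain above zero, can be sketched in Python as follows. The bin edges come from the abstract; the function names, the attribute names, and the toy records are hypothetical, and the sketch handles categorical attributes only (the study's handling of numeric attributes is not described here).

```python
from collections import Counter
from math import log2


def los_group(days):
    """Bin length of stay as in the abstract: <3 short, 3-5 intermediate, >5 long."""
    if days < 3:
        return "short"
    if days <= 5:
        return "intermediate"
    return "long"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_attributes(table, labels, tol=1e-12):
    """Keep attributes whose information gain is greater than zero (up to float noise)."""
    return [name for name, column in table.items() if information_gain(column, labels) > tol]


# Toy records, not taken from the KACC dataset.
toy_days = [1, 2, 7, 9]
toy_labels = [los_group(d) for d in toy_days]
toy_table = {
    "insured": ["yes", "yes", "no", "no"],  # separates the classes perfectly
    "ward": ["A", "A", "A", "A"],           # constant, carries no information
}
toy_selected = select_attributes(toy_table, toy_labels)
```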
Affiliation(s)
- Sherif Sakr
  - King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia; University of Tartu, Tartu, Estonia
- Amjad M Ahmed
  - King Abdulaziz Cardiac Center, King Abdulaziz Medical city for National Guard, Riyadh, Saudi Arabia
- Abdullah Al-Thwayee
  - King Abdulaziz Cardiac Center, King Abdulaziz Medical city for National Guard, Riyadh, Saudi Arabia
- Mouaz H Al-Mallah
  - King Abdullah International Medical Research Center, Riyadh, Saudi Arabia; King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia; King Abdulaziz Cardiac Center, King Abdulaziz Medical city for National Guard, Riyadh, Saudi Arabia
6
Sakr S, Elshawi R, Ahmed A, Qureshi WT, Brawner C, Keteyian S, Blaha MJ, Al-Mallah MH. Using machine learning on cardiorespiratory fitness data for predicting hypertension: The Henry Ford ExercIse Testing (FIT) Project. PLoS One 2018; 13:e0195344. [PMID: 29668729 PMCID: PMC5905952 DOI: 10.1371/journal.pone.0195344] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Received: 11/21/2017] [Accepted: 03/20/2018] [Indexed: 12/17/2022] Open
Abstract
This study evaluates and compares the performance of different machine learning techniques in predicting which individuals are at risk of developing hypertension, and who are likely to benefit most from interventions, using cardiorespiratory fitness data. The dataset contains information on 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables of the dataset include information on vital signs, diagnosis and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Tree Forest (RTF). Using different validation methods, the RTF model showed the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results also show that it is critical to carefully explore and evaluate the performance of machine learning models using various model evaluation methods, as the prediction accuracy can differ significantly.
Affiliation(s)
- Sherif Sakr
  - King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
  - King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
  - University of Tartu, Tartu, Estonia
- Radwa Elshawi
  - Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
  - University of Tartu, Tartu, Estonia
- Amjad Ahmed
  - King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
  - King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
- Waqas T. Qureshi
  - Wake Forest School of Medicine, Medical Center Boulevard, Winston-Salem, NC, United States of America
- Clinton Brawner
  - Heart and Vascular Institute, Henry Ford Hospital System, Detroit, MI, United States of America
- Steven Keteyian
  - Heart and Vascular Institute, Henry Ford Hospital System, Detroit, MI, United States of America
- Michael J. Blaha
  - Johns Hopkins Medicine, Baltimore, Maryland, United States of America
- Mouaz H. Al-Mallah
  - King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
  - King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
  - Heart and Vascular Institute, Henry Ford Hospital System, Detroit, MI, United States of America
7
Sakr S, Elshawi R, Ahmed AM, Qureshi WT, Brawner CA, Keteyian SJ, Blaha MJ, Al-Mallah MH. Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project. BMC Med Inform Decis Mak 2017; 17:174. [PMID: 29258510 PMCID: PMC5735871 DOI: 10.1186/s12911-017-0566-6] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 04/05/2017] [Accepted: 11/22/2017] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Prior studies have demonstrated that cardiorespiratory fitness (CRF) is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of this study is to evaluate and compare how machine learning techniques can be applied to medical records of cardiorespiratory fitness and how the various techniques differ in their ability to predict medical outcomes (e.g. mortality). METHODS We use data of 34,212 patients free of known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. Seven machine learning classification techniques were evaluated: Decision Tree (DT), Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayesian Classifier (BC), Bayesian Network (BN), K-Nearest Neighbor (KNN) and Random Forest (RF). To handle the imbalanced dataset, the Synthetic Minority Over-Sampling Technique (SMOTE) is used. RESULTS Two sets of experiments were conducted, with and without the SMOTE sampling technique. On average over different evaluation metrics, the SVM classifier showed the lowest performance, while other models such as BN, BC and DT performed better. The RF classifier showed the best performance (AUC = 0.97) among all models trained using SMOTE sampling. CONCLUSIONS The results show that ML techniques can vary significantly in their performance across the different evaluation metrics. A more complex ML model does not necessarily achieve higher prediction accuracy. The prediction performance of all models trained with SMOTE is much better than the performance of models trained without SMOTE.
The study shows the potential of machine learning methods for predicting all-cause mortality using cardiorespiratory fitness data.
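The core idea of SMOTE, which this abstract uses to handle class imbalance, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The Python sketch below is a simplified illustration under stated assumptions: it picks base samples at random instead of looping over every minority sample as the original algorithm does, it uses squared Euclidean distance on numeric tuples, and the function name, `k`, and the toy points are hypothetical.

```python
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by neighbour interpolation (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # The k nearest minority-class neighbours of the base sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != base_index]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority-class points, not taken from the FIT cohort.
toy_minority = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
toy_synthetic = smote_sketch(toy_minority, n_synthetic=10, k=2)
```

Oversampling of this kind is normally applied to the training folds only; applying it before splitting the data lets synthetic copies of test-set neighbours leak into training and can inflate reported performance.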
Affiliation(s)
- Sherif Sakr
  - King AbdulAziz Cardiac Center, Ministry of National Guard, Health Affairs, King Abdulaziz Medical City for National Guard - Health affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Department Mail Code: 1413, P.O. Box 22490, Riyadh, 11426, Kingdom of Saudi Arabia
- Radwa Elshawi
  - Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Amjad M Ahmed
  - King AbdulAziz Cardiac Center, Ministry of National Guard, Health Affairs, King Abdulaziz Medical City for National Guard - Health affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Department Mail Code: 1413, P.O. Box 22490, Riyadh, 11426, Kingdom of Saudi Arabia
- Waqas T Qureshi
  - Wake Forest School of Medicine, Medical Center Boulevard, Winston-Salem, NC, USA
- Clinton A Brawner
  - Division of Cardiovascular Medicine, Henry Ford Hospital, Detroit, MI, USA
- Steven J Keteyian
  - King AbdulAziz Cardiac Center, Ministry of National Guard, Health Affairs, King Abdulaziz Medical City for National Guard - Health affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Department Mail Code: 1413, P.O. Box 22490, Riyadh, 11426, Kingdom of Saudi Arabia
- Mouaz H Al-Mallah
  - King AbdulAziz Cardiac Center, Ministry of National Guard, Health Affairs, King Abdulaziz Medical City for National Guard - Health affairs, King Abdullah International Medical Research Center, King Saud bin Abdulaziz University for Health Sciences, Department Mail Code: 1413, P.O. Box 22490, Riyadh, 11426, Kingdom of Saudi Arabia; Division of Cardiovascular Medicine, Henry Ford Hospital, Detroit, MI, USA
8
Al-Mallah MH, Elshawi R, Ahmed AM, Qureshi WT, Brawner CA, Blaha MJ, Ahmed HM, Ehrman JK, Keteyian SJ, Sakr S. Using Machine Learning to Define the Association between Cardiorespiratory Fitness and All-Cause Mortality (from the Henry Ford Exercise Testing Project). Am J Cardiol 2017; 120:2078-2084. [PMID: 28951020 DOI: 10.1016/j.amjcard.2017.08.029] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 06/17/2017] [Revised: 08/02/2017] [Accepted: 08/08/2017] [Indexed: 10/19/2022]
Abstract
Previous studies have demonstrated that cardiorespiratory fitness is a strong marker of cardiovascular health. Machine learning (ML) can enhance the prediction of outcomes through classification techniques that classify the data into predetermined categories. The aim of the analysis is to compare the prediction of 10 years of all-cause mortality (ACM) using statistical logistic regression (LR) and ML approaches in a cohort of patients who underwent exercise stress testing. We included 34,212 patients (55% males, mean age 54 ± 13 years) free of coronary artery disease or heart failure who underwent exercise treadmill stress testing between 1991 and 2009 and had complete 10-year follow-up. The primary outcome of this analysis was ACM at 10 years. The probability of 10-years ACM was calculated using statistical LR and ML, and the accuracy of these methods was calculated and compared. A total of 3,921 patients died at 10 years. Using statistical LR, the sensitivity to predict ACM was 44.9% (95% confidence interval [CI] 43.3% to 46.5%), whereas the specificity was 93.4% (95% CI 93.1% to 93.7%). The sensitivity of ML to predict ACM was 87.4% (95% CI 86.3% to 88.4%), whereas the specificity was 97.2% (95% CI 97.0% to 97.4%). The ML approach was associated with improved model discrimination (area under the curve for ML [0.923 (95% CI 0.917 to 0.928)]) compared with statistical LR (0.836 [95% CI 0.829 to 0.846], p<0.0001). In conclusion, our analysis demonstrates that ML provides better accuracy and discrimination of the prediction of ACM among patients undergoing stress testing.
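The sensitivity and specificity figures in this abstract are proportions with confidence intervals, and a short Python sketch shows how such an interval can be computed. The abstract does not say which interval method the authors used; the normal-approximation (Wald) interval below is an assumption, as are the function names. The denominators are taken from the abstract: 3,921 deaths for sensitivity and, assuming everyone else in the 34,212-patient cohort survived to 10 years, 30,291 survivors for specificity.

```python
from math import sqrt


def wald_interval(proportion, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion (assumed method)."""
    half_width = z * sqrt(proportion * (1.0 - proportion) / n)
    return max(0.0, proportion - half_width), min(1.0, proportion + half_width)


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


deaths = 3921                # from the abstract
cohort = 34212               # from the abstract
survivors = cohort - deaths  # assumes complete 10-year follow-up for everyone

# Intervals around the reported logistic-regression point estimates.
lr_sensitivity_ci = wald_interval(0.449, deaths)
lr_specificity_ci = wald_interval(0.934, survivors)
```

Under these assumptions the Wald interval for the logistic-regression sensitivity rounds to the 43.3% to 46.5% range reported in the abstract, which suggests, but does not confirm, that a similar method was used.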
9
Al-Khateeb M, Qureshi WT, Odeh R, Ahmed AM, Sakr S, Elshawi R, Bdeir MB, Al-Mallah MH. The impact of digoxin on mortality in patients with chronic systolic heart failure: A propensity-matched cohort study. Int J Cardiol 2016; 228:214-218. [PMID: 27865188 DOI: 10.1016/j.ijcard.2016.11.021] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/29/2016] [Revised: 10/30/2016] [Accepted: 11/04/2016] [Indexed: 11/16/2022]
Abstract
BACKGROUND Prior studies showed mixed results on the association of digoxin use with all-cause mortality (ACM). The aim of this analysis is to identify the impact of digoxin use on ACM in a contemporary heart failure (HF) cohort treated with guideline-based therapy. METHODS We included 2298 consecutive patients seen in an HF clinic between 2000 and 2015. Patients were considered digoxin users if they received digoxin at any point during the enrollment period in the HF clinic. Patients were matched on digoxin use by propensity matching in a 2-3:1 fashion. The primary outcome was ACM. RESULTS Of 2298 patients, 325 digoxin users were matched with 750 non-digoxin users. The matched cohort did not differ in demographic and clinical variables except for worse HF symptomatology and increased prevalence of atrial fibrillation. Overall, the prevalence of the use of guideline-suggested therapies was 96%. After a median follow-up duration of 4 years (IQR 2-6 years), digoxin use was associated with increased ACM (21.8% versus 12.9%, unadjusted HR=1.81; 95% CI=1.33 to 2.45; p=0.001). This association remained significant after adjusting for the propensity score, atrial fibrillation, ejection fraction, and New York HF Class (HR=1.74; 95% CI=1.20 to 2.38; p<0.0001). CONCLUSION In this analysis of well-treated HF patients, digoxin was associated with increased ACM. Further randomized controlled trials are needed to determine whether digoxin therapy should be used in well-treated HF patients. Until then, routine use of digoxin in clinical practice should be discouraged.
Affiliation(s)
- May Al-Khateeb
  - King Abdulaziz Medical City for National Guard, Riyadh, Saudi Arabia; King Abdulaziz Cardiac Centre, Riyadh, Saudi Arabia
- Waqas T Qureshi
  - Department of Internal Medicine, Wake Forest School of Medicine, Medical Center Boulevard, Winston-Salem, NC, USA
- Raed Odeh
  - King Abdulaziz Medical City for National Guard, Riyadh, Saudi Arabia; King Abdulaziz Cardiac Centre, Riyadh, Saudi Arabia
- Amjad M Ahmed
  - King Abdulaziz Medical City for National Guard, Riyadh, Saudi Arabia; King Abdulaziz Cardiac Centre, Riyadh, Saudi Arabia
- Sherif Sakr
  - King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia; King Abdullah International Medical Research Centre, Riyadh, Saudi Arabia
- Radwa Elshawi
  - Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- M Bassam Bdeir
  - King Abdulaziz Medical City for National Guard, Riyadh, Saudi Arabia; King Abdulaziz Cardiac Centre, Riyadh, Saudi Arabia
- Mouaz H Al-Mallah
  - King Abdulaziz Medical City for National Guard, Riyadh, Saudi Arabia; King Abdulaziz Cardiac Centre, Riyadh, Saudi Arabia; King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia; King Abdullah International Medical Research Centre, Riyadh, Saudi Arabia
10
Batarfi O, Elshawi R, Fayoumi A, Barnawi A, Sakr S. A distributed query execution engine of big attributed graphs. Springerplus 2016; 5:665. [PMID: 27350905 PMCID: PMC4899405 DOI: 10.1186/s40064-016-2251-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/15/2016] [Accepted: 04/27/2016] [Indexed: 12/03/2022]
Abstract
A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the vertices and edges need to be associated with descriptive attributes. Such graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations, including reachability, pattern matching and shortest path, where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation and performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine for G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes, while the graph data are maintained in a relational store that is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries, which are pushed to the underlying relational stores, while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and scalability of DG-SPARQL in querying massive attributed graph datasets, as well as its ability to outperform Apache Giraph, a popular distributed graph processing system, by orders of magnitude.
Affiliation(s)
- Radwa Elshawi
  - Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
- Sherif Sakr
  - University of New South Wales, Sydney, Australia; King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
11
Elshawi R, Sakr S. On Analyzing the Impact of Authors and Their Collaboration Patterns in the Major Computer Algorithms Research Conferences. Collnet Journal of Scientometrics and Information Management 2016. [DOI: 10.1080/09737766.2016.1177951] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/21/2022]
12
Elshawi R, Sakr S. International conferences on computer system: Analysis of EuroSys, SOSP, and OSDI during 2006-2014. Collnet Journal of Scientometrics and Information Management 2016. [DOI: 10.1080/09737766.2016.1177953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]