1
Hennebelle A, Ismail L, Materwala H, Al Kaabi J, Ranjan P, Janardhanan R. Secure and privacy-preserving automated machine learning operations into end-to-end integrated IoT-edge-artificial intelligence-blockchain monitoring system for diabetes mellitus prediction. Comput Struct Biotechnol J 2024; 23:212-233. [PMID: 38169966 PMCID: PMC10758733 DOI: 10.1016/j.csbj.2023.11.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 11/20/2023] [Accepted: 11/20/2023] [Indexed: 01/05/2024]
Abstract
Diabetes Mellitus, one of the leading causes of death worldwide, has no cure to date and can lead to severe health complications, such as retinopathy, limb amputation, cardiovascular diseases, and neuronal disease, if left untreated. Consequently, it becomes crucial to be able to monitor and predict the incidence of diabetes. Machine learning approaches have been proposed and evaluated in the literature for diabetes prediction. This paper proposes an IoT-edge-Artificial Intelligence (AI)-blockchain system for diabetes prediction based on risk factors. The proposed system is underpinned by blockchain to obtain a cohesive view of the risk factors data from patients across different hospitals and ensure security and privacy of the user's data. We provide a comparative analysis of different medical sensors, devices, and methods to measure and collect the risk factors values in the system. Numerical experiments and comparative analysis were carried out within our proposed system, using the most accurate random forest (RF) model, and the two most used state-of-the-art machine learning approaches, Logistic Regression (LR) and Support Vector Machine (SVM), using three real-life diabetes datasets. The results show that the proposed system predicts diabetes using RF with 4.57% more accuracy on average in comparison with the other models LR and SVM, with 2.87 times more execution time. Data balancing without feature selection does not show significant improvement. When using feature selection, the performance is improved by 1.14% for PIMA Indian and 0.02% for Sylhet datasets, while it is reduced by 0.89% for MIMIC III.
Affiliation(s)
- Alain Hennebelle
  - School of Computing and Information Systems, The University of Melbourne, Australia
- Leila Ismail
  - School of Computing and Information Systems, The University of Melbourne, Australia
  - Intelligent Distributed Computing and Systems Lab, Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, United Arab Emirates
  - National Water and Energy Center, United Arab Emirates University, United Arab Emirates
- Huned Materwala
  - Intelligent Distributed Computing and Systems Lab, Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, United Arab Emirates
  - National Water and Energy Center, United Arab Emirates University, United Arab Emirates
- Juma Al Kaabi
  - College of Medicine and Health Sciences, Department of Internal Medicine, United Arab Emirates University, United Arab Emirates
  - Tawam and Mediclinic Hospitals, Al Ain, Abu Dhabi, United Arab Emirates
- Priya Ranjan
  - School of Computer Science, Internet of Things Center of Excellence, University of Petroleum and Energy Studies, India
- Rajiv Janardhanan
  - Faculty of Medical & Health Sciences, SRM Institute of Science & Technology, India
2
Bayar Kapici O, Kapici Y, Tekın A, Şırık M. A novel diagnosis method for schizophrenia based on globus pallidus data. Psychiatry Res Neuroimaging 2023; 336:111732. [PMID: 37922672 DOI: 10.1016/j.pscychresns.2023.111732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Revised: 08/25/2023] [Accepted: 10/09/2023] [Indexed: 11/07/2023]
Abstract
This research aims to diagnose schizophrenia with machine learning-based algorithms. Bayesian neural network, logistic regression, decision tree, k-nearest neighbor, and Gaussian kernel classification techniques are investigated to diagnose schizophrenia with data from 125 persons. This study showed that left lateral ventricles and left globus pallidus volumes, and their percentages in the brain, were significantly lower in first-episode psychosis (FEP) patients than in healthy controls (HCs). Using brain volumes, we were able to diagnose FEP with an accuracy of 73.6% via logistic regression and with an accuracy of 86.4% using the SVM kernel classifier method. Therefore, brain volumes can be used to diagnose FEP with the SVM kernel classifier method.
Affiliation(s)
- Olga Bayar Kapici
  - Department of Radiology, Adıyaman Training and Research Hospital, Adıyaman, Turkey
- Yaşar Kapici
  - Department of Psychiatry, Kahta State Hospital, Adıyaman, Turkey
- Atilla Tekın
  - Department of Psychiatry, Adıyaman University Faculty of Medicine, Adıyaman, Turkey
- Mehmet Şırık
  - Department of Radiology, Adıyaman University Faculty of Medicine, Adıyaman, Turkey
3
Pourhashemi S, Asadi MAZ, Boroughani M, Azadi H. Mapping of dust source susceptibility by remote sensing and machine learning techniques (case study: Iran-Iraq border). Environ Sci Pollut Res Int 2023; 30:27965-27979. [PMID: 36394809 DOI: 10.1007/s11356-022-23982-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/12/2022] [Accepted: 10/30/2022] [Indexed: 06/16/2023]
Abstract
A dust storm is a major environmental problem affecting many arid regions worldwide. The novel contribution of this study is combining indicators extracted from RS- and statistics-based predictive models for the spatial mapping of land susceptibility to dust emissions in a very important dust source area on the border of Iran and Iraq (Khuzestan province in Iran and Al-Basrah and Maysan provinces in Iraq). In this research, remote sensing (RS) techniques and machine learning techniques, including multivariate adaptive regression spline (MARS), random forest (RF), and logistic regression (LR), were used for dust source identification and susceptibility map preparation. To this end, 152 DSA for the period of 2005-2020 were identified in the study area. Of these DSA data, 70% were assigned to the Dust Source Susceptibility Mapping (DSSM) (training dataset) and 30% to model validation. Six factors (i.e., soil, lithology, slope, normalized difference vegetation index (NDVI), geomorphology, and land use units) were prepared as independent and effective variables for DSA. The results of all three models indicated that land use had the most impact on DSA. The validation results of these models using the test data showed sub-curves of 0.92, 0.86, and 0.76 for the RF, MARS, and LR models, respectively. Also, results showed that the RF model outperformed MARS (AUC = 0.89) and LR (AUC = 0.78) methods. In all three models, high and very high susceptibility classes generally covered a large percentage of the study area. The highest percentage of dust source points was also in this susceptibility category. Overall, the results of this study can be useful for planners and managers to control and reduce the risk of negative dust consequences.
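The 70%/30% split and AUC-based validation described in this abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the function names `split_70_30` and `auc`, the fixed seed, and the use of integer IDs in place of the 152 dust source records are all assumptions made for illustration.

```python
import random

def split_70_30(items, seed=0):
    """Shuffle a copy of items and split it into 70% training / 30% validation."""
    rng = random.Random(seed)          # hypothetical fixed seed for repeatability
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Integer IDs stand in for the 152 identified dust source records.
train, test = split_70_30(range(152))
```

A susceptibility model would be fitted on `train` only, and `auc` would then be evaluated on the model's scores for the held-out `test` records.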
Affiliation(s)
- Sima Pourhashemi
  - Department of Geography, Hakim Sabzevari University, Sabzevar, Iran
- Mahdi Boroughani
  - Research Center for Geosciences and Social Studies, Hakim Sabzevari University, Sabzevar, Iran
- Hossein Azadi
  - Department of Geography, Ghent University, Ghent, Belgium
4
Raheja S, Kasturia S, Cheng X, Kumar M. Machine learning-based diffusion model for prediction of coronavirus-19 outbreak. Neural Comput Appl 2021; 35:13755-13774. [PMID: 34400853 PMCID: PMC8358916 DOI: 10.1007/s00521-021-06376-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/02/2021] [Accepted: 07/26/2021] [Indexed: 11/23/2022]
Abstract
The coronavirus pandemic has been impacting the health and prosperity of people globally. A persistent increase in the number of positive cases has increased the stress on governments across the globe. There is a need for an approach that gives more accurate predictions of the outbreak. This paper presents a novel approach, called the diffusion prediction model, for predicting the number of coronavirus cases in four countries: India, France, China and Nepal. The diffusion prediction model works on the diffusion process of human contact. The model considers two forms of spread: when the spread takes time after infecting one person and when the spread is immediate after infecting one person. This distinguishes the proposed model from other state-of-the-art models, and it gives more accurate results than they do. The proposed diffusion prediction model forecasts the number of new cases expected to occur in the next 4 weeks. The model has predicted the number of confirmed cases, recovered cases, deaths and active cases. The model can help governments to be well prepared for any abrupt rise in this pandemic. The performance is evaluated in terms of accuracy and error rate and compared with the prediction results of a support vector machine, a logistic regression model and a convolutional neural network. The results demonstrate the efficiency of the proposed model.
Affiliation(s)
- Supriya Raheja
  - Department of Computer Science, Amity University, Noida, India
- Shreya Kasturia
  - Department of Computer Science, Amity University, Noida, India
- Xiaochun Cheng
  - Department of Computer Science, Middlesex University, London, UK
- Manoj Kumar
  - School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
5
Pahar M, Klopper M, Warren R, Niesler T. COVID-19 cough classification using machine learning and global smartphone recordings. Comput Biol Med 2021; 135:104572. [PMID: 34182331 PMCID: PMC8213969 DOI: 10.1016/j.compbiomed.2021.104572] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 05/06/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/15/2022]
Abstract
We present a machine learning based COVID-19 cough classifier which can discriminate COVID-19 positive coughs from both COVID-19 negative and healthy coughs recorded on a smartphone. This type of screening is non-contact, easy to apply, and can reduce the workload in testing centres as well as limit transmission by recommending early self-isolation to those who have a cough suggestive of COVID-19. The datasets used in this study include subjects from all six continents and contain both forced and natural coughs, indicating that the approach is widely applicable. The publicly available Coswara dataset contains 92 COVID-19 positive and 1079 healthy subjects, while the second smaller dataset was collected mostly in South Africa and contains 18 COVID-19 positive and 26 COVID-19 negative subjects who have undergone a SARS-CoV laboratory test. Both datasets indicate that COVID-19 positive coughs are 15%–20% shorter than non-COVID coughs. Dataset skew was addressed by applying the synthetic minority oversampling technique (SMOTE). A leave-p-out cross-validation scheme was used to train and evaluate seven machine learning classifiers: logistic regression (LR), k-nearest neighbour (KNN), support vector machine (SVM), multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM) and a residual-based neural network architecture (Resnet50). Our results show that although all classifiers were able to identify COVID-19 coughs, the best performance was exhibited by the Resnet50 classifier, which was best able to discriminate between the COVID-19 positive and the healthy coughs with an area under the ROC curve (AUC) of 0.98. An LSTM classifier was best able to discriminate between the COVID-19 positive and COVID-19 negative coughs, with an AUC of 0.94 after selecting the best 13 features from a sequential forward selection (SFS). 
Since this type of cough audio classification is cost-effective and easy to deploy, it is potentially a useful and viable means of non-contact COVID-19 screening.
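The abstract addresses dataset skew with the synthetic minority oversampling technique (SMOTE). The core idea, creating new minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration rather than the authors' code or a library implementation; the function name `smote_sketch`, the parameter `k`, and the toy feature vectors are assumptions.

```python
import random

def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by linear interpolation
    between a randomly chosen minority sample and one of its k nearest
    minority neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank every other minority sample by distance to the base sample.
        others = [p for i, p in enumerate(minority) if i != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy 2-D feature vectors standing in for minority-class cough features.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_sketch(minority_features, n_new=5)
```

Because each synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to the training folds only, so that no interpolated copy of a test sample leaks into training.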
Affiliation(s)
- Madhurananda Pahar
  - Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa
- Marisa Klopper
  - SAMRC Centre for Tuberculosis Research, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, South Africa
- Robin Warren
  - SAMRC Centre for Tuberculosis Research, DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, South Africa
- Thomas Niesler
  - Department of Electrical and Electronic Engineering, Stellenbosch University, South Africa
6
Kesmen Z, Kılıç Ö, Gormez Y, Çelik M, Bakir-Gungor B. Multi fragment melting analysis system (MFMAS) for one-step identification of lactobacilli. J Microbiol Methods 2020; 177:106045. [PMID: 32890569 DOI: 10.1016/j.mimet.2020.106045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2019] [Revised: 08/18/2020] [Accepted: 08/19/2020] [Indexed: 11/23/2022]
Abstract
The accurate identification of lactobacilli is essential for the effective management of industrial practices associated with lactobacilli strains, such as the production of fermented foods or probiotic supplements. For this reason, in this study, we proposed the Multi Fragment Melting Analysis System (MFMAS)-lactobacilli based on high resolution melting (HRM) analysis of multiple DNA regions that have high interspecies heterogeneity for fast and reliable identification and characterization of lactobacilli. The MFMAS-lactobacilli is a new and customized version of the MFMAS, which was developed by our research group. MFMAS-lactobacilli is a combined system that consists of i) a ready-to-use plate, which is designed for multiple HRM analysis, and ii) a data analysis software, which is used to characterize lactobacilli species via incorporating machine learning techniques. Simultaneous HRM analysis of multiple DNA fragments yields a fingerprint for each tested strain and the identification is performed by comparing the fingerprints of unknown strains with those of known lactobacilli species registered in the MFMAS. In this study, a total of 254 isolates, which were recovered from fermented foods and probiotic supplements, were subjected to MFMAS analysis, and the results were confirmed by a combination of different molecular techniques. All of the analyzed isolates were exactly differentiated and accurately identified by applying the single-step procedure of MFMAS, and it was determined that all of the tested isolates belonged to 18 different lactobacilli species. The individual analysis of each target DNA region provided identification with an accuracy range from 59% to 90% for all tested isolates. However, when each target DNA region was analyzed simultaneously, perfect discrimination and 100% accurate identification were obtained even in closely related species. 
As a result, it was concluded that MFMAS-lactobacilli is a multi-purpose method that can be used to differentiate, classify, and identify lactobacilli species. Hence, our proposed system could be a potential alternative to overcome the inconsistencies and difficulties of the current methods.
7
Mahato S, Paul S. Classification of Depression Patients and Normal Subjects Based on Electroencephalogram (EEG) Signal Using Alpha Power and Theta Asymmetry. J Med Syst 2019; 44:28. [PMID: 31834531 DOI: 10.1007/s10916-019-1486-z] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 11/11/2018] [Accepted: 10/15/2019] [Indexed: 10/25/2022]
Abstract
Depression or Major Depressive Disorder (MDD) is a mental illness which negatively affects how a person thinks, acts or feels. MDD has become a major disease that currently affects millions of people. The diagnosis of depression is questionnaire based and is not based on any objective criteria. In this paper, features extracted from the EEG signal are used for the diagnosis of depression. Alpha, alpha1, alpha2, beta, delta and theta power and theta asymmetry were used as features. Alpha1 and alpha2 combined with theta asymmetry were also used as features. Multi-Cluster Feature Selection (MCFS) was used for feature selection when feature combinations were used. The classifiers used were Support Vector Machine (SVM), Logistic Regression (LR), Naïve-Bayesian (NB) and Decision Tree (DT). Alpha2 showed higher classification accuracy than alpha1 and alpha power in all applied classifiers. A t-test showed a significant difference between the theta power of the left and right hemispheres in normal subjects, but no significant difference in depression patients. Average theta asymmetry is higher in normal subjects than in MDD patients, but the difference is not significant. The combination of alpha2 and theta asymmetry showed the highest classification accuracy, 88.33%, with SVM.
8
Abstract
Gender recognition is an important research field for studying evidence regarding personal characteristics in the information and data society. However, some current traditional methods, such as those based on vision and sound, have exposed their own security weaknesses. Recently, biometric gender recognition based on Electroencephalography (EEG) signals has been widely used in information safety and medical fields. It is necessary to explore the potential of using EEG to produce more robust and accurate results with larger training data and sophisticated machine learning approaches. In this contribution, we present an automated gender recognition system using a hybrid model based on resting-state EEG data from twenty-eight subjects. These data are useful and handy for gaining insight into gender differences. To achieve good performance and strong robustness, the system develops a hybrid model combining random forest and logistic regression, and employs four common entropy measures to analyze the non-stationary EEG signals. The results suggest that recognition performance improves, with an accuracy of 0.9982 and an AUC of 0.9926 based on a nested tenfold cross-validation loop, implying that the proposed approach has significant potential applicability and is capable of recognizing personal gender.
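The nested tenfold cross-validation loop mentioned in this abstract can be sketched as an index generator: an outer loop holds out a test fold for the final performance estimate, and an inner loop re-splits the remaining data for model selection. The sketch below is a generic illustration, not the authors' procedure; the helper names, the interleaved (unshuffled) fold assignment, and the sample count of 100 are assumptions.

```python
def k_folds(indices, k):
    """Partition indices into k interleaved folds (no shuffling)."""
    return [indices[i::k] for i in range(k)]

def nested_cv_splits(n_samples, outer_k=10, inner_k=10):
    """Yield (outer_train, outer_test, inner_splits) for nested CV.
    inner_splits is a list of (inner_train, inner_val) pairs drawn only
    from outer_train, so the outer test fold is never used for tuning."""
    indices = list(range(n_samples))
    for outer_test in k_folds(indices, outer_k):
        held_out = set(outer_test)
        outer_train = [i for i in indices if i not in held_out]
        inner_splits = []
        for inner_val in k_folds(outer_train, inner_k):
            val_set = set(inner_val)
            inner_train = [i for i in outer_train if i not in val_set]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits

# Hypothetical number of samples, chosen only for illustration.
splits = list(nested_cv_splits(100))
```

Hyperparameters would be chosen using the inner splits, and the reported accuracy and AUC would be averaged over the outer test folds.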
Affiliation(s)
- Ping Wang
  - The Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang, 330098 China
- Jianfeng Hu
  - The Center of Collaboration and Innovation, Jiangxi University of Technology, Nanchang, 330098 China
9
Hagedorn B, Clarke N, Ruane M, Faulkner K. Assessing aquifer vulnerability from lumped parameter modeling of modern water proportions in groundwater mixtures: Application to California's South Coast Range. Sci Total Environ 2018; 624:1550-1560. [PMID: 29929264 DOI: 10.1016/j.scitotenv.2017.12.115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/03/2017] [Revised: 12/09/2017] [Accepted: 12/11/2017] [Indexed: 06/08/2023]
Abstract
Groundwater in agriculture-dominated regions of California has historically experienced nitrate pollution due to the application of excess nitrogen fertilizers. This study examines the nitrate pollution vulnerability of groundwater in sedimentary aquifers of California's South Coast Range using stepwise logistic regression (LR) modeling. Our results indicate an overall excellent model fit, but an acceptable statistical significance, according to a Wald statistic (p-Wald) cutoff of 0.1, for only two explanatory variables: (1) the dissolved oxygen (DO) concentration, and (2) the modern (i.e., less than ~60 years old) water proportion (MWP) in the groundwater mixture. The latter parameter was estimated via Lumped Parameter Modeling (LPM) of groundwater tritium, helium and radiocarbon data that have been corrected for isotopic dilution and exchange using a modified Fontes and Garnier (F&G) approach. The observation that other explanatory variables on land cover (i.e., percentage of agricultural land use, abundance of septic tanks and leaking underground fuel tanks, etc.) were statistically insignificant points to the limitations of low-resolution land cover data in groundwater vulnerability assessments. Our results highlight the utility of quantitative groundwater age and mixing data to evaluate pollution probability in the saturated zone. The approach presented here can thus provide valuable results in comparable settings where the availability of fertilizer application, crop nitrogen uptake, and soil texture data is limited.
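The Wald-statistic screening described in this abstract can be illustrated with a short sketch: for each logistic regression coefficient, the statistic z = coefficient / standard error is compared against a standard normal distribution, and the variable is retained if the two-sided p-value falls below the cutoff of 0.1. The coefficients and standard errors below are hypothetical values invented for illustration and are not taken from the study; the function names are likewise assumptions.

```python
import math

def wald_p_value(coef, std_err):
    """Two-sided p-value for H0: coef = 0, using z = coef / std_err
    and the standard normal distribution."""
    z = coef / std_err
    return math.erfc(abs(z) / math.sqrt(2.0))

def retained(variables, cutoff=0.1):
    """Return the names of variables whose Wald p-value is below cutoff."""
    return [name for name, (coef, se) in variables.items()
            if wald_p_value(coef, se) < cutoff]

# Hypothetical (coefficient, standard error) pairs, for illustration only.
example = {
    "dissolved_oxygen": (0.80, 0.30),
    "modern_water_proportion": (2.10, 0.90),
    "agricultural_land_pct": (0.05, 0.20),
}
kept = retained(example)
```

Statistical packages often report the squared form z² against a chi-square distribution with one degree of freedom, which gives the same p-value for a single coefficient.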
Affiliation(s)
- Benjamin Hagedorn
  - Department of Geological Sciences, Long Beach State University, CA 90840, USA
- Natalie Clarke
  - Department of Geological Sciences, Long Beach State University, CA 90840, USA
- Merik Ruane
  - Department of Geological Sciences, Long Beach State University, CA 90840, USA
- Kirsten Faulkner
  - Department of Geological Sciences, Long Beach State University, CA 90840, USA