1
Montgomery BW, Tong X, Vsevolozhskaya O, Anthony JC. Using publicly available data to predict recreational cannabis legalization at the county-level: A machine learning approach. Int J Drug Policy 2024; 125:104340. [PMID: 38342052 PMCID: PMC11031282 DOI: 10.1016/j.drugpo.2024.104340] [Received: 09/22/2022] [Revised: 01/26/2024] [Accepted: 01/28/2024] [Indexed: 02/13/2024]
Abstract
BACKGROUND There is substantial geographic variability in local cannabis policies within states that have legalized recreational cannabis. This study develops an interpretable machine learning model that uses county-level population demographics, sociopolitical factors, and estimates of substance use and mental illness prevalence to predict the legality of recreational cannabis sales within each U.S. county. METHODS We merged county-level data and selected 14 model inputs from the 2010 Census, the 2012 County Presidential Data from the MIT Elections Lab, and Small Area Estimates from the National Surveys on Drug Use and Health (NSDUH) for 2010 to 2012. Counties were labeled as recreational cannabis legal (RCL) if the sale of recreational cannabis was allowed anywhere in the county in 2014, resulting in 92 RCL and 3002 non-RCL counties. We used synthetic data augmentation and minority oversampling techniques to build an ensemble of 1000 logistic regressions on random sub-samples of the data, withholding one state at a time and building models from all remaining states. Performance was evaluated by comparing the predicted policy conditions with the actual outcomes in 2014. RESULTS When compared with the actual RCL policies in 2014, the ensemble's predictions of counties transitioning to RCL had a macro-averaged F1 score of 0.61. The main factors associated with legalizing county-level recreational cannabis sales were the prevalence of past-month cannabis use and of past-year cocaine use. CONCLUSION By leveraging publicly available data from 2010 to 2012, our model achieved appreciable discrimination in predicting counties with legal recreational cannabis sales in 2014; however, there is room for improvement. Now that model performance has been demonstrated in the first handful of states to legalize cannabis, additional testing with more recent data using time-to-event models is warranted.
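The abstract reports a macro-averaged F1 score but does not say why that metric was chosen. One plausible reason is the class imbalance (92 RCL versus 3002 non-RCL counties). The sketch below is an illustration only, not the authors' code: it computes macro F1 in plain Python for a hypothetical baseline that predicts "non-RCL" for every county. The function names and the baseline are assumptions made for this example.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Class sizes taken from the abstract: 92 RCL (1) and 3002 non-RCL (0) counties.
y_true = [1] * 92 + [0] * 3002

# Hypothetical baseline (an assumption): predict "non-RCL" everywhere.
all_negative = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, all_negative)) / len(y_true)
baseline_macro_f1 = macro_f1(y_true, all_negative)
```

Under these assumptions the trivial baseline is about 97% accurate yet scores a macro F1 just under 0.5, because the F1 for the RCL class is zero. This gives a reference point for reading the reported 0.61.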
Affiliation(s)
- Xiaoran Tong
- Department of Biostatistics, College of Public Health, University of Kentucky, Research Facility No.1, 111 Washington Ave, Lexington, KY 40508, United States
- Olga Vsevolozhskaya
- Department of Biostatistics, College of Public Health, University of Kentucky, Research Facility No.1, 111 Washington Ave, Lexington, KY 40508, United States
- James C Anthony
- Department of Epidemiology and Biostatistics, College of Human Medicine, Michigan State University, B601 West Fee Hall, 909 Wilson Road, East Lansing, MI 48824-1030, United States
2
Lee H, Lee SH, Park H, Kim JH, Jung HS. ESG2PreEM: Automated ESG grade assessment framework using pre-trained ensemble models. Heliyon 2024; 10:e26404. [PMID: 38404885 PMCID: PMC10884917 DOI: 10.1016/j.heliyon.2024.e26404] [Received: 08/26/2023] [Revised: 12/20/2023] [Accepted: 02/13/2024] [Indexed: 02/27/2024] Open
Abstract
Incorporating environmental, social, and governance (ESG) criteria is essential for promoting sustainability in business and is considered a set of principles that can increase a firm's value. This research proposes a strategy using text-based automated techniques to rate ESG. For autonomous classification, data were collected from the news archive LexisNexis and classified as E, S, or G based on the ESG materials provided by the Refinitiv-Sustainable Leadership Monitor, which has over 450 metrics. In addition, Bidirectional Encoder Representations from Transformers (BERT), Robustly optimized BERT approach (RoBERTa), and A Lite BERT (ALBERT) models were trained to accurately categorize preprocessed ESG documents using a voting ensemble model, and their performances were measured. The accuracy of the ensemble model utilizing BERT and ALBERT was found to be 80.79% with batch size 20. Additionally, this research validated the performance of the framework for companies included in the Dow Jones Industrial Average (DJIA) and compared it with the grade provided by Morgan Stanley Capital International (MSCI), a globally renowned ESG rating agency known for having the highest creditworthiness. This study supports the use of sophisticated natural language processing (NLP) techniques to attain important knowledge from large amounts of text-based data to improve ESG assessment criteria established by different rating agencies.
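The abstract mentions a voting ensemble over BERT-family classifiers but does not state whether the vote is hard (majority label) or soft (averaged probabilities). As a minimal sketch, assuming hard voting, the idea can be illustrated in plain Python. The per-model label lists are hypothetical and `hard_vote` is a name introduced only for this example.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote across models.

    predictions: list of per-model label lists, all of the same length.
    Ties go to the label seen first, i.e. the earliest model in the list.
    """
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three classifiers for four documents (E, S, or G).
bert = ["E", "S", "G", "E"]
roberta = ["E", "G", "G", "S"]
albert = ["S", "S", "G", "G"]

ensemble = hard_vote([bert, roberta, albert])
```

With three voters and three classes a three-way tie is possible (the fourth document here). This sketch resolves it in favour of the first model listed; a real system might instead fall back to the most confident model.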
Affiliation(s)
- Haein Lee
- Department of Applied Artificial Intelligence/ Department of Human Artificial Intelligence Interaction, Sungkyunkwan University, 03063, Seoul, South Korea
- Seon Hong Lee
- Department of Applied Artificial Intelligence/ Department of Human Artificial Intelligence Interaction, Sungkyunkwan University, 03063, Seoul, South Korea
- Heungju Park
- SKK Business School, Sungkyunkwan University, 03063, Seoul, South Korea
- Jang Hyun Kim
- Department of Interaction Science/ Department of Human Artificial Intelligence Interaction, Sungkyunkwan University, 03063, Seoul, South Korea
- Hae Sun Jung
- Department of Applied Artificial Intelligence, Sungkyunkwan University, 03063, Seoul, South Korea
3
Yuan S, Arellano AF, Knickrehm L, Chang HI, Castro CL, Furlong M. Towards quantifying atmospheric dispersion of pesticide spray drift in Yuma County Arizona. Atmos Environ (1994) 2024; 319:120262. [PMID: 38250567 PMCID: PMC10798238 DOI: 10.1016/j.atmosenv.2023.120262] [Indexed: 01/23/2024]
Abstract
While pesticide vapor and particles from agricultural spray drift have been reported to pose a risk to public health, limited baseline ambient measurements exist to warrant an accurate assessment of their impacts at community-to-county-wide scale. Here, we present an initial modeling investigation of the transport and deposition of applied pesticides in an agricultural county in Arizona (Yuma County), to provide initial estimates of the corresponding enhancements in ambient levels of these spray drifts downwind of application sites. With a 50 × 50 km domain, we use the dispersion model CALPUFF with meteorology from the Weather Research and Forecasting (WRF) model to investigate the spatiotemporal distribution of pesticide abundance due to spray drift from a representative sample of nine application sites. Data records for nine application days in September and October 2011, which are the peak months of pesticide application, were retroactively simulated for 48 h for all nine application sites using the active ingredient lambda-cyhalothrin, which is a commonly used pesticide in the county. Twenty-one WRF/CALPUFF simulations were conducted with varying emissions, chemical lifetime, deposition rate, application height, and meteorology inputs, allowing for an ensemble-based analysis of the possible ranges in modeled abundance. Our results show that dispersion of vapors released at the time of application depends heavily on prevailing meteorology, particularly wind speed and direction. Dispersion is limited to thin plumes that are easily transported out of the domain. The ensemble-mean vapor concentrations of the 48-h average (> 90th percentile domain-wide) range from 0.2 nanograms (ng)/m³ to 200 ng/m³, and the peak can be as high as 1000 ng/m³ near the application sites. Pesticide particles are mainly deposited within 1-2 km of the application sites at an average rate of 106 ng/km²/h, but this varies with particle mean diameter and standard deviation. While these findings are generally consistent with reported ambient levels in the literature, the associated ensemble spread of these estimates is of the same order of magnitude as their ensemble mean. At the two nearby communities downwind of these sites, we find that peak vapor concentrations are less than 50 ng/m³ with exposure times of less than an hour, as approximately 99.4% of the vapors are advected out and 99.5% of the particles deposit within the domain. Results of this study indicate that pesticide spray drift from a sample of application sites and representative days in fall may have a limited impact on neighboring communities. However, we strongly suggest that field measurements be collected for model validation and for more rigorous investigation of the actual scale of these impacts when the bulk of pesticide applications across the county, variation in active pesticide ingredients, and potential resuspension of deposited particles are considered.
Affiliation(s)
- Sunyi Yuan
- Department of Hydrology and Atmospheric Sciences, University of Arizona, United States
- Now at COMAC Flight Test Center, 201323, Shanghai, China
- Avelino F. Arellano
- Department of Hydrology and Atmospheric Sciences, University of Arizona, United States
- Lauren Knickrehm
- Department of Hydrology and Atmospheric Sciences, University of Arizona, United States
- Hsin-I Chang
- Department of Hydrology and Atmospheric Sciences, University of Arizona, United States
- Christopher L. Castro
- Department of Hydrology and Atmospheric Sciences, University of Arizona, United States
- Melissa Furlong
- Community, Environment and Policy, Mel & Enid Zuckerman College of Public Health, University of Arizona, United States
4
Stein RA, Mchaourab HS. Rosetta Energy Analysis of AlphaFold2 models: Point Mutations and Conformational Ensembles. bioRxiv 2024:2023.09.05.556364. [PMID: 37732281 PMCID: PMC10508732 DOI: 10.1101/2023.09.05.556364] [Indexed: 09/22/2023]
Abstract
There has been explosive growth in the application of AlphaFold2 and other structure prediction platforms to accurately predict protein structures from a multiple sequence alignment (MSA) for downstream structural analysis. However, two outstanding questions persist in the field: how robust AlphaFold2 predictions are to the consequences of point mutations, and how completely AlphaFold2 predicts protein conformational ensembles. To address these questions, we combined our previously developed method SPEACH_AF with model relaxation and energetic analysis in Rosetta. SPEACH_AF introduces residue substitutions across the MSA and not just within the input sequence. With respect to conformational ensembles, we combined SPEACH_AF and a new MSA subsampling method, AF_cluster, and for a benchmarked set of proteins we found that the energetics of the conformational ensembles generated by AlphaFold2 correspond to those of experimental structures and those explored by standard molecular dynamics methods. With respect to point mutations, we compared the structural and energetic consequences of having the mutation(s) in the input sequence versus in the whole MSA (SPEACH_AF). Both methods yielded models different from the wild-type sequence, with more robust changes when the mutation(s) were in the whole MSA. While our findings demonstrate the robustness of AlphaFold2 in analyzing point mutations and exploring conformational ensembles, they highlight the need for multiparameter structural and energetic analyses of these models to generate experimentally testable hypotheses.
Affiliation(s)
- Richard A Stein
- Department of Molecular Physiology and Biophysics and Center for Applied AI in Protein Dynamics Vanderbilt University
- Hassane S Mchaourab
- Department of Molecular Physiology and Biophysics and Center for Applied AI in Protein Dynamics Vanderbilt University
5
Wang F, Liu CB, Wang Y, Wang XX, Yang YY, Jiang CY, Le QM, Liu X, Ma L, Wang FF. Morphine- and foot shock-responsive neuronal ensembles in the VTA possess different connectivity and biased GPCR signaling pathway. Theranostics 2024; 14:1126-1146. [PMID: 38250036 PMCID: PMC10797299 DOI: 10.7150/thno.90792] [Received: 10/05/2023] [Accepted: 01/02/2024] [Indexed: 01/23/2024] Open
Abstract
Background: Neurons in the ventral tegmental area (VTA) are sensitive to stress, and their maladaptation has been implicated in psychiatric disorders such as anxiety and addiction. The cellular properties of the VTA neurons that respond to different stressors related to different emotional processing remain to be investigated. Methods: By combining immediate early gene (IEG)-dependent labeling, rabies virus tracing, ensemble-specific transcriptomic analysis, and fiber photometry recording in the VTA of male mice, the spatial distribution, brain-wide connectivity, and cellular signaling pathways of the VTA neuronal ensembles responding to morphine (Mor-Ens) or foot shock (Shock-Ens) stimuli were investigated. Results: Optogenetic activation of the Mor-Ens drove approach behavior, whereas chemogenetic activation of the Shock-Ens increased the anxiety level in mice. Mor-Ens were clustered and enriched in the ventral VTA, contained a higher proportion of dopaminergic neurons, received more inputs from the dorsal medial striatum and the medial hypothalamic zone, and exhibited greater axonal arborization in the zona incerta and ventral pallidum. In contrast, Shock-Ens were more dispersed, contained a higher proportion of GABAergic neurons, and received more inputs from the ventral pallidum and the lateral hypothalamic area. The downstream targets of the G protein and β-arrestin pathways, PLCβ3 and phosphorylated AKT1 (Thr308), were relatively enriched in the Mor-Ens and Shock-Ens, respectively. Cariprazine, the G-protein-biased agonist for the dopamine D2 receptor, increased the response of Mor-Ens to sucrose water and decreased anxiety-like behavior during morphine withdrawal, whereas the β-arrestin-biased agonist UNC9994 decreased the response of Shock-Ens to tail suspension. Conclusions: Taken together, these findings reveal the heterogeneous connectivity and signaling pathways of the VTA neurons that respond to morphine and foot shock, providing new insights for the development of specific interventions for psychiatric disorders caused by various stressors associated with different VTA neuronal functions.
Affiliation(s)
- Fan Wang
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Chao-bao Liu
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Yi Wang
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Xi-xi Wang
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Yuan-yao Yang
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Chang-you Jiang
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Research Unit of Addiction Memory, Chinese Academy of Medical Sciences (2021RU009), Shanghai 200032, China
- Qiu-min Le
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Research Unit of Addiction Memory, Chinese Academy of Medical Sciences (2021RU009), Shanghai 200032, China
- Xing Liu
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Research Unit of Addiction Memory, Chinese Academy of Medical Sciences (2021RU009), Shanghai 200032, China
- Lan Ma
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Research Unit of Addiction Memory, Chinese Academy of Medical Sciences (2021RU009), Shanghai 200032, China
- Fei-fei Wang
- School of Basic Medical Sciences, MOE Frontiers Center for Brain Science, Institutes of Brain Science, State Key Laboratory of Medical Neurobiology, Pharmacology Research Center, Department of Neurology, Huashan Hospital, Fudan University, Shanghai 200032, China
- Research Unit of Addiction Memory, Chinese Academy of Medical Sciences (2021RU009), Shanghai 200032, China
6
Amin J, Almas Anjum M, Ahmad A, Sharif MI, Kadry S, Kim J. Microscopic parasite malaria classification using best feature selection based on generalized normal distribution optimization. PeerJ Comput Sci 2024; 10:e1744. [PMID: 38196949 PMCID: PMC10773915 DOI: 10.7717/peerj-cs.1744] [Received: 05/01/2023] [Accepted: 11/16/2023] [Indexed: 01/11/2024]
Abstract
Malaria can be fatal if not identified and treated promptly. Owing to advances in the malaria diagnostic process, microscopy techniques are employed for blood cell analysis. Unfortunately, diagnosing malaria by microscopy depends on the skill of the microscopist. To overcome this issue, machine/deep learning algorithms can be used for more accurate and efficient detection of malaria. Therefore, a method consisting of three phases is proposed for classifying malaria parasites. First, a bilateral filter is applied to enhance image quality. Next, shape-based and deep features are extracted. The shape-based features are pyramid histograms of oriented gradients (PHOG) with a dimension of N × 300. Deep features are derived from the fully connected layers of the residual networks ResNet-50 and ResNet-18, each with a dimension of N × 1,000. The features obtained are fused serially, resulting in a dimensionality of N × 2,300. From this set, N × 498 features are selected using the generalized normal distribution optimization (GNDO) method. The proposed method is assessed on a microscopic malarial parasite imaging dataset and provides 99% classification accuracy, which is better than recently published work.
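The dimensions quoted in the abstract add up as 300 + 1,000 + 1,000 = 2,300, which is what serial (column-wise) fusion would give. The plain-Python sketch below only illustrates that bookkeeping. The feature values are placeholders, `fuse_serially` is a name made up for this example, and the selected column indices stand in for whatever GNDO would choose; GNDO itself is not implemented here.

```python
def fuse_serially(*feature_blocks):
    """Concatenate per-sample feature vectors from several blocks, column-wise."""
    n = len(feature_blocks[0])
    if any(len(block) != n for block in feature_blocks):
        raise ValueError("all blocks need the same number of samples")
    return [
        sum((list(block[i]) for block in feature_blocks), [])
        for i in range(n)
    ]


N = 4  # hypothetical number of images

# Placeholder feature matrices with the dimensions given in the abstract.
phog = [[0.0] * 300 for _ in range(N)]       # N x 300
resnet50 = [[1.0] * 1000 for _ in range(N)]  # N x 1,000
resnet18 = [[2.0] * 1000 for _ in range(N)]  # N x 1,000

fused = fuse_serially(phog, resnet50, resnet18)  # N x 2,300

# Assumption: these indices merely stand in for the 498 columns GNDO would select.
selected_columns = list(range(498))
reduced = [[row[j] for j in selected_columns] for row in fused]  # N x 498
```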
Affiliation(s)
- Javeria Amin
- University of Wah, Department of Computer Science, Wah Cantt, Pakistan
- Abraz Ahmad
- University of Wah, Department of Computer Science, Wah Cantt, Pakistan
- Muhammad Irfan Sharif
- Department of Information Sciences, University of Education Lahore, Jauharabad Campus, Jauharabad, Pakistan
- Seifedine Kadry
- Noroff University College, Kristiansand, Norway
- Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, UAE
- MEU Research Unit, Middle East University, Amman, Jordan
- Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon
- Jungeun Kim
- Department of Software, Kongju National University, Cheonan, Korea
7
Chen H, Tan C, Lin Z. Geographical origin identification of ginseng using near-infrared spectroscopy coupled with subspace-based ensemble classifiers. Spectrochim Acta A Mol Biomol Spectrosc 2024; 304:123315. [PMID: 37672885 DOI: 10.1016/j.saa.2023.123315] [Received: 06/25/2023] [Revised: 08/19/2023] [Accepted: 08/29/2023] [Indexed: 09/08/2023]
Abstract
Ginseng is a well-known traditional herbal medicine, and the ginseng available on the market may not actually have been produced in the place claimed. Traditional methods of identifying the geographical origin of ginseng are subjective, time-consuming, or destructive, so a more efficient approach is desirable. The feasibility of combining near-infrared (NIR) spectroscopy with ensemble learning for discriminating the producing area of ginseng was explored. A total of 270 samples were collected and evenly partitioned into training and test sets. A random subspace ensemble (RSE) that uses a linear discriminant classifier (LDA) as the weak learner (abbreviated RSE-LDA) was used to construct predictive models. Two parameters, the size of the subspace and the number of learners in the ensemble, were optimized. The classic partial least squares (PLS) algorithm was applied to build the reference model. The sensitivity, specificity, and total accuracy were 97.8 %, 100 %, and 99.3 % for the final RSE-LDA model and 93.3 %, 96.7 %, and 95.6 % for the PLS model, respectively. To study the impact of training set composition on the results, the samples were randomly divided 200 times and the algorithm was run repeatedly to statistically analyze the sensitivity and specificity on the test set. Similar results were obtained. The effect of training set size was also investigated. The results indicate that the combination of NIR spectroscopy with the RSE algorithm is a potential tool for discriminating the origin of ginseng.
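For readers who want to see how the three reported figures relate, here is a small worked example in Python. The confusion-matrix counts are hypothetical. The abstract gives only percentages; an even split of 270 samples would leave 135 test samples, and a 45/90 split between target-origin and other samples is an assumption chosen because it reproduces the reported values after rounding.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts on a 135-sample test set (45 positives, 90 negatives).
rse_lda = binary_metrics(tp=44, fn=1, tn=90, fp=0)
pls = binary_metrics(tp=42, fn=3, tn=87, fp=3)

# Express as percentages rounded to one decimal place, as in the abstract.
rse_lda_pct = tuple(round(100 * m, 1) for m in rse_lda)
pls_pct = tuple(round(100 * m, 1) for m in pls)
```

Other class splits could yield the same rounded percentages, so these counts should be read as one consistent possibility rather than the study's actual confusion matrix.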
Affiliation(s)
- Hui Chen
- Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China; Hospital, Yibin University, Yibin, Sichuan 644000, China
- Chao Tan
- Key Lab of Process Analysis and Control of Sichuan Universities, Yibin University, Yibin, Sichuan 644000, China.
- Zan Lin
- Department of Knee Sports Injury, Sichuan Province Orthopedic Hospital, Chengdu, Sichuan 610041, China
8
Kaur I, Ahmad T. A cluster-based ensemble approach for congenital heart disease prediction. Comput Methods Programs Biomed 2024; 243:107922. [PMID: 37984098 DOI: 10.1016/j.cmpb.2023.107922] [Received: 04/15/2023] [Revised: 10/24/2023] [Accepted: 11/06/2023] [Indexed: 11/22/2023]
Abstract
BACKGROUND Congenital heart disease (CHD) is one of the most prevalent birth disorders. Although CHD risk factors have been the subject of numerous studies, their propensity to cause CHD has not been tested. In particular, few studies have attempted to forecast CHD risk using population-based cross-sectional data, which are inherently imbalanced. OBJECTIVE The main goals of this study are to create a reliable data analysis model that can help with (i) a better understanding of congenital heart disease prediction in the presence of missing and unbalanced data and (ii) creating cohorts of expectant mothers with similar lifestyle characteristics. METHODS Clusters of patient cohorts are produced using the unsupervised data mining technique density-based spatial clustering of applications with noise (DBSCAN). For more accurate CHD prediction, a random forest model was trained using these clusters and their corresponding patterns. This study uses a dataset of 33,831 expectant mothers to make its prediction. Missing data were handled using the k-NN imputation approach, while extremely unbalanced data were balanced using SMOTE. These techniques are all data-driven and need little to no user or expert involvement. RESULTS AND CONCLUSION Using DBSCAN, three cohorts were found. The cluster information enhanced the random forest-based CHD prediction and revealed intricate factors that influence prediction accuracy. The proposed approach gave the best results, with 99 % accuracy and 0.91 AUC, and performed better than state-of-the-art methodologies. Hence, the suggested method using unsupervised learning can provide intricate information to the classifier and further enhance classification performance.
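The abstract names SMOTE for balancing but gives no configuration. The core idea of SMOTE is to create a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a deliberately simplified, pure-Python illustration of that idea, not the study's pipeline: `smote_like`, the toy points, `k`, and the seed are all assumptions, and base points are drawn at random rather than iterated as in the standard algorithm.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Toy two-feature minority class (hypothetical values).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote_like(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority points, it always falls within the range already spanned by the minority class.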
Affiliation(s)
- Ishleen Kaur
- Sri Guru Tegh Bahadur Khalsa College, University of Delhi, Delhi, India.
- Tanvir Ahmad
- Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India
9
Ho JC, Sotoodeh M, Zhang W, Simpson RL, Hertzberg VS. An AdaBoost-based algorithm to detect hospital-acquired pressure injury in the presence of conflicting annotations. Comput Biol Med 2024; 168:107754. [PMID: 38016372 PMCID: PMC10843556 DOI: 10.1016/j.compbiomed.2023.107754] [Received: 06/07/2023] [Revised: 11/07/2023] [Accepted: 11/20/2023] [Indexed: 11/30/2023]
Abstract
Hospital-acquired pressure injury is one of the most harmful events in clinical settings. Patients who do not receive early prevention and treatment can experience a significant financial burden and physical trauma. Several hospital-acquired pressure injury prediction algorithms have been developed to tackle this problem, but these models assume a consensus, gold-standard label (i.e., presence of pressure injury or not) is present for all training data. Existing definitions for identifying hospital-acquired pressure injuries are inconsistent due to the lack of high-quality documentation surrounding pressure injuries. To address this issue, we propose in this paper an ensemble-based algorithm that leverages truth inference methods to resolve label inconsistencies between various case definitions and the level of disagreements in annotations. Application of our method to MIMIC-III, a publicly available intensive care unit dataset, gives empirical results that illustrate the promise of learning a prediction model using truth inference-based labels and observed conflict among annotators.
Affiliation(s)
- Joyce C Ho
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, 30322, GA, USA.
- Mani Sotoodeh
- Canadian Institute for Health Information, 495 Richmond Road, Suite 600 - WS-602, Ottawa, K2A 4H6, Ontario, Canada
- Wenhui Zhang
- Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, 1520 Clifton Road, Atlanta, 30322, GA, USA
- Roy L Simpson
- Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, 1520 Clifton Road, Atlanta, 30322, GA, USA
- Vicki Stover Hertzberg
- Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, 1520 Clifton Road, Atlanta, 30322, GA, USA
10
Gaudêncio AS, Azami H, Cardoso JM, Vaz PG, Humeau-Heurtier A. Bidimensional ensemble entropy: Concepts and application to emphysema lung computerized tomography scans. Comput Methods Programs Biomed 2023; 242:107855. [PMID: 37852145 DOI: 10.1016/j.cmpb.2023.107855] [Received: 03/27/2023] [Revised: 10/01/2023] [Accepted: 10/08/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND AND OBJECTIVE Bidimensional entropy algorithms provide meaningful quantitative information on image textures. These algorithms have the advantage of relying on well-known one-dimensional entropy measures dedicated to the analysis of time series. However, uni- and bidimensional algorithms require the adjustment of some parameters that influence the results obtained and even the findings drawn from them. To address this, ensemble entropy techniques have recently emerged as a solution for signal analysis, offering greater stability and reduced bias in data patterns during entropy estimation. However, such algorithms have not yet been extended to their two-dimensional forms. METHODS We therefore propose six bidimensional algorithms, namely ensemble sample entropy, ensemble permutation entropy, ensemble dispersion entropy, ensemble distribution entropy, and two versions of ensemble fuzzy entropy, based on different models or parameter initializations of an entropy algorithm. These new measures are first tested on synthetic images and then applied to a biomedical dataset. RESULTS The results suggest that ensemble techniques are able to detect different levels of image dynamics and their degrees of randomness. These methods lead to more stable entropy values (lower coefficients of variation) for the synthetic data. The results also show that these new measures can obtain up to 92.7% accuracy and 88.4% sensitivity when classifying patients with pulmonary emphysema through a k-nearest neighbors algorithm. CONCLUSIONS This is a further step towards the potential clinical deployment of bidimensional ensemble approaches to detect different levels of image dynamics and their successful performance on emphysema lung computerized tomography scans. These bidimensional ensemble entropy algorithms have the potential to be used in various imaging applications thanks to their ability to distinguish more stable and less biased image patterns compared to their original counterparts.
Affiliation(s)
- Andreia S Gaudêncio
- LIBPhys, Department of Physics, University of Coimbra, Coimbra, P-3004 516, Portugal; Univ Angers, LARIS, SFR MATHSTIC, F-49000 Angers, France.
- Hamed Azami
- Centre for Addiction and Mental Health, Toronto Dementia Research Alliance, Univ Toronto, Toronto, ON, Canada
- João M Cardoso
- LIBPhys, Department of Physics, University of Coimbra, Coimbra, P-3004 516, Portugal
| | - Pedro G Vaz
- LIBPhys, Department of Physics, University of Coimbra, Coimbra, P-3004 516, Portugal
| | | |
Collapse
|
11
|
Chatterjee A, Pahari N, Prinz A, Riegler M. AI and semantic ontology for personalized activity eCoaching in healthy lifestyle recommendations: a meta-heuristic approach. BMC Med Inform Decis Mak 2023; 23:278. [PMID: 38041041 PMCID: PMC10693173 DOI: 10.1186/s12911-023-02364-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 11/03/2023] [Indexed: 12/03/2023] Open
Abstract
BACKGROUND Automated coaches (eCoach) can help people lead a healthy lifestyle (e.g., reduction of sedentary bouts) with continuous health status monitoring and personalized recommendation generation with artificial intelligence (AI). Semantic ontology can play a crucial role in knowledge representation, data integration, and information retrieval. METHODS This study proposes a semantic ontology model to annotate the AI predictions, forecasting outcomes, and personal preferences to conceptualize a personalized recommendation generation model with a hybrid approach. This study considers a mixed activity projection method that takes individual activity insights from the univariate time-series prediction and ensemble multi-class classification approaches. We have introduced a way to improve the prediction result with a residual error minimization (REM) technique and make it meaningful in recommendation presentation with a Naïve-based interval prediction approach. We have integrated the activity prediction results in an ontology for semantic interpretation. A SPARQL query protocol and RDF Query Language (SPARQL) have generated personalized recommendations in an understandable format. Moreover, we have evaluated the performance of the time-series prediction and classification models against standard metrics on both imbalanced and balanced public PMData and private MOX2-5 activity datasets. We have used Adaptive Synthetic (ADASYN) to generate synthetic data from the minority classes to avoid bias. The activity datasets were collected from healthy adults (n = 16 for public datasets; n = 15 for private datasets). The standard ensemble algorithms have been used to investigate the possibility of classifying daily physical activity levels into the following activity classes: sedentary (0), low active (1), active (2), highly active (3), and rigorous active (4). 
The daily step count, low physical activity (LPA), medium physical activity (MPA), and vigorous physical activity (VPA) serve as input for the classification models. Subsequently, we re-verify the classifiers on the private MOX2-5 dataset. The performance of the ontology has been assessed with reasoning and SPARQL query execution time. Additionally, we have verified our ontology for effective recommendation generation. RESULTS We have tested several standard AI algorithms and selected the best-performing model with optimized configuration for our use case by empirical testing. We have found that the autoregression model with the REM method outperforms the autoregression model without the REM method for both datasets. Gradient Boost (GB) classifier outperforms other classifiers with a mean accuracy score of 98.00%, and 99.00% for imbalanced PMData and MOX2-5 datasets, respectively, and 98.30%, and 99.80% for balanced PMData and MOX2-5 datasets, respectively. Hermit reasoner performs better than other ontology reasoners under defined settings. Our proposed algorithm shows a direction to combine the AI prediction forecasting results in an ontology to generate personalized activity recommendations in eCoaching. CONCLUSION The proposed method combining step-prediction, activity-level classification techniques, and personal preference information with semantic rules is an asset for generating personalized recommendations.
Affiliation(s)
- Ayan Chatterjee
  - Department of Information and Communication Technology, Centre for E-Health, University of Agder, Grimstad, Norway
  - Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering (SimulaMet), Oslo, Norway
- Nibedita Pahari
  - Department of Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology, Kolkata, India
- Andreas Prinz
  - Department of Information and Communication Technology, Centre for E-Health, University of Agder, Grimstad, Norway
- Michael Riegler
  - Department of Holistic Systems, Simula Metropolitan Center for Digital Engineering (SimulaMet), Oslo, Norway

12
Choubin B, Shirani K, Hosseini FS, Taheri J, Rahmati O. Scrutinization of land subsidence rate using a supportive predictive model: Incorporating radar interferometry and ensemble soft-computing. J Environ Manage 2023; 345:118685. [PMID: 37517093 DOI: 10.1016/j.jenvman.2023.118685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 07/03/2023] [Accepted: 07/25/2023] [Indexed: 08/01/2023]
Abstract
Land subsidence is a major challenge that land and water resource managers still face. Radar datasets have changed how subsidence can be monitored, providing information about it at low cost. However, the most important drivers must be identified for the modeling process. Machine learning methods have become prominent in prediction studies of natural hazards over the last few years. Nevertheless, an efficient approach such as an integrated radar-and-ensemble-based method for land subsidence rate simulation is not yet available; developing one is the main aim of this research. In this study, 52 pairs of radar images were used to identify subsidence from 2014 to 2019. Then, using the simulated annealing (SA) algorithm, the key variables affecting land subsidence were identified among the topographical parameters, aquifer information, land use, hydroclimatic variables, and geological and soil factors. Afterward, three individual machine learning models (Support Vector Machine, SVM; Gaussian Process, GP; Bayesian Additive Regression Tree, BART) along with three ensemble learning approaches were considered for land subsidence rate modeling. The results indicated that the subsidence varies between 0 and 59 cm in this period. Comparing the radar results with the permanent geodynamic station showed a very strong correlation between the ground station and the radar images (R2 = 0.99, RMSE = 0.008). Screening the input data with the SA indicated that the key drivers are precipitation, elevation, percentage of fine-grained materials in the saturated zone, groundwater withdrawal, distance to road, groundwater decline, and aquifer thickness. The performance comparison indicated that ensemble models perform better than individual models, and among ensemble models, the nonlinear ensemble approach (i.e., BART model combination) provided better performance (RMSE = 0.061, RSR = 0.42, R2 = 0.83, PBIAS = 2.2). 
Also, the distribution shape of the probability density function in the non-linear ensemble model is much closer to the observations. Results indicated that the presence of significant fine-grained materials in unconsolidated aquifer systems can clarify the response of the aquifer system to groundwater decline, low recharge, and subsequent land subsidence. Therefore, the interaction between these factors can be very dangerous and intensify subsidence.
Affiliation(s)
- Bahram Choubin
  - Soil Conservation and Watershed Management Research Department, West Azarbaijan Agricultural and Natural Resources Research and Education Center, AREEO, Urmia, Iran
- Kourosh Shirani
  - Soil Conservation and Watershed Management Research Institute, Agricultural Research, Education and Extension Organization (AREEO), Tehran, Iran
- Farzaneh Sajedi Hosseini
  - Reclamation of Arid and Mountainous Regions Department, Faculty of Natural Resources, University of Tehran, Karaj, Iran; University of Public Service, Budapest, Hungary
- Javad Taheri
  - Soil Conservation and Watershed Management Research Department, West Azarbaijan Agricultural and Natural Resources Research and Education Center, AREEO, Urmia, Iran
- Omid Rahmati
  - Soil Conservation and Watershed Management Research Department, Kurdistan Agricultural and Natural Resources Research and Education Center, AREEO, Sanandaj, Iran

13
Li H, Wu P, Dai J, Zou X. A Monte Carlo resampling based multiple feature-spaces ensemble (MFE) strategy for consistency-enhanced spectral variable selection. Anal Chim Acta 2023; 1279:341782. [PMID: 37827679 DOI: 10.1016/j.aca.2023.341782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 09/03/2023] [Accepted: 09/04/2023] [Indexed: 10/14/2023]
Abstract
BACKGROUND Variable selection has gained significant attention as a means to enhance spectroscopic calibration performance. However, existing methods still have certain limitations. Firstly, the selection results are sensitive to the choice of training samples, indicating that the selected variables may not be truly relevant. Secondly, the number of selected variables is still too large in some situations, and modelling with too many predictors may lead to over-fitting. To address these challenges, we propose and implement a novel multiple feature-spaces ensemble (MFE) strategy with the least absolute shrinkage and selection operator (LASSO) method. RESULTS The MFE strategy combines the advantages of LASSO regression and the ensemble strategy, thereby facilitating a more robust identification of key variables. We demonstrated the efficacy of our approach through extensive experimentation on publicly available datasets. The results demonstrate not only enhanced consistency in variable selection but also improved prediction performance compared to benchmark methods. SIGNIFICANCE The MFE strategy provides a comprehensive framework for conducting variable importance analysis, leading to robust and consistent variable selection. Furthermore, the improved consistency in variable selection contributes to enhanced prediction performance for spectroscopic calibration, making it more robust and accurate.
Affiliation(s)
- Haoran Li
  - School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
- Pengcheng Wu
  - School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China
- Jisheng Dai
  - School of Electrical and Information Engineering, Jiangsu University, Zhenjiang, 212013, China; College of Information Science and Technology, Donghua University, Shanghai, 201620, China
- Xiaobo Zou
  - School of Food and Biological Engineering, Jiangsu University, Zhenjiang, 212013, China

14
C Pereira S, Rocha J, Campilho A, Sousa P, Mendonça AM. Lightweight multi-scale classification of chest radiographs via size-specific batch normalization. Comput Methods Programs Biomed 2023; 236:107558. [PMID: 37087944 DOI: 10.1016/j.cmpb.2023.107558] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/15/2022] [Revised: 04/17/2023] [Accepted: 04/17/2023] [Indexed: 05/03/2023]
Abstract
BACKGROUND AND OBJECTIVE Convolutional neural networks are widely used to detect radiological findings in chest radiographs. Standard architectures are optimized for images of relatively small size (for example, 224 × 224 pixels), which suffices for most application domains. However, in medical imaging, larger inputs are often necessary to analyze disease patterns. A single scan can display multiple types of radiological findings varying greatly in size, and most models do not explicitly account for this. For a given network, whose layers have fixed-size receptive fields, smaller input images result in coarser features, which better characterize larger objects in an image. In contrast, larger inputs result in finer grained features, beneficial for the analysis of smaller objects. By compromising to a single resolution, existing frameworks fail to acknowledge that the ideal input size will not necessarily be the same for classifying every pathology of a scan. The goal of our work is to address this shortcoming by proposing a lightweight framework for multi-scale classification of chest radiographs, where finer and coarser features are combined in a parameter-efficient fashion. METHODS We experiment on CheXpert, a large chest X-ray database. A lightweight multi-resolution (224 × 224, 448 × 448 and 896 × 896 pixels) network is developed based on a Densenet-121 model where batch normalization layers are replaced with the proposed size-specific batch normalization. Each input size undergoes batch normalization with dedicated scale and shift parameters, while the remaining parameters are shared across sizes. Additional external validation of the proposed approach is performed on the VinDr-CXR data set. RESULTS The proposed approach (AUC 83.27±0.17, 7.1M parameters) outperforms standard single-scale models (AUC 81.76±0.18, 82.62±0.11 and 82.39±0.13 for input sizes 224 × 224, 448 × 448 and 896 × 896, respectively, 6.9M parameters). 
It also achieves a performance similar to an ensemble of one individual model per scale (AUC 83.27±0.11, 20.9M parameters), while relying on significantly fewer parameters. The model leverages features of different granularities, resulting in a more accurate classification of all findings, regardless of their size, highlighting the advantages of this approach. CONCLUSIONS Different chest X-ray findings are better classified at different scales. Our study shows that multi-scale features can be obtained with nearly no additional parameters, boosting performance.
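The abstract states that each input size gets its own batch-normalization scale and shift while all other parameters are shared. A minimal sketch of that idea follows, in plain Python on a one-dimensional batch of activations. The class name `SizeSpecificBatchNorm`, the parameter values, and the toy inputs are all assumptions made for illustration; this is not the authors' DenseNet-121 implementation.

```python
import math


class SizeSpecificBatchNorm:
    """Toy batch normalization with one (scale, shift) pair per input size.

    Hypothetical illustration only: real networks normalize per channel and
    keep running statistics, which this sketch omits.
    """

    def __init__(self, sizes, eps=1e-5):
        self.eps = eps
        # Dedicated affine parameters for each input size (identity at start).
        self.gamma = {size: 1.0 for size in sizes}
        self.beta = {size: 0.0 for size in sizes}

    def __call__(self, activations, size):
        n = len(activations)
        mean = sum(activations) / n
        var = sum((a - mean) ** 2 for a in activations) / n
        scale, shift = self.gamma[size], self.beta[size]
        # The normalization itself is shared; only scale and shift depend on size.
        return [scale * (a - mean) / math.sqrt(var + self.eps) + shift
                for a in activations]


norm = SizeSpecificBatchNorm(sizes=(224, 448, 896))
norm.gamma[448], norm.beta[448] = 2.0, 0.5  # made-up "learned" values
out_224 = norm([1.0, 2.0, 3.0], size=224)
out_448 = norm([1.0, 2.0, 3.0], size=448)
```

Selecting the affine pair by input size adds only two parameters per normalized feature and per size, which is consistent with the small parameter overhead reported above (7.1M versus 6.9M).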
Affiliation(s)
- Sofia C Pereira
  - Faculty of Engineering of the University of Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal
- Joana Rocha
  - Faculty of Engineering of the University of Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal
- Aurélio Campilho
  - Faculty of Engineering of the University of Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal
- Pedro Sousa
  - Hospital Center of Vila Nova de Gaia / Espinho, Portugal
- Ana Maria Mendonça
  - Faculty of Engineering of the University of Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science (INESC-TEC), Portugal

15
Zhao LM, Zhang H, Kim D, Ghimire K, Hu R, Kargilis DC, Tang L, Meng S, Chen Q, Liao WH, Bai H, Jiao Z, Feng X. Head and neck tumor segmentation convolutional neural network robust to missing PET/CT modalities using channel dropout. Phys Med Biol 2023; 68. [PMID: 37019119 PMCID: PMC10126383 DOI: 10.1088/1361-6560/accac9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 04/05/2023] [Indexed: 04/07/2023]
Abstract
Objective
Radiation therapy for Head and Neck (H&N) cancer relies on accurate segmentation of the primary tumor. A robust, accurate, and automated gross tumor volume segmentation method is warranted for H&N cancer therapeutic management. The purpose of this study is to develop a novel deep learning segmentation model for H&N cancer based on independent and combined CT and FDG-PET modalities.
Approach
In this study, we developed a robust deep learning-based model leveraging information from both CT and PET. We implemented a 3D U-Net architecture with 5 levels of encoding and decoding, computing model loss through deep supervision. We used a channel dropout technique to emulate different combinations of input modalities. This technique prevents potential performance issues when only one modality is available, increasing model robustness. We implemented ensemble modeling by combining two types of convolutions with differing receptive fields, conventional and dilated, to improve capture of both fine details and global information.
Main Results
Our proposed methods yielded promising results, with a Dice Similarity Coefficient (DSC) of 0.802 when deployed on combined CT and PET, DSC of 0.610 when deployed on CT, and DSC of 0.750 when deployed on PET.
Significance
Application of a channel dropout method allowed for a single model to achieve high performance when deployed on either single modality images (CT or PET) or combined modality images (CT and PET). Furthermore, ensemble modeling showed comparable or improved performance by combining advantages of conventional and dilated convolution, while decreasing associated generalization errors. The presented segmentation techniques are clinically relevant to applications where images from a certain modality might not always be available.
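The channel dropout idea described in this abstract, emulating missing modalities during training, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `channel_dropout`, the drop probability, and the rule of blanking at most one modality per sample are hypothetical choices, since the abstract does not specify the authors' exact scheme.

```python
import random


def channel_dropout(ct, pet, p_drop=0.5, rng=random):
    """Randomly blank one input modality so the model also sees
    single-modality inputs during training. Never blanks both.

    Assumption: with probability p_drop one modality is zeroed, and CT or
    PET is picked with equal chance.
    """
    if rng.random() < p_drop:
        if rng.random() < 0.5:
            ct = [0.0] * len(ct)    # emulate a PET-only case
        else:
            pet = [0.0] * len(pet)  # emulate a CT-only case
    return ct, pet


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [channel_dropout([1.0, 2.0], [3.0, 4.0], p_drop=0.5, rng=rng)
           for _ in range(1000)]
both_kept = sum(1 for ct, pet in samples if any(ct) and any(pet))
```

In a real pipeline the same operation would be applied to the CT and PET channels of a 3D volume before it enters the network; flat lists are used here only to keep the example dependency-free.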
Affiliation(s)
- Lin-Mei Zhao
  - Department of Radiology, Xiangya Hospital Central South University, 87 Xiangya Rd, Changsha, Hunan, 410008, CHINA
- Helen Zhang
  - Department of Radiology, Brown University, 222 Richmond St, Providence, Rhode Island, 02903, UNITED STATES
- Daniel Kim
  - Department of Radiology, Brown University, 222 Richmond St, Providence, Rhode Island, 02903, UNITED STATES
- Kanchan Ghimire
  - Carina Medical, N/A, Lexington, Kentucky, 40513, UNITED STATES
- Rong Hu
  - Department of Radiology, Xiangya Hospital Central South University, 87 Xiangya Rd, Changsha, Hunan, 410008, CHINA
- Daniel C Kargilis
  - Department of Radiology and Radiological Science, Johns Hopkins Medicine, 1800 Orleans St., Baltimore, Maryland, 21287, UNITED STATES
- Lei Tang
  - Department of Neurology, Xiangya Hospital Central South University, 87 Xiangya Rd, Changsha, Hunan, 410008, CHINA
- Shujuan Meng
  - Department of Neurology, Xiangya Hospital Central South University, 87 Xiangya Rd, Changsha, Hunan, 410008, CHINA
- Quan Chen
  - Carina Medical, N/A, Lexington, Kentucky, 40513, UNITED STATES
- Wei-Hua Liao
  - Department of Radiology, Xiangya Hospital Central South University, 87 Xiangya Rd, Changsha, Hunan, 410008, CHINA
- Harrison Bai
  - Department of Radiology and Radiological Science, Johns Hopkins Medicine, 1800 Orleans St., Baltimore, Maryland, 21287, UNITED STATES
- Zhicheng Jiao
  - Department of Radiology, Brown University, 222 Richmond St, Providence, Rhode Island, 02903, UNITED STATES
- Xue Feng
  - Carina Medical, N/A, Lexington, Kentucky, 40513, UNITED STATES

16
Hwang J, Kim J, Jeon B, Ko K, Ko E, Cho G. Estimation of ambient dose equivalent rate with a plastic scintillation detector using the least-square and first-order methods-based G(E) function. Appl Radiat Isot 2023; 194:110707. [PMID: 36787679 DOI: 10.1016/j.apradiso.2023.110707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 12/19/2022] [Accepted: 01/26/2023] [Indexed: 02/10/2023]
Abstract
Dose-rate monitoring instruments measure the ambient dose equivalent and hence are crucial for protecting workers from radiation exposure. Although plastic scintillation detectors (PSDs) are ideal equipment for dosimetry, they are rarely used owing to their lower detection efficiency compared with other scintillation detectors. In this study, we acquired ten types of G(E) functions, using the least-square and first-order methods, to utilize a PSD in spectroscopic dosimetry. The energy response of the PSD was much improved in terms of dose evaluation.
Affiliation(s)
- Jisung Hwang
  - Dept. of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Junhyeok Kim
  - Dept. of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Byoungil Jeon
  - Artificial Intelligence Application & Strategy Team, Korea Atomic Energy Research Institute, Yuseong-gu, Daejeon, 34507, Republic of Korea
- Kilyoung Ko
  - Dept. of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Eunbie Ko
  - Dept. of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Gyuseong Cho
  - Dept. of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea

17
Doğru A, Buyrukoğlu S, Arı M. A hybrid super ensemble learning model for the early-stage prediction of diabetes risk. Med Biol Eng Comput 2023; 61:785-97. [PMID: 36602674 DOI: 10.1007/s11517-022-02749-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/31/2022] [Accepted: 12/22/2022] [Indexed: 01/06/2023]
Abstract
Diabetes mellitus has become a rapidly growing chronic health problem worldwide. There has been a noticeable increase in diabetes cases in the last two decades. Recent advances in ensemble machine learning methods play an important role in the early detection of diabetes mellitus. These methods are both faster and less costly than traditional methods. This study aims to propose a new super ensemble learning model to enable an early diagnosis of diabetes mellitus. Super learner is a cross-validation-based approach that makes better predictions by combining the prediction results of more than one machine learning algorithm. The proposed super learner model was created with four base-learners (logistic regression, decision tree, random forest, gradient boosting) and a meta learner (support vector machines) as a result of a case study. Three different datasets were used to measure the robustness of the proposed model. Chi-square was identified as the best of five feature selection techniques, and hyper-parameters were tuned with GridSearch. Finally, the proposed super learner model achieved the best accuracy in the detection of diabetes mellitus compared with the base-learners on the early-stage diabetes risk prediction (99.6%), PIMA (92%), and diabetes 130-US hospitals (98%) datasets. This study revealed that super learner algorithms can be effectively used in the detection of diabetes mellitus. The high statistical scores obtained also indicate the robustness of the proposed super learner model.

18
Moorman DE, Aston-Jones G. Prelimbic and infralimbic medial prefrontal cortex neuron activity signals cocaine seeking variables across multiple timescales. Psychopharmacology (Berl) 2023; 240:575-594. [PMID: 36464693 PMCID: PMC10406502 DOI: 10.1007/s00213-022-06287-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/20/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022]
Abstract
RATIONALE AND OBJECTIVES The prefrontal cortex is critical for execution and inhibition of reward seeking. Neural manipulation of rodent medial prefrontal cortex (mPFC) subregions differentially impacts execution and inhibition of cocaine seeking. Dorsal, or prelimbic (PL), and ventral, or infralimbic (IL) mPFC are implicated in cocaine seeking or extinction of cocaine seeking, respectively. This differentiation is not seen across all studies, indicating that further research is needed to understand specific mPFC contributions to drug seeking. METHODS We recorded neuronal activity in mPFC subregions during cocaine self-administration, extinction, and cue- and cocaine-induced reinstatement of cocaine seeking. RESULTS Both PL and IL neurons were phasically responsive around lever presses during cocaine self-administration, and activity in both areas was reduced during extinction. During both cue- and, to a greater extent, cocaine-induced reinstatement, PL neurons exhibited significantly elevated responses, in line with previous studies demonstrating a role for the region in relapse. The enhanced PL signaling in cocaine-induced reinstatement was driven by strong excitation and inhibition in different groups of neurons. Both of these response types were stronger in PL vs. IL neurons. Finally, we observed tonic changes in activity in all task phases, reflecting both session-long contextual modulation as well as minute-to-minute activity changes that were highly correlated with brain cocaine levels and motivation associated with cocaine seeking. CONCLUSIONS Although some differences were observed between PL and IL neuron activity across sessions, we found no evidence of a go/stop dichotomy in PL/IL function. Instead, our results demonstrate temporally heterogeneous prefrontal signaling during cocaine seeking and extinction in both PL and IL, revealing novel and complex functions for both regions during these behaviors. 
This combination of findings argues that mPFC neurons, in both PL and IL, provide multifaceted contributions to the regulation of drug seeking and addiction.
Affiliation(s)
- David E Moorman
  - Department of Psychological and Brain Sciences & Neuroscience and Behavior Graduate Program, University of Massachusetts Amherst, Amherst, MA, 01003, USA
- Gary Aston-Jones
  - Brain Health Institute, Rutgers University and Rutgers Biomedical and Health Sciences, Piscataway, NJ, 08854, USA

19
Wubneh MA, Worku TA, Chekol BZ. Climate change impact on water resources availability in the kiltie watershed, Lake Tana sub-basin, Ethiopia. Heliyon 2023; 9:e13941. [PMID: 36895343 PMCID: PMC9988557 DOI: 10.1016/j.heliyon.2023.e13941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Revised: 01/25/2023] [Accepted: 02/15/2023] [Indexed: 02/24/2023] Open
Abstract
Climate change's influence on water resource availability in watersheds must be evaluated to ensure food and water security. Using an ensemble of two global climate models (MIROC and MPI) and one regional climate model (RCA4), the impact of climate change on the availability of water in the Kiltie watershed was evaluated under the RCP4.5 and RCP8.5 scenarios for the 2040s and 2070s. The flow was simulated using the HBV hydrological model, which needs fewer data and is typically employed in data-scarce settings. Model calibration and validation gave a relative volume error (RVE) of -1.27% and 6.93% and an NSE of 0.63 and 0.64, respectively. Under the RCP4.5 scenario, future seasonal water supply in the 2040s increases by between 1.1 mm and 33.2 mm, with the maximum increase in August, and decreases by between 0.23 mm and 6.89 mm, with the maximum decrease in September. In the 2070s, water availability increases by between 7.2 mm and 56.9 mm, with the largest increases occurring in October, and the smallest reductions, of 9 mm, occurring in July. Under the RCP8.5 scenario, future water availability during the 2040s increases by between 4.1 mm and 38.8 mm, with the highest increase occurring in August, and falls by between 9.8 mm and 31.2 mm, with the maximum declines occurring in the spring season. In the 2070s under the RCP8.5 scenario, water availability increases by between 2.7 mm and 42.4 mm, with the highest increments in August, and decreases by between 1.8 mm and 80.3 mm, with maximum decreases in June. According to this study, climate change would make water more readily available during the rainy season, necessitating the construction of water storage facilities so that surplus water can be used for dry farming. A watershed-level integrated water resource management strategy should be created quickly, as future water supply will decline during the dry seasons.
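The two calibration metrics quoted in this abstract can be computed as in the sketch below. It assumes NSE denotes the Nash-Sutcliffe efficiency and uses one common sign convention for RVE (simulated minus observed volume, as a percentage of observed volume); the abstract states neither definition. The function names and flow values are made up for illustration.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared model error
    to the variance of the observations around their mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def relative_volume_error(observed, simulated):
    """Relative volume error in percent (assumed convention: positive
    values mean the model overestimates total volume)."""
    return 100.0 * (sum(simulated) - sum(observed)) / sum(observed)


observed = [10.0, 20.0, 30.0, 40.0]   # hypothetical flows, not study data
simulated = [12.0, 18.0, 33.0, 41.0]
nse = nash_sutcliffe(observed, simulated)        # 1 - 18/500 = 0.964
rve = relative_volume_error(observed, simulated)  # 100 * 4/100 = 4.0
```

An NSE of 1 indicates a perfect match, while an RVE near 0 indicates that total simulated and observed volumes agree, so the two metrics check complementary aspects of the fit.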
Affiliation(s)
- Melsew A. Wubneh
  - Department of Hydraulic and Water Resources Engineering, University of Gondar, Gondar, Ethiopia
  - Corresponding author
- Tadege A. Worku
  - Department of Hydraulic and Water Resources Engineering, Debre Tabor University, Debre Tabor, Ethiopia
- Bantalem Z. Chekol
  - Department of Hydraulic and Water Resources Engineering, University of Gondar, Gondar, Ethiopia

20
Bania RK. Ensemble of deep transfer learning models for real-time automatic detection of face mask. Multimed Tools Appl 2023; 82:1-23. [PMID: 36743998 PMCID: PMC9890421 DOI: 10.1007/s11042-023-14408-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 11/12/2022] [Accepted: 01/21/2023] [Indexed: 06/18/2023]
Abstract
The COVID-19 pandemic is causing a global health crisis. Public spaces need to be safeguarded from the adverse effects of this pandemic. Wearing a facemask has become a protective measure that many governments have adopted. Manual real-time monitoring of face mask wearing for many people is becoming a difficult task. This paper applies three heterogeneous deep transfer learning models, viz., ResNet50, Inception-v3, and VGG-16, to prepare an ensemble classification model for detecting whether a person is wearing a mask. The ensemble classification model is based on the weighted average technique. The proposed framework has two phases. The off-line phase prepares a classification model by following training-testing steps to detect and locate facemasks. In the second, online phase, the model is deployed to detect faces in real time from live videos captured by a web camera. The prepared model is compared with several state-of-the-art models. The proposed model achieved the highest classification accuracy of 99.97%, precision of 0.997, recall of 0.997, F1-score of 0.997 and kappa coefficient of 0.994. The superiority of the model over the compared state-of-the-art methods is evident from the experimental results.
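A weighted-average ensemble of the kind this abstract names can be sketched as follows. The three probability vectors stand in for the outputs of ResNet50, Inception-v3, and VGG-16, and both the probabilities and the weights are made-up values; the abstract does not report how the authors chose their weights.

```python
def weighted_average(probabilities, weights):
    """Combine per-model class probabilities into one vector using a
    weighted mean, normalized by the sum of the weights."""
    total = sum(weights)
    n_classes = len(probabilities[0])
    return [sum(w * p[c] for w, p in zip(weights, probabilities)) / total
            for c in range(n_classes)]


# Hypothetical [P(mask), P(no mask)] outputs from three models.
model_probs = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
]
weights = [0.5, 0.3, 0.2]  # assumed weights, e.g. from validation accuracy
combined = weighted_average(model_probs, weights)  # approx. [0.69, 0.31]
label = max(range(len(combined)), key=combined.__getitem__)  # class 0
```

Because the weights are normalized inside the function, they need not sum to one; a model with a larger weight simply pulls the combined probabilities further toward its own prediction.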
Affiliation(s)
- Rubul Kumar Bania
  - Department of Computer Application, North-Eastern Hill University, Tura Campus, Tura, Meghalaya 794002 India

21
Zheng HL, An SY, Qiao BJ, Guan P, Huang DS, Wu W. A data-driven interpretable ensemble framework based on tree models for forecasting the occurrence of COVID-19 in the USA. Environ Sci Pollut Res Int 2023; 30:13648-13659. [PMID: 36131178 PMCID: PMC9492466 DOI: 10.1007/s11356-022-23132-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Accepted: 09/16/2022] [Indexed: 06/15/2023]
Abstract
The prevalence of coronavirus disease 2019 (COVID-19) has become one of the most serious public health crises. Tree-based machine learning methods, with the advantages of high efficiency and strong interpretability, have been widely used in predicting diseases. A data-driven interpretable ensemble framework based on tree models was designed to forecast daily new cases of COVID-19 in the USA and to determine the important factors related to COVID-19. Using a hyperparameter optimization technique, we developed three machine learning models based on decision trees, namely random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), and used three linear ensemble models to integrate their outputs for better prediction accuracy. Finally, SHapley Additive exPlanations (SHAP) values were used to obtain the feature importance ranking. Our results demonstrated that, among the three base learners, the order of prediction accuracy from highest to lowest was LightGBM, XGBoost, and RF. The optimized LAD ensemble was the most accurate prediction model, reducing the prediction error of the best base learner (LightGBM) by approximately 3.111%. Vaccination, wearing masks, less mobility, and government interventions had positive effects on the control and prevention of COVID-19.
Affiliation(s)
- Hu-Li Zheng
- Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province, China
- Shu-Yi An
- Liaoning Provincial Center for Disease Control and Prevention, Shenyang, Liaoning, China
- Bao-Jun Qiao
- Liaoning Provincial Center for Disease Control and Prevention, Shenyang, Liaoning, China
- Peng Guan
- Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province, China
- De-Sheng Huang
- Department of Mathematics, School of Intelligent Medicine, China Medical University, Shenyang, Liaoning, China
- Wei Wu
- Department of Epidemiology, School of Public Health, China Medical University, No. 77 Puhe Road, Shenyang, Liaoning Province, China
|
22
|
Awasthi A, Goel N. Phishing website prediction using base and ensemble classifier techniques with cross-validation. Cybersecur (Singap) 2022; 5:22. [PMID: 36337366 PMCID: PMC9628466 DOI: 10.1186/s42400-022-00126-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2020] [Accepted: 05/04/2022] [Indexed: 06/16/2023]
Abstract
The internet has become a vulnerable place: novice or careless users face many threats, because malicious actors use a wide range of tools and techniques to victimize people and gain access to their valuable personal data. The resulting losses are sometimes small, but in many instances victims suffer considerable losses after falling into traps such as hacking, cracking, data diddling, Trojan attacks, web jacking, salami attacks, and phishing. Web users and software and application developers make continuous efforts to keep IT infrastructure safe and secure using many techniques, including encryption, digital signatures, and digital certificates, yet these threats persist. This paper focuses on the problem of phishing. To detect and predict phishing website URLs, primary machine learning classifiers and new ensemble-based techniques are applied to 2 distinct datasets and then to a merged dataset. The study is conducted in 3 phases: classification using base classifiers, classification using ensemble classifiers, and testing of the ensemble classifiers with and without cross-validation. Finally, their performance is analyzed and the results are presented to support future research.
Affiliation(s)
- Anjaneya Awasthi
- Department of Computer Applications, VBS Purvanchal University, Jaunpur, UP, India
- Noopur Goel
- Department of Computer Applications, VBS Purvanchal University, Jaunpur, UP, India
|
23
|
Chaney CP, Drake KA, Carroll TJ. Integration of Multiple, Diverse Methods to Identify Biologically Significant Marker Genes. J Mol Biol 2022; 434:167754. [PMID: 35868363 PMCID: PMC10210129 DOI: 10.1016/j.jmb.2022.167754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 06/28/2022] [Accepted: 07/14/2022] [Indexed: 11/17/2022]
Abstract
Identification of genes that reliably mark distinct cell types is key to leveraging single-cell RNA sequencing to better understand organismal biology. Such genes are usually chosen by measurement of differential expression between groups of cells and selecting those with the greatest magnitude or most statistically significant change. Many methods have been developed for performing such analyses, but no single, best method has emerged. Validating the results of these analyses is costly in terms of time, effort and resources. We demonstrate that applying an ensemble of such methods robustly identifies genes that mark cells that cluster together and that show restricted expression assessed by antisense mRNA in situ and immunofluorescence. This technique is easily extensible to any number of differential expression methods and the inclusion of additional methods is expected to result in further improvement in performance.
Affiliation(s)
- Christopher P Chaney
- Department of Molecular Biology and Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Internal Medicine, Division of Nephrology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Keri A Drake
- Department of Molecular Biology and Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Division of Pediatric Nephrology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Thomas J Carroll
- Department of Molecular Biology and Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Internal Medicine, Division of Nephrology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
24
|
Othman M, Elbasha AM, Naga YS, Moussa ND. Early prediction of hemodialysis complications employing ensemble techniques. Biomed Eng Online 2022; 21:74. [PMID: 36221077 PMCID: PMC9552449 DOI: 10.1186/s12938-022-01044-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Accepted: 09/23/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND AND OBJECTIVES Hemodialysis complications remain a critical threat among dialysis patients. They result in sudden termination of the session, which impacts the efficiency of dialysis. As intra-dialytic complications are the result of the interplay of multiple factors, artificial intelligence can aid in their early prediction. This research aims to compare different machine learning tools for the early prediction of the most frequent hemodialysis complications with high performance, using the fewest predictors for easier practical implementation. METHODS Fifty different variables were recorded during 6000 hemodialysis sessions performed in a regional dialysis unit in Egypt. The filter technique was used to extract the most relevant features. Then, five individual classifiers and three ensemble approaches were implemented to predict the occurrence of intra-dialytic complications. Different subsets of 25, 12 and 6 from the 50 collected features were tested. RESULTS Random forest yielded the highest accuracy of 98% with the least training time using 12 features in a balanced dataset, while gradient boosting obtained the highest F1-scores of 94%, 92%, and 78% in the prediction of hypotension, hypertension, and dyspnea, respectively, in imbalanced datasets. CONCLUSION Applying different machine learning algorithms to big datasets can improve accuracy and reduce training time and model complexity, allowing simple implementation in clinical practice. Our models can help nephrologists predict and possibly prevent dialysis complications.
Affiliation(s)
- Mai Othman
- Biomedical Engineering Department, Medical Research Institute, Alexandria University, 165, Horreya Avenue, Hadara, Alexandria Governorate, Alexandria, Egypt
- Ahmed Mustafa Elbasha
- Internal Medicine Department, Faculty of Medicine, Alexandria University, Champollion Street, El-Khartoum Square, El Azareeta Medical Campus, Alexandria Governorate, Alexandria, Egypt
- Yasmine Salah Naga
- Internal Medicine Department, Faculty of Medicine, Alexandria University, Champollion Street, El-Khartoum Square, El Azareeta Medical Campus, Alexandria Governorate, Alexandria, Egypt
- Nancy Diaa Moussa
- Biomedical Engineering Department, Medical Research Institute, Alexandria University, 165, Horreya Avenue, Hadara, Alexandria Governorate, Alexandria, Egypt
|
25
|
Banerjee A, Sarkar A, Roy S, Singh PK, Sarkar R. COVID-19 chest X-ray detection through blending ensemble of CNN snapshots. Biomed Signal Process Control 2022; 78:104000. [PMID: 35855489 PMCID: PMC9283670 DOI: 10.1016/j.bspc.2022.104000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2022] [Revised: 06/23/2022] [Accepted: 07/11/2022] [Indexed: 12/04/2022]
Abstract
The COVID-19 pandemic has turned out to be one of the deadliest events in modern history, causing unprecedented loss of human life and major economic and financial setbacks, and setting the entire world back quite a few decades. Detection of the COVID-19 virus has become increasingly difficult due to the mutating nature of the virus and the rise in asymptomatic cases. To counteract this and contribute to research efforts toward more accurate screening of COVID-19, we propose an ensemble methodology for deep learning models that detects COVID-19 from chest X-rays (CXRs) to assist Computer-Aided Detection (CADe) for medical practitioners. We leverage the strategy of transfer learning for Convolutional Neural Networks (CNNs), widely adopted in recent literature, and further propose an efficient ensemble network for their combination. The DenseNet-201 architecture is trained only once to generate multiple snapshots, offering diverse information about the features extracted from CXRs. We follow the strategy of decision-level fusion to combine the decision scores using the blending algorithm through a Random Forest (RF) meta-learner. Experimental results on two open access COVID-19 CXR datasets, the largest COVID-X dataset and a smaller-scale dataset, confirm the efficacy of the proposed ensemble method. On the large COVID-X dataset, the proposed model achieved an accuracy of 94.55%, and on the smaller dataset by Chowdhury et al., it achieved an accuracy of 98.13%.
Affiliation(s)
- Avinandan Banerjee
- Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata 700106, West Bengal, India
- Arya Sarkar
- Department of Computer Science, University of Engineering and Management, University Area, Plot No. III - B/5, New Town, Action Area - III, Kolkata 700160, West Bengal, India
- Sayantan Roy
- Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata 700106, West Bengal, India
- Pawan Kumar Singh
- Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata 700106, West Bengal, India
- Ram Sarkar
- Department of Computer Science and Engineering, Jadavpur University, 188, Raja S.C. Mallick Road, Kolkata 700032, West Bengal, India
|
26
|
De Meutter P, Delcloo AW. Uncertainty quantification of atmospheric transport and dispersion modelling using ensembles for CTBT verification applications. J Environ Radioact 2022; 250:106918. [PMID: 35653875 DOI: 10.1016/j.jenvrad.2022.106918] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/15/2021] [Revised: 04/22/2022] [Accepted: 05/16/2022] [Indexed: 06/15/2023]
Abstract
Airborne concentrations of specific radioactive xenon isotopes (referred to as "radioxenon") are monitored globally as part of the verification regime of the Comprehensive Nuclear-Test-Ban Treaty, as these could be the signatures of a nuclear explosion. However, civilian nuclear facilities emit a regulated amount of radioxenon that can interfere with the very sensitive monitoring network. One approach to dealing with this civilian background of radioxenon for Treaty verification purposes is to explicitly simulate the expected radioxenon concentration from civilian sources at monitoring stations using atmospheric transport modelling. However, atmospheric transport modelling is prone to uncertainty, and the absence of an uncertainty quantification can limit its use for detection screening. In this paper, several ensembles that could provide an uncertainty quantification for atmospheric transport modelling are assessed. These ensembles are validated with radioxenon observations, and recommendations are given for atmospheric transport modelling uncertainty quantification. Finally, the added value of an ensemble for detection screening is illustrated.
Affiliation(s)
- Pieter De Meutter
- Belgian Nuclear Research Centre (SCK CEN), Boertang 200, 2400, Mol, Belgium; Royal Meteorological Institute of Belgium, Ringlaan 3, 1180, Brussels, Belgium
- Andy W Delcloo
- Royal Meteorological Institute of Belgium, Ringlaan 3, 1180, Brussels, Belgium; Department of Physics and Astronomy, Ghent University, Krijgslaan 281/S9, B-9000, Ghent, Belgium
|
27
|
Witjes M, Parente L, van Diemen CJ, Hengl T, Landa M, Brodský L, Halounova L, Križan J, Antonić L, Ilie CM, Craciunescu V, Kilibarda M, Antonijević O, Glušica L. A spatiotemporal ensemble machine learning framework for generating land use/land cover time-series maps for Europe (2000-2019) based on LUCAS, CORINE and GLAD Landsat. PeerJ 2022; 10:e13573. [PMID: 35891647 PMCID: PMC9308969 DOI: 10.7717/peerj.13573] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/14/2021] [Accepted: 05/22/2022] [Indexed: 01/17/2023]
Abstract
A spatiotemporal machine learning framework for automated prediction and analysis of long-term Land Use/Land Cover dynamics is presented. The framework includes: (1) harmonization and preprocessing of spatial and spatiotemporal input datasets (GLAD Landsat, NPP/VIIRS) including five million harmonized LUCAS and CORINE Land Cover-derived training samples, (2) model building based on spatial k-fold cross-validation and hyper-parameter optimization, (3) prediction of the most probable class, class probabilities and model variance of predicted probabilities per pixel, (4) LULC change analysis on time-series of produced maps. The spatiotemporal ensemble model consists of a random forest, gradient boosted tree classifier, and an artificial neural network, with a logistic regressor as meta-learner. The results show that the most important variables for mapping LULC in Europe are: seasonal aggregates of Landsat green and near-infrared bands, multiple Landsat-derived spectral indices, long-term surface water probability, and elevation. Spatial cross-validation of the model indicates consistent performance across multiple years with overall accuracy (a weighted F1-score) of 0.49, 0.63, and 0.83 when predicting 43 (level-3), 14 (level-2), and five classes (level-1). Additional experiments show that spatiotemporal models generalize better to unknown years, outperforming single-year models on known-year classification by 2.7% and unknown-year classification by 3.5%. Results of the accuracy assessment using 48,365 independent test samples show an 87% match with the validation points. Results of time-series analysis (time-series of LULC probabilities and NDVI images) suggest forest loss in large parts of Sweden, the Alps, and Scotland. Positive and negative trends in NDVI in general match the land degradation and land restoration classes, with "urbanization" showing the most negative NDVI trend.
An advantage of using spatiotemporal ML is that the fitted model can be used to predict LULC in years that were not included in its training dataset, allowing generalization to past and future periods, e.g. to predict LULC for years prior to 2000 and beyond 2020. The generated LULC time-series data stack (ODSE-LULC), including the training points, is publicly available via the ODSE Viewer. Functions used to prepare data and run modeling are available via the eumap library for Python.
Affiliation(s)
- Tomislav Hengl
- OpenGeoHub, Wageningen, The Netherlands; Envirometrix, Wageningen, The Netherlands
- Martin Landa
- Department of Geomatics, Faculty of Civil Engineering, Czech Technical University of Prague, Prague, Czech Republic
- Lukáš Brodský
- Department of Geomatics, Faculty of Civil Engineering, Czech Technical University of Prague, Prague, Czech Republic
- Lena Halounova
- Department of Geomatics, Faculty of Civil Engineering, Czech Technical University of Prague, Prague, Czech Republic
- Codrina Maria Ilie
- Terrasigna, Bucharest, Romania; Technical University of Civil Engineering Bucharest, Bucharest, Romania
- Vasile Craciunescu
- Terrasigna, Bucharest, Romania; National Meteorological Administration of Romania, Bucharest, Romania
- Milan Kilibarda
- Department of Geodesy and Geoinformatics, Faculty of Civil Engineering, University of Belgrade, Belgrade, Serbia
- Ognjen Antonijević
- Department of Geodesy and Geoinformatics, Faculty of Civil Engineering, University of Belgrade, Belgrade, Serbia
|
28
|
Ray EL, Brooks LC, Bien J, Biggerstaff M, Bosse NI, Bracher J, Cramer EY, Funk S, Gerding A, Johansson MA, Rumack A, Wang Y, Zorn M, Tibshirani RJ, Reich NG. Comparing trained and untrained probabilistic ensemble forecasts of COVID-19 cases and deaths in the United States. Int J Forecast 2022:S0169-2070(22)00096-6. [PMID: 35791416 PMCID: PMC9247236 DOI: 10.1016/j.ijforecast.2022.06.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
The U.S. COVID-19 Forecast Hub aggregates forecasts of the short-term burden of COVID-19 in the United States from many contributing teams. We study methods for building an ensemble that combines forecasts from these teams. These experiments have informed the ensemble methods used by the Hub. To be most useful to policy makers, ensemble forecasts must have stable performance in the presence of two key characteristics of the component forecasts: (1) occasional misalignment with the reported data, and (2) instability in the relative performance of component forecasters over time. Our results indicate that in the presence of these challenges, an untrained and robust approach to ensembling using an equally weighted median of all component forecasts is a good choice to support public health decision makers. In settings where some contributing forecasters have a stable record of good performance, trained ensembles that give those forecasters higher weight can also be helpful.
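An equally weighted median of component forecasts, as described in this abstract, can be sketched as follows. This is a minimal illustration that assumes each component forecast is a mapping from quantile level to predicted value; the forecast numbers and the function names are hypothetical and are not taken from the Hub.

```python
from statistics import mean, median


def _check_levels(component_forecasts):
    """Return the shared quantile levels, or fail if forecasters disagree."""
    levels = sorted(component_forecasts[0])
    for forecast in component_forecasts:
        if sorted(forecast) != levels:
            raise ValueError("all forecasts must share the same quantile levels")
    return levels


def median_ensemble(component_forecasts):
    """Untrained ensemble: per-quantile median across all forecasters."""
    levels = _check_levels(component_forecasts)
    return {q: median(f[q] for f in component_forecasts) for q in levels}


def mean_ensemble(component_forecasts):
    """Per-quantile mean, shown only for contrast with the median."""
    levels = _check_levels(component_forecasts)
    return {q: mean(f[q] for f in component_forecasts) for q in levels}


# Hypothetical weekly case forecasts; the third forecaster is badly misaligned.
forecasts = [
    {0.25: 90, 0.5: 100, 0.75: 110},
    {0.25: 95, 0.5: 105, 0.75: 120},
    {0.25: 400, 0.5: 500, 0.75: 600},
]
robust = median_ensemble(forecasts)
sensitive = mean_ensemble(forecasts)
```

With these made-up inputs the median stays close to the two aligned forecasters, while the mean is pulled toward the outlier, which is the kind of robustness the abstract attributes to the untrained approach.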
Affiliation(s)
- Evan L Ray
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
- Logan C Brooks
- Machine Learning Department, Carnegie Mellon University, United States of America
- Jacob Bien
- Department of Data Sciences and Operations, University of Southern California, United States of America
- Matthew Biggerstaff
- COVID-19 Response, U.S. Centers for Disease Control and Prevention, United States of America
- Nikos I Bosse
- London School of Hygiene & Tropical Medicine, United Kingdom
- Johannes Bracher
- Chair of Statistical Methods and Econometrics, Karlsruhe Institute of Technology, Germany
- Computational Statistics Group, Heidelberg Institute for Theoretical Studies, Germany
- Estee Y Cramer
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
- Sebastian Funk
- London School of Hygiene & Tropical Medicine, United Kingdom
- Aaron Gerding
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
- Michael A Johansson
- COVID-19 Response, U.S. Centers for Disease Control and Prevention, United States of America
- Aaron Rumack
- Machine Learning Department, Carnegie Mellon University, United States of America
- Yijin Wang
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
- Martha Zorn
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
- Ryan J Tibshirani
- Machine Learning Department, Carnegie Mellon University, United States of America
- Nicholas G Reich
- School of Public Health and Health Sciences, University of Massachusetts Amherst, United States of America
|
29
|
Sun J, Pi P, Tang C, Wang SH, Zhang YD. TSRNet: Diagnosis of COVID-19 based on self-supervised learning and hybrid ensemble model. Comput Biol Med 2022; 146:105531. [PMID: 35489140 PMCID: PMC9013277 DOI: 10.1016/j.compbiomed.2022.105531] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Revised: 03/12/2022] [Accepted: 04/13/2022] [Indexed: 12/16/2022]
Abstract
BACKGROUND As of Feb 27, 2022, coronavirus (COVID-19) has caused 434,888,591 infections and 5,958,849 deaths worldwide, dealing a severe blow to the economies and cultures of most countries around the world. As the virus has mutated, its infectious capacity has further increased. Effective diagnosis of suspected cases is an important tool to stop the spread of the pandemic. Therefore, we intended to develop a computer-aided diagnosis system for the diagnosis of suspected cases. METHODS To address the shortcomings of commonly used pre-training methods and exploit the information in unlabeled images, we proposed a new pre-training method based on transfer learning with self-supervised learning (TS). After that, a new convolutional neural network based on attention mechanism and deep residual network (RANet) was proposed to extract features. Based on this, a hybrid ensemble model (TSRNet) was proposed for classifying lung CT images of suspected patients as COVID-19 and normal. RESULTS Compared with the existing five models in terms of accuracy (DarkCOVIDNet: 98.08%; Deep-COVID: 97.58%; NAGNN: 97.86%; COVID-ResNet: 97.78%; Patch-based CNN: 88.90%), TSRNet has the highest accuracy of 99.80%. In addition, the recall, f1-score, and AUC of the model reached 99.59%, 99.78%, and 1, respectively. CONCLUSION TSRNet can effectively diagnose suspected COVID-19 cases with the help of the information in unlabeled and labeled images, thus helping physicians to adopt early treatment plans for confirmed cases.
Affiliation(s)
- Junding Sun
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, 454000, PR China (corresponding author)
- Pengpeng Pi
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, 454000, PR China
- Chaosheng Tang
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, 454000, PR China
- Shui-Hua Wang
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, 454000, PR China; School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK
- Yu-Dong Zhang
- School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, 454000, PR China; School of Computing and Mathematical Sciences, University of Leicester, Leicester, LE1 7RH, UK (corresponding author)
|
30
|
Chowdhury NK, Kabir MA, Rahman MM, Islam SMS. Machine learning for detecting COVID-19 from cough sounds: An ensemble-based MCDM method. Comput Biol Med 2022; 145:105405. [PMID: 35318171 PMCID: PMC8926945 DOI: 10.1016/j.compbiomed.2022.105405] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/20/2021] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 12/16/2022]
Abstract
This research aims to analyze the performance of state-of-the-art machine learning techniques for classifying COVID-19 from cough sounds and to identify the model(s) that consistently perform well across different cough datasets. Different performance evaluation metrics (precision, sensitivity, specificity, AUC, accuracy, etc.) make selecting the best performance model difficult. To address this issue, in this paper, we propose an ensemble-based multi-criteria decision making (MCDM) method for selecting top performance machine learning technique(s) for COVID-19 cough classification. We use four cough datasets, namely Cambridge, Coswara, Virufy, and NoCoCoDa to verify the proposed method. At first, our proposed method uses the audio features of cough samples and then applies machine learning (ML) techniques to classify them as COVID-19 or non-COVID-19. Then, we consider a multi-criteria decision-making (MCDM) method that combines ensemble technologies (i.e., soft and hard) to select the best model. In MCDM, we use the technique for order preference by similarity to ideal solution (TOPSIS) for ranking purposes, while entropy is applied to calculate evaluation criteria weights. In addition, we apply the feature reduction process through recursive feature elimination with cross-validation under different estimators. The results of our empirical evaluations show that the proposed method outperforms the state-of-the-art models. We see that when the proposed method is used for analysis using the Extra-Trees classifier, it has achieved promising results (AUC: 0.95, Precision: 1, Recall: 0.97).
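The ranking step described in this abstract, TOPSIS with entropy-derived criteria weights, can be sketched as below. This is a minimal illustration under stated assumptions: every criterion is treated as a benefit criterion (higher is better), all scores are positive, and the three candidate models and their metric values are hypothetical rather than taken from the paper.

```python
import math


def entropy_weights(matrix):
    """Entropy weighting: criteria whose values vary more get more weight.

    `matrix` has one row per alternative (at least two) and one column per
    criterion, with positive entries.
    """
    n_alt, n_crit = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n_alt)
    divergences = []
    for j in range(n_crit):
        col_sum = sum(row[j] for row in matrix)
        entropy = 0.0
        for row in matrix:
            p = row[j] / col_sum
            if p > 0:
                entropy -= k * p * math.log(p)
        # Clamp tiny negative values caused by floating-point rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total = sum(divergences)
    if total == 0:
        return [1.0 / n_crit] * n_crit  # no criterion discriminates
    return [d / total for d in divergences]


def topsis(matrix, weights):
    """Return one closeness score per alternative (1 = ideal, 0 = worst)."""
    n_crit = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    best = [max(row[j] for row in weighted) for j in range(n_crit)]
    worst = [min(row[j] for row in weighted) for j in range(n_crit)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        denom = d_best + d_worst
        scores.append(d_worst / denom if denom > 0 else 0.5)
    return scores


# Hypothetical (AUC, precision, recall) for three candidate classifiers.
metrics = [
    [0.95, 1.00, 0.97],
    [0.90, 0.92, 0.91],
    [0.80, 0.85, 0.70],
]
weights = entropy_weights(metrics)
scores = topsis(metrics, weights)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

A cost criterion such as training time would need the ideal and worst points swapped for that column; the sketch omits this because the metrics listed in the abstract are all of the higher-is-better kind.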
Affiliation(s)
- Nihad Karim Chowdhury
- Department of Computer Science and Engineering, University of Chittagong, Bangladesh (corresponding author)
- Muhammad Ashad Kabir
- Data Science Research Unit, School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia
- Md. Muhtadir Rahman
- Department of Computer Science and Engineering, University of Chittagong, Bangladesh
|
31
|
Rastogi S, Bansal D. Disinformation detection on social media: An integrated approach. Multimed Tools Appl 2022; 81:40675-40707. [PMID: 35582207 PMCID: PMC9098146 DOI: 10.1007/s11042-022-13129-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2020] [Revised: 03/10/2021] [Accepted: 04/10/2022] [Indexed: 06/15/2023]
Abstract
The emergence of social media platforms has amplified the dissemination of false information in various forms. Social media gives rise to virtual societies by providing freedom of expression to users in a democracy. Due to the presence of echo chambers on social media, social science studies play a vital role in the spread of false news. To this aim, we provide a comprehensive framework that is adapted from several scholarly studies. The framework is capable of classifying information into various types, namely real, disinformation and satire, based on authenticity as well as intention. The process highlights the use of interdisciplinary approaches derived from fundamental theories of social science, integrating them with modern computational tools and techniques. A few of these theories claim that malicious users write fabricated content in a different style to attract the audience. Style-based methods evaluate intention, i.e., whether or not the content is written with an intent to mislead the audience. However, the writing style can be deceptive. Thus, it is important to involve user-oriented social information to improve model strength. Therefore, the paper uses an integrated approach that combines style-based and propagation-based features, with a total of thirty-one features. The extracted features are divided into ten categories: relative frequency, quantity, complexity, uncertainty, sentiment, subjectivity, diversity, informality, additional, and popularity. The features were used iteratively by supervised classifiers, and the best-correlated ones were then selected using the ANOVA test. Our experimental results show that the selected features are able to distinguish real from disinformation and satirical news. The Ensemble machine learning model outperformed other models on the developed multi-labelled corpus.
Affiliation(s)
- Shubhangi Rastogi
- Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Chandigarh, India
- Divya Bansal
- Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Chandigarh, India
|
32
|
R S, Thaseen IS, M V, M D, M A, R M, Mahendran A, Alnumay W, Chatterjee P. An efficient hardware architecture based on an ensemble of deep learning models for COVID -19 prediction. Sustain Cities Soc 2022; 80:103713. [PMID: 35136715 PMCID: PMC8812126 DOI: 10.1016/j.scs.2022.103713] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/29/2021] [Revised: 01/21/2022] [Accepted: 01/21/2022] [Indexed: 05/17/2023]
Abstract
Deep learning models demonstrate superior performance in image classification problems. COVID-19 image classification is developed using single deep learning models. In this paper, an efficient hardware architecture based on an ensemble deep learning model is built to identify COVID-19 using chest X-ray (CXR) records. Five deep learning models namely ResNet, fitness, IRCNN (Inception Recurrent Convolutional Neural Network), effectiveness, and Fitnet are ensembled for fine-tuning and enhancing the performance of the COVID-19 identification; these models are chosen as they individually perform better in other applications. Experimental analysis shows that the accuracy, precision, recall, and F1 for COVID-19 detection are 0.99, 0.98, 0.98, and 0.98, respectively. An application-specific hardware architecture incorporates pipelining, parallel processing, and reuse of computational resources by carefully exploiting the data flow and resource availability. The processing element (PE) and the CNN architecture are modeled using Verilog, simulated, and synthesized using Cadence with the Taiwan Semiconductor Manufacturing Co Ltd (TSMC) 90 nm tech file. The simulated results show a 40% reduction in the latency and number of clock cycles. The computations and power consumption are minimized by designing the PE as a data-aware unit. Thus, the proposed architecture is best suited for COVID-19 prediction and diagnosis.
Affiliation(s)
- Sakthivel R
  - School of Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- I Sumaiya Thaseen
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Vanitha M
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Deepa M
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Angulakshmi M
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Mangayarkarasi R
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Anand Mahendran
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India

33
Macêdo RL, Sousa FDR, Dumont HJ, Rietzler AC, Rocha O, Elmoor-Loureiro LMA. Climate change and niche unfilling tend to favor range expansion of Moina macrocopa Straus 1820, a potentially invasive cladoceran in temporary waters. Hydrobiologia 2022; 849:4015-4027. [PMID: 35342194 PMCID: PMC8938975 DOI: 10.1007/s10750-022-04835-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2021] [Revised: 02/19/2022] [Accepted: 02/24/2022] [Indexed: 06/14/2023]
Abstract
Non-native species' introductions have increased in the last decades primarily due to anthropogenic causes such as climate change and globalization of trade. Moina macrocopa, a stress-tolerant cladoceran widely used in bioassays and aquaculture, is spreading in temporary and semi-temporary natural ponds outside its natural range. Here, we characterize the variations in the climatic niche of M. macrocopa during its invasions outside the native Palearctic range following introduction into the American continent. Specifically, we examined to what extent the climatic responses of this species have diverged from those characteristic of its native range. We also made predictions for its potential distribution under current and future scenarios. We found that the environmental space occupied by this species in its native and introduced distribution areas shares more characteristics than randomly expected. However, the introduced niche shows a high degree of unfilling, with its original space displaced towards drier and hotter conditions. Accordingly, M. macrocopa can invade new areas where it has not yet been recorded in response to warming temperatures and decreasing winter precipitation. In particular, temporary ponds are more vulnerable environments where climatic and environmental stresses may also lower biotic resistance. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1007/s10750-022-04835-7.
Affiliation(s)
- Rafael Lacerda Macêdo
  - Núcleo de Estudos Limnológicos, Universidade Federal do Estado do Rio de Janeiro – UNIRIO, Av. 8 Pasteur, 458, Rio de Janeiro, RJ CEP 22290-240 Brazil
  - Graduate Program in Ecology and Natural Resources, and Department of Ecology and Evolutionary Biology, Federal University of São Carlos - UFSCar, São Carlos, Brazil
- Francisco Diogo R. Sousa
  - Laboratório de Taxonomia Animal, Unidade Acadêmica Especial de Ciências Biológicas, Universidade Federal de Jataí – UFJ, BR 364 km 195 n°3800, Jataí, GO CEP 75801-615 Brazil
  - Programa de Pós-Graduação Em Zoologia, Universidade de Brasília - UnB, Campus Universitário Darcy Ribeiro, Brasília, CEP 70910-900 Brazil
- Henri J. Dumont
  - Department of Biology, Ghent University, 9000 Ghent, Belgium
- Arnola C. Rietzler
  - Department of Genetics, Ecology and Evolution, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Odete Rocha
  - Graduate Program in Ecology and Natural Resources, and Department of Ecology and Evolutionary Biology, Federal University of São Carlos - UFSCar, São Carlos, Brazil
- Lourdes M. A. Elmoor-Loureiro
  - Laboratório de Taxonomia Animal, Unidade Acadêmica Especial de Ciências Biológicas, Universidade Federal de Jataí – UFJ, BR 364 km 195 n°3800, Jataí, GO CEP 75801-615 Brazil

34
Singh SK, Taylor RW, Pradhan B, Shirzadi A, Pham BT. Predicting sustainable arsenic mitigation using machine learning techniques. Ecotoxicol Environ Saf 2022; 232:113271. [PMID: 35121252 DOI: 10.1016/j.ecoenv.2022.113271] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/15/2021] [Revised: 01/21/2022] [Accepted: 01/28/2022] [Indexed: 06/14/2023]
Abstract
This study evaluates state-of-the-art machine learning models in predicting the most sustainable arsenic mitigation preference. A Gaussian distribution-based Naïve Bayes (NB) classifier scored the highest Area Under the Curve (AUC) of the Receiver Operating Characteristic curve (0.82), followed by Nu Support Vector Classification (0.80), and K-Neighbors (0.79). Ensemble classifiers scored an AUC higher than 0.70, with Random Forest being the top performer (0.77), and the Decision Tree model ranked fourth with an AUC of 0.77. The multilayer perceptron model also achieved high performance (AUC=0.75). Most linear classifiers underperformed, with the Ridge classifier at the top (AUC=0.73) and perceptron at the bottom (AUC=0.57). A Bernoulli distribution-based Naïve Bayes classifier was the poorest model (AUC=0.50). The Gaussian NB was also the most robust ML model, with the smallest variation in Kappa score between training (0.58) and test data (0.64). The results suggest that nonlinear or ensemble classifiers could more accurately capture the complex relationships of socio-environmental data and help develop accurate and robust prediction models of sustainable arsenic mitigation. Furthermore, Gaussian NB is the best option when data is scarce.
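The Kappa score used above as a robustness measure can be made concrete with a short sketch. It assumes the standard Cohen's kappa for two label sequences, (observed agreement - chance agreement) / (1 - chance agreement); the labels below are hypothetical toy values, not data from the study.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both labelings pick the same class
    # if each drew labels independently from its own marginal distribution.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelings use a single identical class
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (1 = prefers a given mitigation option, 0 = does not).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(y_true, y_pred)
print(round(kappa, 3))
```

Comparing the kappa obtained on training data with the kappa obtained on held-out data, as the abstract does, gives a rough indication of how stable a model's agreement with the ground truth is.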
Affiliation(s)
- Sushant K Singh
  - Department of Earth and Environmental Studies, Montclair State University, New Jersey, USA; The Center for Artificial Intelligence and Environmental Sustainability (CAIES) Foundation, Patna, Bihar, India.
- Robert W Taylor
  - Department of Earth and Environmental Studies, Montclair State University, New Jersey, USA.
- Biswajeet Pradhan
  - Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), School of Civil and Environmental Engineering, University of Technology Sydney, NSW 2007, Australia; Department of Energy and Mineral Resources Engineering, Sejong University, Choongmu-gwan, 209 Neungdong-ro Gwangjin-gu, Seoul 05006, Republic of Korea; Center of Excellence for Climate Change Research, King Abdulaziz University, P. O. Box 80234, Jeddah 21589, Saudi Arabia; Earth Observation Centre, Institute of Climate Change, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia.
- Ataollah Shirzadi
  - College of Natural Resources, Department of Rangeland and Watershed Management Sciences, University of Kurdistan, Sanandaj, Iran.
- Binh Thai Pham
  - Department of Geotechnical Engineering, University of Transport Technology, 54 Trieu Khuc, Thanh Xuan, Ha Noi, Viet Nam.

35
Lu Y, Forlenza E, Wilbur RR, Lavoie-Gagne O, Fu MC, Yanke AB, Cole BJ, Verma N, Forsythe B. Machine-learning model successfully predicts patients at risk for prolonged postoperative opioid use following elective knee arthroscopy. Knee Surg Sports Traumatol Arthrosc 2022; 30:762-772. [PMID: 33420807 DOI: 10.1007/s00167-020-06421-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/16/2020] [Accepted: 12/14/2020] [Indexed: 12/30/2022]
Abstract
PURPOSE Recovery following elective knee arthroscopy can be compromised by prolonged postoperative opioid utilization, yet an effective and validated risk calculator for this outcome remains elusive. The purpose of this study is to develop and validate a machine-learning algorithm that can reliably and effectively predict prolonged opioid consumption in patients following elective knee arthroscopy. METHODS A retrospective review of an institutional outcome database was performed at a tertiary academic medical centre to identify adult patients who underwent knee arthroscopy between 2016 and 2018. Extended postoperative opioid consumption was defined as opioid consumption at least 150 days following surgery. Five machine-learning algorithms were assessed for the ability to predict this outcome. Performances of the algorithms were assessed through discrimination, calibration, and decision curve analysis. RESULTS Overall, of the 381 patients included, 60 (20.3%) demonstrated sustained postoperative opioid consumption. The factors determined for prediction of prolonged postoperative opioid prescriptions were reduced preoperative scores on the following patient-reported outcomes: the IKDC, KOOS ADL, VR12 MCS, KOOS pain, and KOOS Sport and Activities. The ensemble model achieved the best performance based on discrimination (AUC = 0.74), calibration, and decision curve analysis. This model was integrated into a web-based open-access application able to provide both predictions and explanations. CONCLUSION Following appropriate external validation, the algorithm developed presently could augment timely identification of patients who are at risk of extended opioid use. Reduced scores on preoperative patient-reported outcomes, symptom duration and perioperative oral morphine equivalents were identified as novel predictors of prolonged postoperative opioid use. 
The predictive model can be easily deployed in the clinical setting to identify at-risk patients, thus allowing providers to optimize modifiable risk factors and appropriately counsel patients preoperatively. LEVEL OF EVIDENCE III.
Affiliation(s)
- Yining Lu
  - Department of Orthopaedic Surgery, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA.
- Enrico Forlenza
  - Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL, USA
- Ryan R Wilbur
  - Department of Orthopaedic Surgery, Mayo Clinic, 200 First Street SW, Rochester, MN, 55905, USA
- Ophelie Lavoie-Gagne
  - Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL, USA
- Michael C Fu
  - Department of Orthopaedic Surgery, Hospital for Special Surgery, New York, NY, USA
- Adam B Yanke
  - Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL, USA
- Brian J Cole
  - Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL, USA
- Nikhil Verma
  - Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL, USA
- Brian Forsythe
  - Department of Orthopaedic Surgery, Rush University Medical Center, Chicago, IL, USA

36
Meakin S, Abbott S, Bosse N, Munday J, Gruson H, Hellewell J, Sherratt K, Funk S. Comparative assessment of methods for short-term forecasts of COVID-19 hospital admissions in England at the local level. BMC Med 2022; 20:86. [PMID: 35184736 PMCID: PMC8858706 DOI: 10.1186/s12916-022-02271-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/21/2021] [Accepted: 01/20/2022] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Forecasting healthcare demand is essential in epidemic settings, both to inform situational awareness and facilitate resource planning. Ideally, forecasts should be robust across time and locations. During the COVID-19 pandemic in England, it is an ongoing concern that demand for hospital care for COVID-19 patients in England will exceed available resources. METHODS We made weekly forecasts of daily COVID-19 hospital admissions for National Health Service (NHS) Trusts in England between August 2020 and April 2021 using three disease-agnostic forecasting models: a mean ensemble of autoregressive time series models, a linear regression model with 7-day-lagged local cases as a predictor, and a scaled convolution of local cases and a delay distribution. We compared their point and probabilistic accuracy to a mean-ensemble of them all and to a simple baseline model of no change from the last day of admissions. We measured predictive performance using the weighted interval score (WIS) and considered how this changed in different scenarios (the length of the predictive horizon, the date on which the forecast was made, and by location), as well as how much admissions forecasts improved when future cases were known. RESULTS All models outperformed the baseline in the majority of scenarios. Forecasting accuracy varied by forecast date and location, depending on the trajectory of the outbreak, and all individual models had instances where they were the top- or bottom-ranked model. Forecasts produced by the mean-ensemble were both the most accurate and most consistently accurate forecasts amongst all the models considered. Forecasting accuracy was improved when using future observed, rather than forecast, cases, especially at longer forecast horizons. CONCLUSIONS Assuming no change in current admissions is rarely better than including at least a trend. 
Using confirmed COVID-19 cases as a predictor can improve admissions forecasts in some scenarios, but this is variable and depends on the ability to make consistently good case forecasts. However, ensemble forecasts can be consistently more accurate across time and locations. Given minimal requirements on data and computation, our admissions forecasting ensemble could be used to anticipate healthcare needs in future epidemic or pandemic settings.
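The weighted interval score (WIS) used in this study to measure predictive performance can be sketched as follows. The sketch assumes the commonly used definition for a forecast given as a median plus K central prediction intervals, where interval k has nominal coverage 1 - alpha_k: WIS = (0.5 * |y - median| + sum over k of (alpha_k / 2) * IS_k) / (K + 0.5). The forecast values below are hypothetical.

```python
def interval_score(lower, upper, y, alpha):
    """Interval score: width plus a penalty when y falls outside [lower, upper]."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score


def weighted_interval_score(median, intervals, y):
    """intervals maps alpha -> (lower, upper) of the central (1 - alpha) interval."""
    total = 0.5 * abs(y - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, y, alpha)
    return total / (len(intervals) + 0.5)


# Hypothetical forecast of daily admissions: median 10, 50% and 90% intervals.
forecast = {0.5: (8.0, 12.0), 0.1: (5.0, 15.0)}
wis_miss = weighted_interval_score(10.0, forecast, y=16.0)  # observation outside both
wis_hit = weighted_interval_score(10.0, forecast, y=10.0)   # observation at the median
print(wis_miss, wis_hit)
```

Lower scores are better: the score rewards narrow intervals but penalizes observations that fall outside them, so an overconfident forecast that misses is scored worse than a wider one that covers the outcome.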
Affiliation(s)
- Sophie Meakin
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK.
- Sam Abbott
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Nikos Bosse
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- James Munday
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Hugo Gruson
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Joel Hellewell
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Katharine Sherratt
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Sebastian Funk
  - Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, Keppel St, London, WC1E 7HT, UK

37
Shukla S, Agarwal P, Kumar A. Disordered regions tune order in chromatin organization and function. Biophys Chem 2022; 281:106716. [PMID: 34844028 DOI: 10.1016/j.bpc.2021.106716] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/13/2021] [Revised: 11/10/2021] [Accepted: 11/10/2021] [Indexed: 12/29/2022]
Abstract
Intrinsically disordered proteins or hybrid proteins with ordered domains and disordered regions (both collectively designated as IDP(R)s) defy the well-established structure-function paradigm due to their ability to perform multiple biological functions even in the absence of a well-defined 3D structure. IDP(R)s have a unique ability to exist as a functional heterogeneous ensemble, where they adopt multiple thermodynamically stable conformations with low energy barriers between states. The resultant structural plasticity or conformational adaptability provides them with a high functional diversity and ease of regulation. Hence, IDP(R)s are highly efficient biological machinery to mediate intricate cellular functions such as signaling, gene expression, and assembly of complex structures. One such structure is the nucleoprotein complex known as chromatin. Interestingly, the proteins involved in shaping the structure and function of chromatin are abundant in disordered regions, which serve as more than mere flexible linkers. The disordered regions are involved in crucial processes such as gene expression regulation, chromatin architecture maintenance, and liquid-liquid phase separation initiation. This review is an attempt to explore the advantages and the functional and regulatory roles of intrinsic disorder in several chromatin-associated proteins from a mechanistic standpoint.
Affiliation(s)
- Shivangi Shukla
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Prakhar Agarwal
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India
- Ashutosh Kumar
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai, India.

38
Jensch A, Lopes MB, Vinga S, Radde N. ROSIE: RObust Sparse ensemble for outlIEr detection and gene selection in cancer omics data. Stat Methods Med Res 2022; 31:947-958. [PMID: 35072570 PMCID: PMC9014683 DOI: 10.1177/09622802211072456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
The extraction of novel information from omics data is a challenging task, in particular, since the number of features (e.g. genes) often far exceeds the number of samples. In such a setting, conventional parameter estimation leads to ill-posed optimization problems, and regularization may be required. In addition, outliers can largely impact classification accuracy. Here we introduce ROSIE, an ensemble classification approach, which combines three sparse and robust classification methods for outlier detection and feature selection and further performs a bootstrap-based validity check. Outliers of ROSIE are determined by the rank product test using outlier rankings of all three methods, and important features are selected as features commonly selected by all methods. We apply ROSIE to RNA-Seq data from The Cancer Genome Atlas (TCGA) to classify observations into Triple-Negative Breast Cancer (TNBC) and non-TNBC tissue samples. The pre-processed dataset consists of 16,600 genes and more than 1,000 samples. We demonstrate that ROSIE selects important features and outliers in a robust way. Identified outliers are concordant with the distribution of the commonly selected genes by the three methods, and results are in line with other independent studies. Furthermore, we discuss the association of some of the selected genes with the TNBC subtype in other investigations. In summary, ROSIE constitutes a robust and sparse procedure to identify outliers and important genes through binary classification. Our approach is ad hoc applicable to other datasets, fulfilling the overall goal of simultaneously identifying outliers and candidate disease biomarkers to be targeted in therapy research and personalized medicine frameworks.
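The idea of combining outlier rankings from several methods through a rank product can be sketched as below. This only computes the rank product statistic (the geometric mean of a sample's ranks across methods); the significance assessment that makes it a test is omitted, and the outlier scores and function names are hypothetical, not taken from ROSIE.

```python
import math


def ranks_descending(scores):
    """Rank 1 goes to the largest score; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_product(score_lists):
    """Geometric mean of each sample's ranks across all methods."""
    rank_lists = [ranks_descending(scores) for scores in score_lists]
    k = len(rank_lists)
    n = len(rank_lists[0])
    return [math.prod(r[i] for r in rank_lists) ** (1.0 / k) for i in range(n)]


# Hypothetical outlier scores for four samples from three methods.
method_a = [0.9, 0.1, 0.5, 0.2]
method_b = [0.8, 0.2, 0.3, 0.7]
method_c = [0.7, 0.1, 0.6, 0.2]
rp = rank_product([method_a, method_b, method_c])
most_outlying = min(range(len(rp)), key=lambda i: rp[i])
print(rp, most_outlying)
```

A sample that every method ranks near the top gets a small rank product, so low values flag observations that are consistently considered outlying.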
Affiliation(s)
- Antje Jensch
  - Institute for Systems Theory and Automatic Control, University of Stuttgart, Germany
- Marta B Lopes
  - Center for Mathematics and Applications (CMA), NOVA School of Science and Technology, Caparica, Portugal
  - NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), NOVA School of Science and Technology, Caparica, Portugal
- Susana Vinga
  - INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
  - IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Portugal
- Nicole Radde
  - Institute for Systems Theory and Automatic Control, University of Stuttgart, Germany

39
Firdous N, Bhardwaj S. Handling of derived imbalanced dataset using XGBoost for identification of pulmonary embolism-a non-cardiac cause of cardiac arrest. Med Biol Eng Comput 2022; 60:551-8. [PMID: 35023074 DOI: 10.1007/s11517-021-02455-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2021] [Accepted: 10/07/2021] [Indexed: 10/19/2022]
Abstract
The relationship between pulmonary embolism and heart failure is presented in this paper. The proposed research is divided into two phases. The first phase includes the establishment of a novel database with the help of the Cleveland database for cardiology in order to establish a link between pulmonary embolism and heart failure. The connectivity is based on the relationship between the stroke volume and the pulse pressure (Pp < 25% (ap_hi)). The second phase includes the application of machine learning to the novel database. The novel database formed in this work is imbalanced, which leads to an overfitting problem. XGBoost has been used to mitigate the overfitting problem. Efficiency has been increased by formulating an ensemble technique that combines extreme learning machines, IB3 tree, logistic regression, and averaged neural network (avNNet) models.
40
Alhenawi E, Al-Sayyed R, Hudaib A, Mirjalili S. Feature selection methods on gene expression microarray data for cancer classification: A systematic review. Comput Biol Med 2022; 140:105051. [PMID: 34839186 DOI: 10.1016/j.compbiomed.2021.105051] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/09/2021] [Revised: 11/01/2021] [Accepted: 11/15/2021] [Indexed: 11/29/2022]
Abstract
This systematic review provides researchers interested in feature selection (FS) for processing microarray data with comprehensive information about the main research directions for gene expression classification pursued during the past seven years. A set of 132 studies published by three different publishers is reviewed. The studied papers are categorized into nine directions based on their objectives. The FS directions that received various levels of attention were then summarized. The review revealed that 'propose hybrid FS methods' represented the most interesting research direction with a percentage of 34.9%, while the other directions have lower percentages that ranged from 13.6% down to 3%. This guides researchers to select the most competitive research direction. Papers in each category are thoroughly reviewed based on six perspectives, mainly: method(s), classifier(s), dataset(s), dataset dimension(s) range, performance metric(s), and result(s) achieved.
Affiliation(s)
- Esra'a Alhenawi
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Rizik Al-Sayyed
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Amjad Hudaib
  - King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.
- Seyedali Mirjalili
  - Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, 4006, QLD, Australia; Yonsei Frontier Lab, Yonsei University, Seoul, South Korea.

41
Banerjee A, Bhattacharya R, Bhateja V, Singh PK, Lay-Ekuakille A, Sarkar R. COFE-Net: An ensemble strategy for Computer-Aided Detection for COVID-19. Measurement (Lond) 2022; 187:110289. [PMID: 34663998 PMCID: PMC8516129 DOI: 10.1016/j.measurement.2021.110289] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/29/2021] [Revised: 09/29/2021] [Accepted: 10/03/2021] [Indexed: 05/26/2023]
Abstract
Biomedical images contain a large volume of sensor measurements, which can reveal the descriptors of the disease under investigation. Computer-based analysis of such measurements helps detect the disease and thereby swiftly aids medical professionals in choosing adequate therapy. In this paper, we propose a robust deep learning ensemble framework known as COVID Fuzzy Ensemble Network, or COFE-Net. This strategy is proposed for the task of COVID-19 screening from chest X-rays (CXR) and CT scans, as a part of Computer-Aided Detection (CADe) for medical practitioners. We leverage the strategy of Transfer Learning for Convolutional Neural Networks (CNNs) widely adopted in recent literature, and further propose an efficient ensemble network for their combination. The principles of fuzzy logic have been leveraged to combine the measured decision scores generated by three state-of-the-art CNNs - Inception V3, Inception ResNet V2 and DenseNet 201 - through the Choquet fuzzy integral. Experimental results support the efficacy of our approach over empirical ensembling, as the fuzzy ensembling strategy for biomedical measurement consists of dynamic refactoring of the classifier ensemble weights on the fly, based upon the confidence scores for coalitions of inputs. This is the chief advantage of our biomedical measurement strategy, as other methods, unlike ours, do not adjust dynamically to the multiple generated measurements. Impressive results on multiple datasets demonstrate the effectiveness of the proposed method. The source code of our proposed method is made available at: https://github.com/theavicaster/covid-cade-ensemble.
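How a Choquet fuzzy integral fuses classifier scores can be illustrated with a minimal sketch. It uses the standard discrete form: sort the scores in descending order and weight each one by the increase in the fuzzy measure when its source joins the coalition of higher-scoring sources. The fuzzy measures and scores below are hypothetical assumptions; the abstract does not state which measure COFE-Net uses, and this is not the code from the linked repository.

```python
def choquet_integral(scores, measure):
    """Discrete Choquet integral of scores (dict name -> value in [0, 1]).

    measure is a callable mapping a frozenset of names to a value in [0, 1],
    with measure(empty set) = 0 and measure(all names) = 1.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    total = 0.0
    previous = 0.0
    coalition = set()
    for name in ordered:
        coalition.add(name)
        current = measure(frozenset(coalition))
        total += scores[name] * (current - previous)
        previous = current
    return total


# Hypothetical confidence scores for the COVID-19 class from the three CNNs.
scores = {"InceptionV3": 0.9, "InceptionResNetV2": 0.6, "DenseNet201": 0.3}

# Assumed additive measure: the integral reduces to a weighted average.
weights = {"InceptionV3": 0.5, "InceptionResNetV2": 0.3, "DenseNet201": 0.2}
additive = choquet_integral(scores, lambda a: sum(weights[n] for n in a))

# Assumed non-additive measure: coalitions count for more than their parts.
convex = choquet_integral(scores, lambda a: (len(a) / 3) ** 2)
print(additive, convex)
```

With a non-additive measure the effective weight of each classifier depends on which other classifiers agree with it, which is the sense in which the ensemble weights adapt to the scores at hand.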
Affiliation(s)
- Avinandan Banerjee
  - Department of Information Technology, Jadavpur University, Kolkata 700106, India
- Rajdeep Bhattacharya
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India
- Vikrant Bhateja
  - Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow 226028, Uttar Pradesh, India
  - Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India
- Pawan Kumar Singh
  - Department of Information Technology, Jadavpur University, Kolkata 700106, India
- Aime' Lay-Ekuakille
  - Dipartimento d'Ingegneria dell'Innovazione (DII), Università del Salento (Dept of Innovation Engineering, University of Salento) Via Monteroni, Ed. "Corpo O" 73100 Lecce (IT), Italy
- Ram Sarkar
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata 700032, India

42
De Angeli K, Gao S, Danciu I, Durbin EB, Wu XC, Stroup A, Doherty J, Schwartz S, Wiggins C, Damesyn M, Coyle L, Penberthy L, Tourassi GD, Yoon HJ. Class imbalance in out-of-distribution datasets: Improving the robustness of the TextCNN for the classification of rare cancer types. J Biomed Inform 2022; 125:103957. [PMID: 34823030 DOI: 10.1016/j.jbi.2021.103957] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/30/2021] [Revised: 11/04/2021] [Accepted: 11/17/2021] [Indexed: 01/03/2023]
Abstract
In the last decade, the widespread adoption of electronic health record documentation has created huge opportunities for information mining. Natural language processing (NLP) techniques using machine and deep learning are becoming increasingly widespread for information extraction tasks from unstructured clinical notes. Disparities in performance when deploying machine learning models in the real world have recently received considerable attention. In the clinical NLP domain, the robustness of convolutional neural networks (CNNs) for classifying cancer pathology reports under natural distribution shifts remains understudied. In this research, we aim to quantify and improve the performance of the CNN for text classification on out-of-distribution (OOD) datasets resulting from the natural evolution of clinical text in pathology reports. We identified class imbalance due to different prevalence of cancer types as one of the sources of performance drop and analyzed the impact of previous methods for addressing class imbalance when deploying models in real-world domains. Our results show that our novel class-specialized ensemble technique outperforms other methods for the classification of rare cancer types in terms of macro F1 scores. We also found that traditional ensemble methods perform better in top classes, leading to higher micro F1 scores. Based on our findings, we formulate a series of recommendations for other ML practitioners on how to build robust models with extremely imbalanced datasets in biomedical NLP applications.
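The contrast drawn above between macro and micro F1 scores on imbalanced data can be made concrete with a small sketch. Macro F1 averages the per-class F1 scores so every class counts equally, while micro F1 pools true positives, false positives, and false negatives over all classes, so frequent classes dominate. The labels are hypothetical toy values.

```python
def f1_scores(y_true, y_pred):
    """Return (macro F1, micro F1, per-class F1 dict) for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {}
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class.values()) / len(classes)
    pooled = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / pooled if pooled else 0.0
    return macro, micro, per_class


# Hypothetical reports: eight of a common cancer type, two of a rare one.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]
macro, micro, per_class = f1_scores(y_true, y_pred)
print(round(macro, 3), round(micro, 3), per_class)
```

Missing one of the two rare cases barely moves the micro score but pulls the macro score down noticeably, which is why macro F1 is the more sensitive measure for rare classes.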
43
Peng L, Yuan R, Shen L, Gao P, Zhou L. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min 2021; 14:50. [PMID: 34861891 PMCID: PMC8642957 DOI: 10.1186/s13040-021-00277-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/29/2021] [Accepted: 08/22/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) are closely linked to various biological processes. Identifying interacting lncRNA-protein pairs contributes to understanding the functions and mechanisms of lncRNAs. Wet-lab experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced character of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduced prediction bias. RESULTS In this study, we develop an ensemble framework (LPI-EnEDT) with extra tree and decision tree classifiers to classify imbalanced LPI data. First, five LPI datasets are assembled. Second, lncRNAs and proteins are separately characterized based on Pyfeat and BioTriangle, and the features are concatenated into a vector representing each lncRNA-protein pair. Finally, an ensemble framework with extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. The comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1 and between NRON and Q15717. CONCLUSIONS By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction inference for a new lncRNA (or protein).
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
  - College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ruya Yuan
  - School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ling Shen
  - School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Pengfei Gao
  - College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
|
44
|
Kundu R, Singh PK, Mirjalili S, Sarkar R. COVID-19 detection from lung CT-Scans using a fuzzy integral-based CNN ensemble. Comput Biol Med 2021; 138:104895. [PMID: 34649147 PMCID: PMC8483997 DOI: 10.1016/j.compbiomed.2021.104895] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/13/2021] [Revised: 09/19/2021] [Accepted: 09/22/2021] [Indexed: 12/16/2022]
Abstract
The COVID-19 pandemic has overwhelmed public healthcare systems and severely damaged the world economy. The SARS-CoV-2 virus, also known as the coronavirus, led to community spread, causing the death of more than a million people worldwide. The primary reason for the uncontrolled spread of the virus is the lack of provision for population-wide screening. The apparatus for RT-PCR-based COVID-19 detection is scarce, and the testing process takes 6-9 h. The test is also not satisfactorily sensitive (only 71% sensitive). Hence, computer-aided detection techniques based on deep learning can be applied to other modalities, such as chest CT-scan images, for more accurate and sensitive screening. In this paper, we propose a method that uses a Sugeno fuzzy integral ensemble of four pre-trained deep learning models, namely VGG-11, GoogLeNet, SqueezeNet v1.1 and Wide ResNet-50-2, to classify chest CT-scan images into COVID and Non-COVID categories. The proposed framework has been evaluated on a publicly available dataset, on which it achieves 98.93% accuracy and 98.93% sensitivity. The model outperforms state-of-the-art methods on the same dataset and proves to be a reliable COVID-19 detector. The relevant source code for the proposed approach can be found at: https://github.com/Rohit-Kundu/Fuzzy-Integral-Covid-Detection.
Affiliation(s)
- Rohit Kundu
  - Department of Electrical Engineering, Jadavpur University, 188, Raja S. C. Mallick Road, Kolkata-700032, West Bengal, India
- Pawan Kumar Singh
  - Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata-700106, West Bengal, India
- Seyedali Mirjalili (corresponding author)
  - Centre for Artificial Intelligence Research and Optimization, Torrens University, Australia
  - Yonsei Frontier Lab, Yonsei University, South Korea
- Ram Sarkar
  - Department of Computer Science & Engineering, Jadavpur University, 188, Raja S. C. Mallick Road, Kolkata-700032, West Bengal, India
|
45
|
Fauzi MA, Yang B. Continuous Stress Detection of Hospital Staff Using Smartwatch Sensors and Classifier Ensemble. Stud Health Technol Inform 2021; 285:245-250. [PMID: 34734881 DOI: 10.3233/shti210607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
High stress levels among hospital workers can harm both the workers and the institution. Enabling workers to monitor their stress level has many advantages: knowing their own stress level can help them stay aware, feel more in control of their response to situations, and recognize when it is time to relax or take action to manage it properly. This monitoring task can be enabled by using wearable devices to measure physiological responses related to stress. In this work, we propose a smartwatch-sensor-based continuous stress detection method using several individual classifiers and classifier ensembles. The experimental results show that all of the classifiers detect stress reasonably well, with an accuracy of more than 70%. The results also show that the ensemble methods obtained higher accuracy and F1-measure than all of the individual classifiers. The best accuracy, 87.10%, was obtained by the ensemble with the soft voting strategy (ES), while the hard voting strategy (EH) achieved the best F1-measure, 77.45%.
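The difference between the hard and soft voting strategies mentioned in the abstract can be sketched as follows. This is a generic illustration for a binary task, not the authors' implementation; the function names, the 0.5 threshold, and the probabilities are hypothetical.

```python
def hard_vote(probs, threshold=0.5):
    """Majority vote over each classifier's thresholded decision."""
    votes = [1 if p >= threshold else 0 for p in probs]
    # Predict "stress" (1) only if more than half of the classifiers say so.
    return 1 if 2 * sum(votes) > len(votes) else 0


def soft_vote(probs, threshold=0.5):
    """Average the predicted probabilities, then threshold once."""
    mean_prob = sum(probs) / len(probs)
    return 1 if mean_prob >= threshold else 0


# Hypothetical P(stress) from three classifiers for one time window.
probs = [0.90, 0.40, 0.45]
print("hard vote:", hard_vote(probs))  # two of three vote "no stress"
print("soft vote:", soft_vote(probs))  # the confident classifier pulls the mean up
```

Because soft voting keeps each classifier's confidence while hard voting discards it, the two strategies can disagree on the same inputs, which is consistent with one of them leading on accuracy and the other on F1-measure.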
Affiliation(s)
- Bian Yang
  - Norwegian University of Science and Technology, Gjøvik, Norway
|
46
|
Pomponi J, Scardapane S, Uncini A. Structured Ensembles: An approach to reduce the memory footprint of ensemble methods. Neural Netw 2021; 144:407-418. [PMID: 34562814 DOI: 10.1016/j.neunet.2021.09.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Revised: 07/24/2021] [Accepted: 09/03/2021] [Indexed: 11/20/2022]
Abstract
In this paper, we propose a novel ensembling technique for deep neural networks, which is able to drastically reduce the required memory compared to alternative approaches. In particular, we propose to extract multiple sub-networks from a single, untrained neural network by solving an end-to-end optimization task combining differentiable scaling over the original architecture, with multiple regularization terms favouring the diversity of the ensemble. Since our proposal aims to detect and extract sub-structures, we call it Structured Ensemble. On a large experimental evaluation, we show that our method can achieve higher or comparable accuracy to competing methods while requiring significantly less storage. In addition, we evaluate our ensembles in terms of predictive calibration and uncertainty, showing they compare favourably with the state-of-the-art. Finally, we draw a link with the continual learning literature, and we propose a modification of our framework to handle continuous streams of tasks with a sub-linear memory cost. We compare with a number of alternative strategies to mitigate catastrophic forgetting, highlighting advantages in terms of average accuracy and memory.
Affiliation(s)
- Jary Pomponi
  - Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, Italy
- Simone Scardapane
  - Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, Italy
- Aurelio Uncini
  - Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, Italy
|
47
|
Pathan S, Siddalingaswamy PC, Kumar P, Pai M M M, Ali T, Acharya UR. Novel ensemble of optimized CNN and dynamic selection techniques for accurate Covid-19 screening using chest CT images. Comput Biol Med 2021; 137:104835. [PMID: 34508976 PMCID: PMC8418990 DOI: 10.1016/j.compbiomed.2021.104835] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/16/2021] [Revised: 08/30/2021] [Accepted: 09/01/2021] [Indexed: 01/20/2023]
Abstract
The world is significantly affected by infectious coronavirus disease (covid-19). Timely prognosis and treatment are important to control the spread of this infection. Unreliable screening systems and the limited number of clinical facilities are the major hurdles in controlling the spread of covid-19. Many automated detection systems based on deep learning techniques using computed tomography (CT) images have recently been proposed to detect covid-19. However, these systems have the following drawbacks: (i) limited data is a major hindrance to training a deep neural network model that provides an accurate diagnosis, (ii) a random choice of Convolutional Neural Network (CNN) hyperparameters significantly affects classification performance, since the hyperparameters have to be application dependent, and (iii) the generalization ability of CNN classification is usually not validated. To address these issues, we propose two models: (i) one based on a transfer learning approach, and (ii) one using a novel strategy to optimize the CNN hyperparameters with a Whale optimization-based BAT algorithm, combined with an AdaBoost classifier built using dynamic ensemble selection techniques. In the second method, the classifier is chosen according to the characteristics of the test sample, which reduces the risk of overfitting while producing promising results. Our proposed methodologies are developed using 746 CT images. Our method obtained a sensitivity, specificity, accuracy, F-1 score, and precision of 0.98, 0.97, 0.98, 0.98, and 0.98, respectively, with a five-fold cross-validation strategy. Our developed prototype is ready to be tested with a large chest CT image database before its real-world application.
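The five metrics reported in the abstract all derive from the same four confusion-matrix counts. The sketch below shows the standard definitions; the helper name `binary_metrics` and the counts are hypothetical and do not reproduce the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive case,
    one negative case, and one positive prediction).
    """
    sensitivity = tp / (tp + fn)   # recall on the positive (covid) class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)     # fraction of positive calls that are correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The false positive rate quoted in some of the other abstracts is the complement of specificity, that is `fp / (fp + tn)`.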
Affiliation(s)
- Sameena Pathan
  - Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- P C Siddalingaswamy
  - Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Preetham Kumar
  - Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Manohara Pai M M
  - Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- Tanweer Ali
  - Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India
- U Rajendra Acharya
  - Dept. of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore; Ngee Ann Polytechnic, Department of Electronics and Computer Engineering, 599489, Singapore; Department of Biomedical Engineering, School of Science and Technology, SUSS University, Singapore; Department of Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan
|
48
|
Nobashi T, Zacharias C, Ellis JK, Ferri V, Koran ME, Franc BL, Iagaru A, Davidzon GA. Performance Comparison of Individual and Ensemble CNN Models for the Classification of Brain 18F-FDG-PET Scans. J Digit Imaging 2021; 33:447-455. [PMID: 31659587 DOI: 10.1007/s10278-019-00289-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/22/2022]
Abstract
The high-background glucose metabolism of normal gray matter on [18F]-fluoro-2-D-deoxyglucose (FDG) positron emission tomography (PET) of the brain results in a low signal-to-background ratio, potentially increasing the possibility of missing important findings in patients with intracranial malignancies. To explore the strategy of using a deep learning classifier to aid in distinguishing normal versus abnormal findings on PET brain images, this study evaluated the performance of a two-dimensional convolutional neural network (2D-CNN) to classify FDG PET brain scans as normal (N) or abnormal (A). METHODS Two hundred eighty-nine brain FDG-PET scans (N; n = 150, A; n = 139), resulting in a total of 68,260 images, were included. Nine individual 2D-CNN models with three different window settings for the axial, coronal, and sagittal axes were trained and validated. The performance of these individual and ensemble models was evaluated and compared using a test dataset. Odds ratio, Akaike's information criterion (AIC), area under the curve (AUC) of the receiver operating characteristic curve, accuracy, and standard deviation (SD) were calculated. RESULTS The optimal window setting for classifying normal and abnormal scans was different for each axis of the individual models. An ensembled model using different axes with an optimized window setting (window-triad) showed better performance than ensembled models using the same axis and different window settings (axis-triad). An increase in odds ratio and a decrease in SD were observed in both axis-triad and window-triad models compared with individual models, whereas improvements in AUC and AIC were seen in window-triad models. An overall model averaging the probabilities of all individual models showed the best accuracy of 82.0%. CONCLUSIONS Data ensemble using different window settings and axes was effective in improving 2D-CNN performance parameters for the classification of brain FDG-PET scans. If prospectively validated with a larger cohort of patients, similar models could provide decision support in a clinical setting.
Affiliation(s)
- Tomomi Nobashi
  - Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Stanford University, 300 Pasteur Drive, Office H2228, Stanford, CA, 94305, USA
- Claudia Zacharias
  - Clinic for Nuclear Medicine, University Hospital Essen, Essen, Germany
- Jason K Ellis
  - DimensionalMechanics Inc.®, 2821 Northup Way Suite, Bellevue, WA, #200, USA
- Valentina Ferri
  - Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Stanford University, 300 Pasteur Drive, Office H2228, Stanford, CA, 94305, USA
- Mary Ellen Koran
  - Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Stanford University, 300 Pasteur Drive, Office H2228, Stanford, CA, 94305, USA
- Benjamin L Franc
  - Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Stanford University, 300 Pasteur Drive, Office H2228, Stanford, CA, 94305, USA
- Andrei Iagaru
  - Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Stanford University, 300 Pasteur Drive, Office H2228, Stanford, CA, 94305, USA
- Guido A Davidzon
  - Division of Nuclear Medicine and Molecular Imaging, Department of Radiology, Stanford University, 300 Pasteur Drive, Office H2228, Stanford, CA, 94305, USA
|
49
|
Kundu R, Basak H, Singh PK, Ahmadian A, Ferrara M, Sarkar R. Fuzzy rank-based fusion of CNN models using Gompertz function for screening COVID-19 CT-scans. Sci Rep 2021; 11:14133. [PMID: 34238992 PMCID: PMC8266871 DOI: 10.1038/s41598-021-93658-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/18/2021] [Accepted: 06/16/2021] [Indexed: 12/22/2022]
Abstract
COVID-19 has crippled the world's healthcare systems, set back the economy, and taken many lives. Although potential vaccines are being tested and supplied around the world, it will take a long time for them to reach everyone, all the more so as new variants of the virus emerge and enforce lockdown-like situations in parts of the world. Thus, there is a dire need for early and accurate detection of COVID-19 to prevent further spread of the disease. The current gold-standard RT-PCR test is only 71% sensitive and is laborious to perform, which makes population-wide screening infeasible. To this end, we propose an automated COVID-19 detection system that classifies CT-scan images of the lungs into COVID and Non-COVID cases. The proposed method applies an ensemble strategy that generates fuzzy ranks of the base classification models using the Gompertz function and fuses the decision scores of the base models adaptively to make the final predictions on the test cases. Three transfer learning-based convolutional neural network models, namely VGG-11, Wide ResNet-50-2, and Inception v3, are used to generate the decision scores to be fused by the proposed ensemble model. The framework has been evaluated on two publicly available chest CT scan datasets, achieving state-of-the-art performance and supporting the reliability of the model. The source code for the present work is available on GitHub.
Affiliation(s)
- Rohit Kundu
  - Department of Electrical Engineering, Jadavpur University, Kolkata, 700032, India
- Hritam Basak
  - Department of Electrical Engineering, Jadavpur University, Kolkata, 700032, India
- Pawan Kumar Singh
  - Department of Information Technology, Jadavpur University, Kolkata, 700106, India
- Ali Ahmadian
  - Institute of IR 4.0, The National University of Malaysia (UKM), 43600, Bangi, Selangor, Malaysia
  - Department of Law, Economics and Human Sciences & Decisions Lab, Mediterranea University of Reggio Calabria, 89125, Reggio Calabria, Italy
- Massimiliano Ferrara
  - Department of Law, Economics and Human Sciences & Decisions Lab, Mediterranea University of Reggio Calabria, 89125, Reggio Calabria, Italy
- Ram Sarkar
  - Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032, India
|
50
|
Balogun AO, Adewole KS, Raheem MO, Akande ON, Usman-Hamza FE, Mabayoje MA, Akintola AG, Asaju-Gbolagade AW, Jimoh MK, Jimoh RG, Adeyemo VE. Improving the phishing website detection using empirical analysis of Function Tree and its variants. Heliyon 2021; 7:e07437. [PMID: 34278030 DOI: 10.1016/j.heliyon.2021.e07437] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/06/2021] [Revised: 04/30/2021] [Accepted: 06/25/2021] [Indexed: 11/22/2022]
Abstract
The phishing attack is one of the most complex threats that put internet users and legitimate web resource owners at risk. The recent rise in the number of phishing attacks has instilled distrust in legitimate internet users, making them feel less safe even in the presence of powerful antivirus apps. Reports of rising financial damage from phishing website attacks have caused grave concern. Several methods, including blacklists and machine learning-based models, have been proposed to combat phishing website attacks. The blacklist anti-phishing method has been faulted for failing to detect new phishing URLs, because it relies on compiled lists of known phishing URLs. Many ML methods for detecting phishing websites have been reported to have relatively low detection accuracy and high false alarm rates. Hence, this research proposes Functional Tree (FT)-based meta-learning models for detecting phishing websites. That is, this study investigates improving phishing website detection through empirical analysis of FT and its variants. The proposed models outperformed the baseline classifiers, meta-learners, and hybrid models used for phishing website detection in existing studies. In addition, the proposed FT-based meta-learners are effective at distinguishing legitimate from phishing websites, with accuracy as high as 98.51% and a false positive rate as low as 0.015. Hence, the deployment and adoption of FT and its meta-learner variants for detecting phishing websites and similar cybersecurity attacks are recommended.
|