1
Buczak P. Frequency-adjusted borders ordinal forest: A novel tree ensemble method for ordinal prediction. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2025; 78:594-616. [PMID: 39648591 PMCID: PMC11971599 DOI: 10.1111/bmsp.12375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 11/01/2024] [Accepted: 11/09/2024] [Indexed: 12/10/2024]
Abstract
Ordinal responses commonly occur in psychology, e.g., through school grades or rating scales. Where traditionally parametric statistical models like the proportional odds model have been used, machine learning (ML) methods such as random forest (RF) are increasingly employed for ordinal prediction. With new developments in assessment and new data sources yielding increasing quantities of data in the psychological sciences, such ML approaches promise high predictive performance. As RF does not inherently account for ordinality, several extensions have been proposed. A promising approach lies in assigning optimized numeric scores to the ordinal response categories and using regression RF. However, these optimization procedures are computationally expensive and have been shown to yield only situational benefit. In this work, I propose Frequency-Adjusted Borders Ordinal Forest (fabOF), a novel tree ensemble method for ordinal prediction forgoing extensive optimization while offering improved predictive performance in simulation and an illustrative example of student performance. To aid interpretation, I additionally introduce a permutation variable importance measure for fabOF tailored towards ordinal prediction. When applied to the illustrative example, an interest in higher education, mother's education, and study time are identified as important predictors of student performance. The presented methodology is made available through an accompanying R package.
Affiliation(s)
- Philip Buczak
- Department of Statistics, TU Dortmund University, Dortmund, Germany
- Research Center Trustworthy Data Science and Security, UA Ruhr, Dortmund, Germany
2
Zadorozhny BS, Petrides KV, Cheng Y, Cuppello S, van der Linden D. Predicting Leadership Status Through Trait Emotional Intelligence and Cognitive Ability. Behav Sci (Basel) 2025; 15:345. [PMID: 40150239 PMCID: PMC11939709 DOI: 10.3390/bs15030345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2024] [Revised: 02/20/2025] [Accepted: 03/06/2025] [Indexed: 03/29/2025] Open
Abstract
Many interconnected factors have been implicated in the prediction of whether a given individual occupies a managerial role. These include an assortment of demographic variables such as age and gender as well as trait emotional intelligence (trait EI) and cognitive ability. In order to disentangle their respective effects on formal leadership position, the present study compares a traditional linear approach in the form of a logistic regression with the results of a set of supervised machine learning (SML) algorithms. In addition to merely extending beyond linear effects, a series of techniques were incorporated so as to practically apply ML approaches and interpret their results, including feature importance and interactions. The results demonstrated the superior predictive strength of trait EI over cognitive ability, especially of its sociability factor, and supported the predictive utility of the random forest (RF) algorithm in this context. We thereby hope to contribute and support a developing trend of acknowledging the genuine complexity of real-world contexts such as leadership and provide direction for future investigations, including more sophisticated ML approaches.
Affiliation(s)
- K. V. Petrides
- Department of Psychology, University College London (UCL), London WC1E 6BT, UK
- Yongtian Cheng
- Department of Psychology, University College London (UCL), London WC1E 6BT, UK
3
Liu S, Wang N, Gualtieri C, Zhang C, Cao C, Chen J, Chen X, Yaak WB, Yao W. Fish migration modeling and habitat assessment in a complex fluvial system. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2025; 374:124146. [PMID: 39823939 DOI: 10.1016/j.jenvman.2025.124146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 01/08/2025] [Accepted: 01/12/2025] [Indexed: 01/20/2025]
Abstract
Fish migration patterns are driven by hydrodynamic factors, which are essential in aquatic ecology. This study investigated the hydrodynamic drivers of Gymnocypris przewalskii fish migration in two distinct river reaches, a straight reach (SR) and a confluence reach (CR), in the area of Qinghai Lake, China, using a 3D numerical model, fish density field data, and four predictive models. Thirteen hydrodynamic factors, with a focus on water depth and velocity, were analyzed to identify their influence on fish migration. It was found that in the SR, linear factors of flow velocity and turbulent kinetic energy were most influential, while in the CR, nonlinear factors of water temperature and vortex intensity dominated. For CR, fish migration patterns are also important nonlinear factors. Methods that accurately reveal fish migration patterns, such as Random Forest, offer higher precision for habitat assessment. Our research also shows that fish swimming ability can, to some extent, reflect migration direction. Combining fish swimming ability with traditional linear habitat assessment methods can improve the adaptability of these methods in complex fluvial systems. Based on our research findings, we propose a new workflow for fish habitat assessment that integrates both linear and nonlinear predictive methods. This framework provides valuable insights for enhancing fish conservation strategies in various fluvial systems.
Affiliation(s)
- Shikang Liu
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, 610065, China
- Nan Wang
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, 610065, China
- Carlo Gualtieri
- Department of Civil, Architectural and Environmental Engineering, University of Napoli "Federico II", Naples, Italy
- Chendi Zhang
- State Key Laboratory of Hydroscience and Engineering, Tsinghua University, Beijing, China
- Chenyang Cao
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, 610065, China
- Junguang Chen
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, 610065, China
- Xuefeng Chen
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, 610065, China
- William Bol Yaak
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, 610065, China; Department of Environmental Studies, University of Juba, Juba P.O. BOX 82, South Sudan
- Weiwei Yao
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, 610065, China.
4
Zhou R, Chen J, Cui S, Li L, Qian J, Zhao H, Huang G. A data-driven framework to identify influencing factors for soil heavy metal contaminations using random forest and bivariate local Moran's I: A case study. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2025; 375:124172. [PMID: 39842358 DOI: 10.1016/j.jenvman.2025.124172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 12/28/2024] [Accepted: 01/16/2025] [Indexed: 01/24/2025]
Abstract
The efficacy of traceability analysis is often limited by a lack of information on influencing factors for heavy metal (HM) contaminations in soil, such as spatial correlations between HM concentrations and influencing factors. To overcome this limitation, a novel data-driven framework was established to identify influencing factors for soil HM concentrations in an industrialised study area, in Guangdong Province, China, mainly using random forest (RF) and bivariate local Moran's I (BLMI) on the basis of the 577 soil samples and the 18 environmental covariates. The quantitative contributions of the 18 influencing factors for the Cd, As, Pb, and Cr concentrations were determined by the optimised RF. The main influencing factors of Cd were petrol stations (10.97%) and railways (9.99%), the main ones of As were groundwater depth (8.45%) and elevation (8.24%), the main ones of Pb were soil pH (8.82%) and hazardous waste disposal sites (8.02%), and the main ones of Cr were mine tailings (13.65%) and rainfall (11.88%). The eight spatial clustering maps between the four HM concentrations and the two key influencing factors were generated by BLMI. The middle part of the study area has shown the higher concentrations of Cd, As, Pb, and Cr, the more complex human activities and the more high-high clusters. Priority attention should be paid to the middle part when taking the specific prevention and control measures for their contaminations. This data-driven framework provided rich information on influencing factors, including HM concentrations, HM contaminations, quantitative contributions, and qualitative spatial clusters.
Affiliation(s)
- Rui Zhou
- College of New Energy and Environment, Jilin University, Changchun, 130012, China
- Jian Chen
- Chinese Academy of Environmental Planning, Beijing, 100041, China
- Shiwen Cui
- College of New Energy and Environment, Jilin University, Changchun, 130012, China; Chinese Academy of Environmental Planning, Beijing, 100041, China
- Lu Li
- Chinese Academy of Environmental Planning, Beijing, 100041, China
- Jiangbo Qian
- Zhejiang Kehuan Environmental Engineering Technology Corporation Limited, Hangzhou, 311200, China
- Hang Zhao
- Chinese Academy of Environmental Planning, Beijing, 100041, China
- Guoxin Huang
- Chinese Academy of Environmental Planning, Beijing, 100041, China.
5
Che Y, Zhao M, Gao Y, Zhang Z, Zhang X. Application of machine learning for mass spectrometry-based multi-omics in thyroid diseases. Front Mol Biosci 2024; 11:1483326. [PMID: 39741929 PMCID: PMC11685090 DOI: 10.3389/fmolb.2024.1483326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Accepted: 12/02/2024] [Indexed: 01/03/2025] Open
Abstract
Thyroid diseases, including functional and neoplastic diseases, bring a huge burden to people's health. Therefore, a timely and accurate diagnosis is necessary. Mass spectrometry (MS) based multi-omics has become an effective strategy to reveal the complex biological mechanisms of thyroid diseases. The exponential growth of biomedical data has promoted the applications of machine learning (ML) techniques to address new challenges in biology and clinical research. In this review, we presented the detailed review of applications of ML for MS-based multi-omics in thyroid disease. It is primarily divided into two sections. In the first section, MS-based multi-omics, primarily proteomics and metabolomics, and their applications in clinical diseases are briefly discussed. In the second section, several commonly used unsupervised learning and supervised algorithms, such as principal component analysis, hierarchical clustering, random forest, and support vector machines are addressed, and the integration of ML techniques with MS-based multi-omics data and its application in thyroid disease diagnosis is explored.
Affiliation(s)
- Yanan Che
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
- Meng Zhao
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
- Yan Gao
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
- Zhibin Zhang
- Department of General Surgery, Tianjin First Central Hospital, Tianjin, China
- Xiangyang Zhang
- School of Pharmaceutical Science and Technology, Tianjin University, Tianjin, China
6
Tiwari NK, Panwar D. Optimising Venturi flume oxygen transfer efficiency using uncertainty-aware decision trees. WATER SCIENCE AND TECHNOLOGY : A JOURNAL OF THE INTERNATIONAL ASSOCIATION ON WATER POLLUTION RESEARCH 2024; 90:3210-3240. [PMID: 39733451 DOI: 10.2166/wst.2024.393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2024]
Abstract
This study optimizes standard oxygen transfer efficiency (SOTE) in Venturi flumes by investigating the impact of key parameters such as discharge per unit width (q), throat width (W), throat length (F), upstream entrance width (E), and gauge readings (Ha and Hb). To achieve this, a comprehensive experimental dataset was analyzed using multiple linear regression (MLR), multiple nonlinear regression (MNLR), gradient boosting machine (GBM), extreme gradient boosting (XRT), random forest (RF), M5 (pruned and unpruned), random tree (RT), and reduced error pruning (REP). Model performance was evaluated based on key metrics: correlation coefficient (CC), root mean square error (RMSE), and mean absolute error (MAE). Among the proposed models, M5_Unpruned emerged as the top performer, exhibiting the highest CC (0.9455), the lowest RMSE (0.1918), and the lowest MAE (0.0030). GBM followed closely with a CC value of 0.9372, an RMSE value of 0.2067, and an MAE value of 0.0006. Uncertainty analysis further solidified the superior performance of M5_Unpruned (0.7522) and GBM (0.8055), with narrower prediction bands compared to other models, including MLR, which exhibited the widest band (1.4320). One-way analysis of variance confirmed the reliability and robustness of the proposed models. Sensitivity, correlation, and SHapley Additive exPlanations analyses identified W and Hb as the most influencing factors.
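The three evaluation metrics named in this abstract (CC, RMSE, MAE) can be illustrated with a minimal sketch. It assumes CC denotes the Pearson correlation between observed and predicted values, which the abstract does not state explicitly; the function name `regression_metrics` and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return CC (assumed Pearson), RMSE and MAE for paired values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    # Error-based metrics: RMSE penalises large errors more strongly than MAE.
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation between observations and predictions.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    denom = math.sqrt(var_o * var_p)
    cc = cov / denom if denom > 0 else float("nan")

    return {"CC": cc, "RMSE": rmse, "MAE": mae}


# Hypothetical values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)
```

For any set of predictions, MAE can never exceed RMSE, which is a quick sanity check when reading reported values.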
Affiliation(s)
- Nand Kumar Tiwari
- Department of Civil Engineering, National Institute of Technology Kurukshetra, Haryana 136119, India
- Dinesh Panwar
- Department of Civil Engineering, National Institute of Technology Kurukshetra, Haryana 136119, India
7
Aslam RW, Naz I, Shu H, Yan J, Quddoos A, Tariq A, Davis JB, Al-Saif AM, Soufan W. Multi-temporal image analysis of wetland dynamics using machine learning algorithms. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 371:123123. [PMID: 39527879 DOI: 10.1016/j.jenvman.2024.123123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2024] [Revised: 10/09/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024]
Abstract
Wetlands play a crucial role in enhancing groundwater quality, mitigating natural hazards, controlling erosion, and providing essential habitats for unique flora and wildlife. Despite their significance, wetlands are facing decline in various global locations, underscoring the need for effective mapping, monitoring, and predictive modeling approaches. Recent advances in machine learning, time series earth observation data, and cloud computing have opened up new possibilities to address the challenges of large-scale wetlands mapping and dynamics forecasting. This research conducts a comprehensive analysis of wetland dynamics in the Thatta region, encompassing Haleji & Kinjhar Lake in Pakistan, and evaluates the efficacy of different classification systems. Leveraging Google Earth Engine, Landsat imagery, and various spectral indices, we assess four classification techniques to derive accurate wetland mapping results. Our findings demonstrate that Random Forest emerged as the most efficient and accurate method, achieving 87% accuracy across all time periods. Change detection analysis reveals a significant and alarming decline in Haleji & Kinjhar Lake wetlands over 1990-2020, primarily driven by agricultural expansion, urbanization, groundwater extraction, and climate change impacts like rising temperatures and reduced precipitation. If left unaddressed, this continued wetland loss could have severe implications for aquatic and terrestrial species, water and soil quality, wildlife populations, and local livelihoods. The study predicts future wetland dynamics under different scenarios - enhancing drainage for farmland conversion (10-20% increase), increasing urbanization (10-20% expansion), escalating groundwater extraction (7.2m annual decline), and climate change (up to 5 °C warming and 54% precipitation deficit by 2050). These scenarios forecast sustained long-term wetland deterioration driven by anthropogenic pressures and climate change. 
To guide conservation strategies, the research integrates satellite data analytics, machine learning algorithms, and spatial modeling to generate actionable insights into multifaceted wetland vulnerabilities. Findings provide a robust baseline to inform policies ensuring sustainable management and preservation of these vital ecosystems amidst escalating human and climate threats. Over 1990-2020, the Thatta region witnessed a 352.8 sq.km loss of wetlands, necessitating urgent restoration efforts to safeguard their invaluable ecosystem services.
Affiliation(s)
- Rana Waqar Aslam
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, 430079, China.
- Iram Naz
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, 430079, China
- Hong Shu
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, 430079, China
- Jianguo Yan
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, 430079, China
- Abdul Quddoos
- State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan, 430079, China
- Aqil Tariq
- Department of Wildlife, Fisheries and Aquaculture, College of the Forest Resources, Mississippi State University, Starkville, MS, 39762-9690, USA
- J Brian Davis
- Department of Wildlife, Fisheries and Aquaculture, College of the Forest Resources, Mississippi State University, Starkville, MS, 39762-9690, USA
- Adel M Al-Saif
- Plant Production Department, College of Food and Agriculture Sciences, King Saud University, Riyadh, 11451, Saudi Arabia
- Walid Soufan
- Plant Production Department, College of Food and Agriculture Sciences, King Saud University, Riyadh, 11451, Saudi Arabia
8
Liu C, Zhang Y, Liang Y, Zhang T, Wang G. DrugReSC: targeting disease-critical cell subpopulations with single-cell transcriptomic data for drug repurposing in cancer. Brief Bioinform 2024; 25:bbae490. [PMID: 39350337 PMCID: PMC11442150 DOI: 10.1093/bib/bbae490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 08/25/2024] [Accepted: 09/17/2024] [Indexed: 10/04/2024] Open
Abstract
The field of computational drug repurposing aims to uncover novel therapeutic applications for existing drugs through high-throughput data analysis. However, there is a scarcity of drug repurposing methods leveraging the cellular-level information provided by single-cell RNA sequencing data. To address this need, we propose DrugReSC, an innovative approach to drug repurposing utilizing single-cell RNA sequencing data, intending to target specific cell subpopulations critical to disease pathology. DrugReSC constructs a drug-by-cell matrix representing the transcriptional relationships between individual cells and drugs and utilizes permutation-based methods to assess drug contributions to cellular phenotypic changes. We demonstrate DrugReSC's superior performance compared to existing drug repurposing methods based on bulk or single-cell RNA sequencing data across multiple cancer case studies. In summary, DrugReSC offers a novel perspective on the utilization of single-cell sequencing data in drug repurposing methods, contributing to the advancement of precision medicine for cancer.
Affiliation(s)
- Chonghui Liu
- College of Life Science, Northeast Forestry University, 26 Hexing Road, Xiangfang District, Harbin 150040, China
- College of Computer and Control Engineering, Northeast Forestry University, 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Yan Zhang
- Kunming Institute of Zoology, Chinese Academy of Sciences, 17 Longxin Road, Panlong District, Kunming 650201, Yunnan, China
- University of Chinese Academy of Sciences, 1 Yanxi Lake East Road, Huairou District, Beijing 100049, China
- Yingjian Liang
- Department of General Surgery, the First Affiliated Hospital of Harbin Medical University, 23 Youzheng Street, Nangang District, Harbin 150007, China
- Tianjiao Zhang
- College of Computer and Control Engineering, Northeast Forestry University, 26 Hexing Road, Xiangfang District, Harbin 150040, China
- Guohua Wang
- College of Computer and Control Engineering, Northeast Forestry University, 26 Hexing Road, Xiangfang District, Harbin 150040, China
9
Fenta HM, Zewotir TT, Naidoo S, Naidoo RN, Mwambi H. Factors of acute respiratory infection among under-five children across sub-Saharan African countries using machine learning approaches. Sci Rep 2024; 14:15801. [PMID: 38982206 PMCID: PMC11233665 DOI: 10.1038/s41598-024-65620-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 06/21/2024] [Indexed: 07/11/2024] Open
Abstract
Symptoms of Acute Respiratory infections (ARIs) among under-five children are a global health challenge. We aimed to train and evaluate ten machine learning (ML) classification approaches in predicting symptoms of ARIs reported by mothers among children younger than 5 years in sub-Saharan African (sSA) countries. We used the most recent (2012-2022) nationally representative Demographic and Health Surveys data of 33 sSA countries. The air pollution covariates such as global annual surface particulate matter (PM 2.5) and the nitrogen dioxide available in the form of raster images were obtained from the National Aeronautics and Space Administration (NASA). The MLA was used for predicting the symptoms of ARIs among under-five children. We randomly split the dataset into two: 80% was used to train the model, and the remaining 20% was used to test the trained model. Model performance was evaluated using sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve. A total of 327,507 under-five children were included in the study. About 7.10, 4.19, 20.61, and 21.02% of children reported symptoms of ARI, Severe ARI, cough, and fever in the 2 weeks preceding the survey years respectively. The prevalence of ARI was highest in Mozambique (15.3%), Uganda (15.05%), Togo (14.27%), and Namibia (13.65%), whereas Uganda (40.10%), Burundi (38.18%), Zimbabwe (36.95%), and Namibia (31.2%) had the highest prevalence of cough. The results of the random forest plot revealed that spatial locations (longitude, latitude), particulate matter, land surface temperature, nitrogen dioxide, and the number of cattle in the houses are the most important features in predicting the diagnosis of symptoms of ARIs among under-five children in sSA. The RF algorithm was selected as the best ML model (AUC = 0.77, Accuracy = 0.72) to predict the symptoms of ARIs among children under five.
The MLA performed well in predicting the symptoms of ARIs and associated predictors among under-five children across the sSA countries. Random forest MLA was identified as the best classifier to be employed for the prediction of the symptoms of ARI among under-five children.
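The evaluation protocol described in this abstract (a random 80/20 split, then sensitivity, specificity, accuracy, and AUC) can be sketched in plain Python. This is an illustrative sketch only: the function names, the seed, and the toy labels are hypothetical, and the AUC here uses the pairwise-ranking definition rather than whatever implementation the authors used.

```python
import random


def split_80_20(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only.
train, test = split_80_20(range(100))
print(len(train), len(test))
print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

With a symptom prevalence as low as the roughly 7% reported for ARI, accuracy alone can look favourable for a model that rarely predicts the positive class, which is one reason to report sensitivity, specificity, and AUC alongside it.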
Affiliation(s)
- Haile Mekonnen Fenta
- Discipline of Public Health Medicine, School of Nursing and Public Health College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa.
- Department of Statistics, College of Science, Bahir Dar University, Bahir Dar, Ethiopia.
- Temesgen T Zewotir
- School of Mathematics, Statistics and Computer Science, College of Agriculture Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
- Saloshni Naidoo
- Discipline of Public Health Medicine, School of Nursing and Public Health College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Rajen N Naidoo
- Discipline of Occupational and Environmental Health, School of Nursing and Public Health, College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Henry Mwambi
- School of Mathematics, Statistics and Computer Science, College of Agriculture Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
10
Zhang S, Han Z, Qi H, Liu S, Liu B, Sun C, Feng Z, Sun M, Duan X. Convolutional Neural Network-Driven Impedance Flow Cytometry for Accurate Bacterial Differentiation. Anal Chem 2024; 96:4419-4429. [PMID: 38448396 DOI: 10.1021/acs.analchem.3c04421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
Impedance flow cytometry (IFC) has been demonstrated to be an efficient tool for label-free bacterial investigation to obtain the electrical properties in real time. However, the accurate differentiation of different species of bacteria by IFC technology remains a challenge owing to the insignificant differences in data. Here, we developed a convolutional neural networks (ConvNet) deep learning approach to enhance the accuracy and efficiency of the IFC toward distinguishing various species of bacteria. First, more than 1 million sets of impedance data (comprising 42 characteristic features for each set) of various groups of bacteria were trained by the ConvNet model. To improve the efficiency for data analysis, the Spearman correlation coefficient and the mean decrease accuracy of the random forest algorithm were introduced to eliminate feature interaction and extract the opacity of impedance related to the bacterial wall and membrane structure as the predominant features in bacterial differentiation. Moreover, the 25 optimized features were selected with differentiation accuracies of >96% for three groups of bacteria (bacilli, cocci, and vibrio) and >95% for two species of bacilli (Escherichia coli and Salmonella enteritidis), compared to machine learning algorithms (complex tree, linear discriminant, and K-nearest neighbor algorithms) with a maximum accuracy of 76.4%. Furthermore, bacterial differentiation was achieved on spiked samples of different species with different mixing ratios. The proposed ConvNet deep learning-assisted data analysis method of IFC exhibits advantages in analyzing a huge number of data sets with capacity for extracting predominant features within multicomponent information and will bring about progress and advances in the fields of both biosensing and data analysis.
Affiliation(s)
- Shuaihua Zhang
- State Key Laboratory of Precision Measuring Technology & Instruments, College of Precision Instrument and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Ziyu Han
- State Key Laboratory of Precision Measuring Technology & Instruments, College of Precision Instrument and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Hang Qi
- State Key Laboratory of Precision Measuring Technology & Instruments, College of Precision Instrument and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Siyuan Liu
- State Key Laboratory of Precision Measuring Technology & Instruments, College of Precision Instrument and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Bohua Liu
- State Key Laboratory of Precision Measuring Technology & Instruments, College of Precision Instrument and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Chongling Sun
- State Key Laboratory of Precision Measuring Technology & Instruments, College of Precision Instrument and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
- Zhe Feng
- Wuqing District Center for Disease Control and Prevention, Tianjin 301700, China
- Meiqing Sun
- Wuqing District Center for Disease Control and Prevention, Tianjin 301700, China
- Xuexin Duan
- State Key Laboratory of Precision Measuring Technology & Instruments, College of Precision Instrument and Optoelectronics Engineering, Tianjin University, Tianjin 300072, China
11
Lawrence S, Mueller BR, Benn EKT, Kim-Schulze S, Kwon P, Robinson-Papp J. Autonomic Neuropathy is Associated with More Densely Interconnected Cytokine Networks in People with HIV. J Neuroimmune Pharmacol 2023; 18:563-572. [PMID: 37923971 PMCID: PMC10997189 DOI: 10.1007/s11481-023-10088-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 10/17/2023] [Indexed: 11/06/2023]
Abstract
The autonomic nervous system (ANS) plays a complex role in the regulation of the immune system, with generally inhibitory effects via activation of β-adrenergic receptors on immune cells. We hypothesized that HIV-associated autonomic neuropathy (HIV-AN) would result in immune hyperresponsiveness which could be depicted using network analyses. Forty-two adults with well-controlled HIV underwent autonomic testing to yield the Composite Autonomic Severity Score (CASS). The observed range of CASS was 2-5, consistent with normal to moderate HIV-AN. To construct the networks, participants were divided into 4 groups based on the CASS (i.e., 2, 3, 4 or 5). Forty-four blood-based immune markers were included as nodes in all networks and the connections (i.e., edges) between pairs of nodes were determined by their bivariate Spearman's Rank Correlation Coefficient. Four centrality measures (strength, closeness, betweenness and expected influence) were calculated for each node in each network. The median value of each centrality measure across all nodes in each network was calculated as a quantitative representation of network complexity. Graphical representation of the four networks revealed greater complexity with increasing HIV-AN severity. This was confirmed by significant differences in the median value of all four centrality measures across the networks (p ≤ 0.025 for each). Among people with HIV, HIV-AN is associated with stronger and more numerous positive correlations between blood-based immune markers. Findings from this secondary analysis can be used to generate hypotheses for future studies investigating HIV-AN as a mechanism contributing to the chronic immune activation observed in HIV.
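The network construction summarised in this abstract (markers as nodes, pairwise Spearman correlations as edges, a median centrality per network) can be sketched as follows. This is a sketch under stated assumptions, not the authors' pipeline: it keeps every pairwise correlation as an edge with no thresholding, computes only two of the four centrality measures (strength as the sum of absolute edge weights, expected influence as the sum of signed edge weights), and uses hypothetical marker names and values.

```python
from statistics import median


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = (vx * vy) ** 0.5
    return cov / denom if denom > 0 else 0.0  # constant input: treat as no edge


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def network_summary(markers):
    """markers: dict mapping a node name to its list of measurements."""
    names = list(markers)
    strength = {name: 0.0 for name in names}
    influence = {name: 0.0 for name in names}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            rho = spearman(markers[a], markers[b])  # edge weight (assumed unthresholded)
            strength[a] += abs(rho)
            strength[b] += abs(rho)
            influence[a] += rho
            influence[b] += rho
    return {
        "median_strength": median(strength.values()),
        "median_expected_influence": median(influence.values()),
    }


# Hypothetical marker values for one group, for illustration only.
group = {"marker_A": [1, 2, 3, 4, 5], "marker_B": [2, 4, 6, 8, 10], "marker_C": [5, 4, 3, 2, 1]}
print(network_summary(group))
```

Running this once per CASS group and comparing the medians mirrors the comparison the abstract describes, although the study's significance testing is not reproduced here.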
Affiliation(s)
- Steven Lawrence
  - Vilcek Institute of Graduate Biomedical Sciences, NYU Grossman School of Medicine, New York, NY, USA
- Bridget R Mueller
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Emma K T Benn
  - Center for Scientific Diversity, Center for Biostatistics, and Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Seunghee Kim-Schulze
  - Human Immune Monitoring Center, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Patrick Kwon
  - Department of Neurology, NYU Grossman School of Medicine, New York, NY, USA
- Jessica Robinson-Papp
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
|
12
|
Saeloo B, Jitapunkul K, Iamprasertkun P, Panomsuwan G, Sirisaksoontorn W, Sooknoi T, Hirunpinyopas W. Size-Dependent Graphene Support for Decorating Gold Nanoparticles as a Catalyst for Hydrogen Evolution Reaction with Machine Learning-Assisted Prediction. ACS APPLIED MATERIALS & INTERFACES 2023. [PMID: 37919242 DOI: 10.1021/acsami.3c10553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2023]
Abstract
Size-dependent two-dimensional (2D) materials (e.g., graphene) have been recently used to improve their performance in various applications such as membrane filtration, energy storage, and electrocatalysts. It has also been demonstrated that 2D nanosheets can be one of the promising support materials for decorating nanoparticles (NPs). However, the optimum nanosheet size (lateral length and thickness) for supporting NPs has not yet been explored to enhance their catalytic performance. Herein, we elucidate the mechanism behind size-dependent graphene (GP) as a support due to which gold nanoparticles (AuNPs) are used as an active catalyst for the hydrogen evolution reaction (HER). Surprisingly, the decoration of AuNPs increased with the increasing nanosheet size, counter to what is widely reported in the literature (high surface area for smaller nanosheet size). We found that a large graphene nanosheet (lGP; ∼800 nm) used as the AuNP support (lGP/AuNPs) exhibited superior performance for the HER with long-term stability. The lGP/AuNPs with a suitable content of AuNPs provides a low overpotential and a small Tafel slope, being lower than that of other reported carbon-based HER electrocatalysts. This results from highly exposed active sites of well-dispersed AuNPs on lGP giving high conductivity. The laminar structure of the stacked graphene nanosheets and the high wettability of the lGP/AuNPs electrode surface also play crucial roles in enhancing electrolytes for penetration in the electrode, suggesting a highly electrochemical surface area. Moreover, machine learning (Random Forest) was also used to reveal the essential features of the advanced catalytic material design for catalyst-based applications.
Affiliation(s)
- Boontarika Saeloo
  - Department of Chemistry and Centre of Excellence for Innovation in Chemistry, Faculty of Science, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
- Kulpavee Jitapunkul
  - School of Bio-Chemical Engineering and Technology, Sirindhorn International Institute of Technology (SIIT), Thammasat University - Rangsit Campus, Khlong Nueng, Pathum Thani 12120, Thailand
  - Research Unit in Sustainable Electrochemical Intelligent, Thammasat University, Khlong Nueng, Pathum Thani 12120, Thailand
- Pawin Iamprasertkun
  - School of Bio-Chemical Engineering and Technology, Sirindhorn International Institute of Technology (SIIT), Thammasat University - Rangsit Campus, Khlong Nueng, Pathum Thani 12120, Thailand
  - Research Unit in Sustainable Electrochemical Intelligent, Thammasat University, Khlong Nueng, Pathum Thani 12120, Thailand
- Gasidit Panomsuwan
  - Department of Materials Engineering, Faculty of Engineering, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
- Weekit Sirisaksoontorn
  - Department of Chemistry and Centre of Excellence for Innovation in Chemistry, Faculty of Science, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
- Tawan Sooknoi
  - Department of Chemistry, School of Science, King Mongkut's Institute of Technology Ladkrabang, Chalongkrung Road, Ladkrabang, Bangkok 10520, Thailand
- Wisit Hirunpinyopas
  - Department of Chemistry and Centre of Excellence for Innovation in Chemistry, Faculty of Science, Kasetsart University, Chatuchak, Bangkok 10900, Thailand
|
13
|
Yalezo N, Musee N. Meta-analysis of engineered nanoparticles dynamic aggregation in freshwater-like systems using machine learning techniques. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 337:117739. [PMID: 36934506 DOI: 10.1016/j.jenvman.2023.117739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Revised: 02/17/2023] [Accepted: 03/12/2023] [Indexed: 06/18/2023]
Abstract
Predictive algorithms for exposure characterization of engineered nanoparticles (ENPs) in the ecosystems are essential to improve the development of robust nano-safety frameworks. Here, machine learning (ML) techniques were utilised for data mining and prediction of the dynamic aggregation transformation process in aqueous environments using case studies of nZnO and nTiO2. Supervised ML models using input variables of natural organic matter, ionic strength, size, and ENPs concentration showed poor prediction performance based on statistical metric values of root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and Nash-Sutcliffe efficiency (NSE) for both types of ENP. On the contrary, algorithms developed using model input parameters of zeta potential, pH, and time had good generalisation and high prediction accuracy. Among the five developed ML algorithms, random forest regression, support vector regression, and artificial neural network generated good prediction accuracy for both data sets. Therefore, the use of ML can be valuable in the development of robust nano-safety frameworks to optimise societal benefits, and for proactive long-term ecological protection.
Affiliation(s)
- Ntsikelelo Yalezo
  - Emerging Contaminants Ecological and Risk Assessment (ECERA) Group, Department of Chemical Engineering, University of Pretoria, Private Bag X20, Hatfield 0028, Pretoria, South Africa
- Ndeke Musee
  - Emerging Contaminants Ecological and Risk Assessment (ECERA) Group, Department of Chemical Engineering, University of Pretoria, Private Bag X20, Hatfield 0028, Pretoria, South Africa
|
14
|
Katambire VN, Musabe R, Uwitonze A, Mukanyiligira D. Battery-Powered RSU Running Time Monitoring and Prediction Using ML Model Based on Received Signal Strength and Data Transmission Frequency in V2I Applications. SENSORS (BASEL, SWITZERLAND) 2023; 23:3536. [PMID: 37050596 PMCID: PMC10099191 DOI: 10.3390/s23073536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 02/28/2023] [Accepted: 03/13/2023] [Indexed: 06/19/2023]
Abstract
The application of the Internet of Things (IoT), vehicles to infrastructure (V2I) communication and intelligent roadside units (RSU) are promising paradigms to improve road traffic safety. However, for the RSUs to communicate with the vehicles and transmit the data to the remote location, RSUs require enough power and good network quality. Recent advances in technology have improved lithium-ion battery capabilities. However, other complementary methodologies including battery management systems (BMS) have to be developed to provide an early warning sign of the battery's state of health. In this paper, we have evaluated the impact of the received signal strength indication (RSSI) and the current consumption at different transmission frequencies on a static battery-based RSU that depends on the global system for mobile communications (GSM)/general packet radio services (GPRS). Machine learning (ML) models, for instance, Random Forest (RF) and Support Vector Machine (SVM), were employed and tested on the collected data and later compared using the coefficient of determination (R2). The models were used to predict the battery current consumption based on the RSSI of the location where the RSUs were imposed and the frequency at which the RSU transmits the data to the remote database. The RF was preferable to SVM for predicting current consumption with an R2 of 98% and 94%, respectively. It is essential to accurately forecast the battery health of RSUs to assess their dependability and running time. The primary duty of the BMS is to estimate the status of the battery and its dynamic operating limits. However, achieving an accurate and robust battery state of charge remains a significant challenge. Referring to that can help road managers make alternative decisions, such as replacing the battery before the RSU power source gets drained. The proposed method can be deployed in other remote WSN and IoT-based applications.
Affiliation(s)
- Vienna N. Katambire
  - African Center of Excellence in Internet of Things (ACEIoT), College of Science and Technology, University of Rwanda, Kigali P.O. Box 3900, Rwanda
- Alfred Uwitonze
  - African Center of Excellence in Internet of Things (ACEIoT), College of Science and Technology, University of Rwanda, Kigali P.O. Box 3900, Rwanda
|
15
|
Cheng Y, He C, Hegarty M, Chrastil ER. Who believes they are good navigators? A machine learning pipeline highlights the impact of gender, commuting time, and education. MACHINE LEARNING WITH APPLICATIONS 2022. [DOI: 10.1016/j.mlwa.2022.100419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
|
16
|
Fan Y, Gu J, Yin G. Sparse Concordance‐based Ordinal Classification. Scand Stat Theory Appl 2022. [DOI: 10.1111/sjos.12606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Yiwei Fan
  - School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
- Jiaqi Gu
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
- Guosheng Yin
  - Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
|
17
|
Predicting the Rheological Properties of Super-Plasticized Concrete Using Modeling Techniques. MATERIALS 2022; 15:ma15155208. [PMID: 35955143 PMCID: PMC9369977 DOI: 10.3390/ma15155208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Revised: 07/11/2022] [Accepted: 07/12/2022] [Indexed: 12/07/2022]
Abstract
Interface yield stress (YS) and plastic viscosity (PV) have a significant impact on the pumpability of concrete mixes. This study is based on the application of predictive machine learning (PML) techniques to forecast the rheological properties of fresh concrete. The artificial neural network (NN) and random forest (R-F) PML approaches were introduced to anticipate the PV and YS of concrete. In comparison, the R-F model outperforms the NN model by giving the coefficient of determination (R2) values equal to 0.92 and 0.96 for PV and YS, respectively. In contrast, the model’s legitimacy was also verified by applying statistical checks and a k-fold cross validation approach. The mean absolute error, mean square error, and root mean square error values for R-F models by investigating the YS were noted as 30.36 Pa, 1141.76 Pa, and 33.79 Pa, respectively. Similarly, for the PV, these values were noted as 3.52 Pa·s, 16.48 Pa·s, and 4.06 Pa·s, respectively. However, by comparing these values with the NN’s model, they were found to be higher, which also gives confirmation of R-F’s high precision in terms of predicting the outcomes. A validation approach known as k-fold cross validation was also introduced to authenticate the precision of employed models. Moreover, the influence of the input parameters was also investigated with regard to predictions of PV and YS. The proposed study will be beneficial for the researchers and construction industries in terms of saving time, effort, and cost of a project.
|
18
|
Extreme Gradient Boosting-Based Machine Learning Approach for Green Building Cost Prediction. SUSTAINABILITY 2022. [DOI: 10.3390/su14116651] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
Accurate building construction cost prediction is critical, especially for sustainable projects (i.e., green buildings). Green building construction contracts are relatively new to the construction industry, where stakeholders have limited experience in contract cost estimation. Unlike conventional building construction, green buildings are designed to utilize new technologies to reduce their operations’ environmental and societal impacts. Consequently, green buildings’ construction bidding and awarding processes have become more complicated due to difficulties forecasting the initial construction costs and setting integrated selection criteria for the winning bidders. Thus, robust green building cost prediction modeling is essential to provide stakeholders with an initial construction cost benchmark to enhance decision-making. The current study presents machine learning-based algorithms, including extreme gradient boosting (XGBOOST), deep neural network (DNN), and random forest (RF), to predict green building costs. The proposed models are designed to consider the influence of soft and hard cost-related attributes. Evaluation metrics (i.e., MAE, MSE, MAPE, and R2) are applied to evaluate and compare the developed algorithms’ accuracy. XGBOOST provided the highest accuracy of 0.96 compared to 0.91 for the DNN, followed by RF with an accuracy of 0.87. The proposed machine learning models can be utilized as a decision support tool for construction project managers and practitioners to advance automation as a coherent field of research within the green construction industry.
|
19
|
An exploratory analysis of forme fruste keratoconus sensitivity diagnostic parameters. Int Ophthalmol 2022; 42:2473-2481. [PMID: 35247116 DOI: 10.1007/s10792-022-02246-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/24/2021] [Accepted: 02/10/2022] [Indexed: 10/18/2022]
Abstract
PURPOSE To perform a secondary statistical analysis of Pentacam or Corvis ST parameters reported in the literature, and to identify more sensitive diagnostic parameters for clinical keratoconus (CKC) and forme fruste keratoconus (FFKC), respectively. METHODS The parameters and their corresponding areas under the ROC curve (AUC) were extracted from previous studies and screened to build the databases for CKC (Data-CKC) and FFKC (Data-FFKC), respectively. Two different random forest importance measures (%IncMSE and IncNodePurity) were used for a preliminary selection of the important parameters. Then, based on partial dependence analysis, the sensitive diagnostic parameters that improved diagnostic performance were obtained. Data-FFKC was analyzed in the same way. Finally, a diagnostic test meta-analysis of the sensitive parameter of interest was conducted to verify the reliability of the above analysis methods. RESULTS There were 88 parameters with 766 records in Data-CKC and 57 parameters with 346 records in Data-FFKC. Based on the two importance measures, 60 important parameters were obtained, of which 20 were further screened as sensitive parameters of keratoconus; most of these were related to the thinnest point of the cornea. The stiffness parameter at first applanation (SPA1) was the only Corvis ST output parameter sensitive to FFKC other than the Tomographic and Biomechanical Index and the Corvis Biomechanical Parameter (CBI). A total of 4 records were included in the meta-analysis of diagnostic tests on SPA1. The results showed a threshold effect but no significant heterogeneity (I² = 33%), and the area under the SROC curve was 0.87 (95% CI, 0.84-0.90). CONCLUSIONS For the diagnosis of FFKC, the sensitivity of SPA1 is not inferior to that of the well-known CBI, and SPA1 may be the earliest Corvis ST output parameter to reflect changes in corneal biomechanics during keratoconus progression. The elevation parameters based on the typical position of the thinnest point of corneal thickness are of great significance for the diagnosis of keratoconus.
|
20
|
Fenta HM, Zewotir T, Muluneh EK. A machine learning classifier approach for identifying the determinants of under-five child undernutrition in Ethiopian administrative zones. BMC Med Inform Decis Mak 2021; 21:291. [PMID: 34689769 PMCID: PMC8542294 DOI: 10.1186/s12911-021-01652-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/18/2021] [Accepted: 10/04/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Undernutrition is the main cause of child death in developing countries. This paper aimed to explore the efficacy of machine learning (ML) approaches in predicting under-five undernutrition in Ethiopian administrative zones and to identify the most important predictors. METHOD The study applied ML techniques to retrospective cross-sectional survey data from Ethiopia, a nationally representative dataset collected in 2000, 2005, 2011, and 2016. We explored six commonly used ML algorithms: logistic regression, Least Absolute Shrinkage and Selection Operator (L-1 regularization logistic regression), L-2 regularization (Ridge), elastic net, neural network, and random forest (RF). Sensitivity, specificity, accuracy, and area under the curve were used to evaluate the performance of those models. RESULTS Based on different performance evaluations, the RF algorithm was selected as the best ML model. In order of importance, urban-rural settlement, literacy rate of parents, and place of residence were the major determinants of disparities in nutritional status for under-five children among Ethiopian administrative zones. CONCLUSION Our results showed that the considered machine learning classification algorithms can effectively predict the under-five undernutrition status in Ethiopian administrative zones. Persistent under-five undernutrition status was found in the northern part of Ethiopia. The identification of such high-risk zones could provide useful information to decision-makers trying to reduce child undernutrition.
Affiliation(s)
- Haile Mekonnen Fenta
  - Department of Statistics, College of Science, Bahir Dar University, Bahir Dar, Ethiopia
- Temesgen Zewotir
  - School of Mathematics, Statistics and Computer Science, College of Agriculture Engineering and Science, University of KwaZulu-Natal, Durban, South Africa
- Essey Kebede Muluneh
  - School of Public Health, College of Medicine and Health Sciences, Bahir Dar University, Bahir Dar, Ethiopia
|
21
|
Sentinel-1 and 2 Time-Series for Vegetation Mapping Using Random Forest Classification: A Case Study of Northern Croatia. REMOTE SENSING 2021. [DOI: 10.3390/rs13122321] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/26/2023]
Abstract
Land-cover (LC) mapping in a morphologically heterogeneous landscape area is a challenging task since various LC classes (e.g., crop types in agricultural areas) are spectrally similar. Most research is still mostly relying on optical satellite imagery for these tasks, whereas synthetic aperture radar (SAR) imagery is often neglected. Therefore, this research assessed the classification accuracy using the recent Sentinel-1 (S1) SAR and Sentinel-2 (S2) time-series data for LC mapping, especially vegetation classes. Additionally, ancillary data, such as texture features, spectral indices from S1 and S2, respectively, as well as digital elevation model (DEM), were used in different classification scenarios. Random Forest (RF) was used for classification tasks using a proposed hybrid reference dataset derived from European Land Use and Coverage Area Frame Survey (LUCAS), CORINE, and Land Parcel Identification Systems (LPIS) LC database. Based on the RF variable selection using Mean Decrease Accuracy (MDA), the combination of S1 and S2 data yielded the highest overall accuracy (OA) of 91.78%, with a total disagreement of 8.22%. The most pertinent features for vegetation mapping were GLCM Mean and Variance for S1, NDVI, along with Red and SWIR band for S2, whereas the digital elevation model produced major classification enhancement as an input feature. The results of this study demonstrated that the aforementioned approach (i.e., RF using a hybrid reference dataset) is well-suited for vegetation mapping using Sentinel imagery, which can be applied for large-scale LC classifications.
|
22
|
Mapping the Extent of Mangrove Ecosystem Degradation by Integrating an Ecological Conceptual Model with Satellite Data. REMOTE SENSING 2021. [DOI: 10.3390/rs13112047] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/03/2023]
Abstract
Anthropogenic and natural disturbances can cause degradation of ecosystems, reducing their capacity to sustain biodiversity and provide ecosystem services. Understanding the extent of ecosystem degradation is critical for estimating risks to ecosystems, yet there are few existing methods to map degradation at the ecosystem scale and none using freely available satellite data for mangrove ecosystems. In this study, we developed a quantitative classification model of mangrove ecosystem degradation using freely available earth observation data. Crucially, a conceptual model of mangrove ecosystem degradation was established to identify suitable remote sensing variables that support the quantitative classification model, bridging the gap between satellite-derived variables and ecosystem degradation with explicit ecological links. We applied our degradation model to two case-studies, the mangroves of Rakhine State, Myanmar, which are severely threatened by anthropogenic disturbances, and Shark River within the Everglades National Park, USA, which is periodically disturbed by severe tropical storms. Our model suggested that 40% (597 km2) of the extent of mangroves in Rakhine showed evidence of degradation. In the Everglades, the model suggested that the extent of degraded mangrove forest increased from 5.1% to 97.4% following the Category 4 Hurricane Irma in 2017. Quantitative accuracy assessments indicated the model achieved overall accuracies of 77.6% and 79.1% for the Rakhine and the Everglades, respectively. We highlight that using an ecological conceptual model as the basis for building quantitative classification models to estimate the extent of ecosystem degradation ensures the ecological relevance of the classification models. Our developed method enables researchers to move beyond only mapping ecosystem distribution to condition and degradation as well. 
These results can help support ecosystem risk assessments, natural capital accounting, and restoration planning and provide quantitative estimates of ecosystem degradation for new global biodiversity targets.
|
23
|
Jahandideh S, Jahandideh M, Barzegari E. Individuals' Intention to Engage in Outpatient Cardiac Rehabilitation Programs: Prediction Based on an Enhanced Model. J Clin Psychol Med Settings 2021; 28:798-807. [PMID: 33723685 DOI: 10.1007/s10880-021-09771-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/01/2021] [Indexed: 11/29/2022]
Abstract
Motivation is an important factor in encouraging individuals to attend rehabilitation and underpins many approaches to engagement. The aims of this study were to develop an accurate model able to predict individual intention to engage in outpatient cardiac rehabilitation (CR) programs based on the first stage of the Model of Therapeutic Engagement integrated into a socio-environmental context. The cross-sectional study in the cardiology ward of an Australian hospital included a total of 217 individuals referred to outpatient CR. Through an ordinal logistic regression, the effect of random forest (RF)-selected profile features on individual intention to engage in outpatient CR was explored. The RF based on the conditional inference trees predicted the intention to engage in outpatient CR with high accuracy. The findings highlighted the significant roles of individuals' 'willingness to consider the treatment', 'perceived self-efficacy' and 'perceived need for rehabilitation' in their intention, while the involvement of 'barriers to engagement' and 'demographic and medical factors' was not evident.
Affiliation(s)
- Sepideh Jahandideh
  - School of Human Services and Social Work, Menzies Health Institute Queensland, Griffith University, Gold Coast Campus, Queensland, Australia
- Mina Jahandideh
  - Department of Mathematics, Faculty of Science, Zanjan University, Zanjan, Iran
- Ebrahim Barzegari
  - Medical Biology Research Center, Health Technology Institute, Kermanshah University of Medical Sciences, P.O. Box: 67155-1616, Zakariya Razi Blvd., Kermanshah, Iran
|
24
|
Ruan F, Hou L, Zhang T, Li H. A novel hybrid filter/wrapper method for feature selection in archaeological ceramics classification by laser-induced breakdown spectroscopy. Analyst 2021; 146:1023-1031. [PMID: 33300506 DOI: 10.1039/d0an02045a] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
Laser-induced breakdown spectroscopy (LIBS) has been appreciated as a valuable analytical tool in the cultural heritage field owing to its unique technological superiority, particularly in combination with chemometric methods. Feature selection (FS) is an indispensable pre-processing step in data optimization: it eliminates redundant or irrelevant features from high-dimensional data to enhance the predictive capacity and result comprehensibility of multivariate classification based on LIBS technology. In this paper, a novel hybrid filter/wrapper method based on the MI-DBS algorithm was proposed to enhance the qualitative analysis performance of the LIBS technique. The proposed method combines the advantages of the mutual information (MI) algorithm based filter method and the bi-directional selection (DBS) algorithm based wrapper method. The MI algorithm is first used to remove redundant or uncorrelated features so that a simplified input subset can be established. Then, the DBS algorithm is used to further select among the retained features and hence to seek an optimal feature subset with good predictive performance. To benefit the above feature selection process, the wavelet transform denoising (WTD) method was used to reduce the noise in the LIBS spectra. LIBS experiments were performed using 35 archaeological ceramic samples. In addition, the proposed hybrid filter/wrapper method was implemented through a random forest (RF) based nonlinear multivariate classification method. A comparison between several other feature selection methods and the proposed method showed that the proposed method is the best with regard to predictive performance and the number of selected features. Finally, the MI-DBS algorithm was used to seek the optimal features from the full spectrum (220-720 nm); the corresponding sensitivity, specificity and accuracy acquired through the RF classifier for the test set were 0.9722, 0.9956 and 0.9850. The overall results show that the MI-DBS algorithm is more effective at improving model performance and at reducing redundant or uncorrelated features and computational time, and that it serves as a good alternative for FS in multivariate classification.
Affiliation(s)
- Fangqi Ruan
  - Key Laboratory of Synthetic and Natural Functional Molecular Chemistry of Ministry of Education, College of Chemistry & Material Science, Northwest University, Xi'an, China
|
25
|
Erişkin L. Preference modelling in sorting problems: Multiple criteria decision aid and statistical learning perspectives. JOURNAL OF MULTI-CRITERIA DECISION ANALYSIS 2021. [DOI: 10.1002/mcda.1737] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/12/2023]
Affiliation(s)
- Levent Erişkin
  - Industrial Engineering Department, Naval Academy, National Defense University, Tuzla, Istanbul, Turkey
|
26
|
Beattie M, Nicholson C. Feature Extraction for Heroin-Use Classification Using Imbalanced Random Forest Methods. Subst Use Misuse 2021; 56:123-130. [PMID: 33183142 DOI: 10.1080/10826084.2020.1843058] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Abstract
The National Survey on Drug Use and Health (NSDUH) contains a large number of responses and many features. This study aims to identify features from within NSDUH that are important in classifying heroin use. Proper implementation of random forest (RF) techniques copes with the highly imbalanced nature of heroin usage among respondents to identify features that are prominent in classification models involving nonlinear combinations of predictive variables. To date, methods for the proper application of RF to imbalanced medical datasets have not been defined. Methods: Three different RF classification techniques are applied to the 2016 NSDUH. The techniques are compared using scoring criteria, including area under the precision recall curve (AUPRC), to identify the best model. Variable importance scores (VIS) are checked for stability across the three models and the VIS from the best model are used to highlight features and categories of features that most influence the classification of heroin users. Findings: The best performing method was RF with random oversampling (AUPRC = 0.5437). The category of features regarding other drug use was most important (average z-scored VIS = 1.66) followed by age-of-first-use features (0.32). The most important individual feature was cocaine usage (z-scored VIS = 11.05), followed by crack usage (6.51). The most important individual feature other than specific drug use flags was the use of marijuana under the age of 18 (3.11). This study demonstrates a method for the use of RF in feature extraction from imbalanced medical datasets with many predictors.
Collapse
Affiliation(s)
- Matthew Beattie
- Data Science and Analytics, University of Oklahoma, Norman, Oklahoma, USA
- Charles Nicholson
- Data Science and Analytics, University of Oklahoma, Norman, Oklahoma, USA
|
27
|
Gribkova N, Zitikis R. Functional Correlations in the Pursuit of Performance Assessment of Classifiers. INT J PATTERN RECOGN 2020. [DOI: 10.1142/s0218001420510131] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
In statistical classification and machine learning, as well as in social and other sciences, a number of measures of association have been proposed for assessing and comparing individual classifiers, raters, as well as their groups. In this paper, we introduce, justify, and explore several new measures of association, which we call CO-, ANTI-, and COANTI-correlation coefficients, that we demonstrate to be powerful tools for classifying confusion matrices. We illustrate the performance of these new coefficients using a number of examples, from which we also conclude that the coefficients are new objects in the sense that they differ from those already in the literature.
Affiliation(s)
- Nadezhda Gribkova
- Faculty of Mathematics and Mechanics, St. Petersburg State University, St. Petersburg 199034, Russia
- Ričardas Zitikis
- School of Mathematical and Statistical Sciences, Western University, London, ON N6A 5B7, Canada
|
28
|
Götz FM, Stieger S, Gosling SD, Potter J, Rentfrow PJ. Physical topography is associated with human personality. Nat Hum Behav 2020; 4:1135-1144. [PMID: 32895542 DOI: 10.1038/s41562-020-0930-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/04/2019] [Accepted: 07/20/2020] [Indexed: 01/14/2023]
Abstract
Regional differences in personality are associated with a range of consequential outcomes. But which factors are responsible for these differences? Frontier settlement theory suggests that physical topography is a crucial factor shaping the psychological landscape of regions. Hence, we investigated whether topography is associated with regional variation in personality across the United States (n = 3,387,014). Consistent with frontier settlement theory, results from multilevel modelling revealed that mountainous areas were lower on agreeableness, extraversion, neuroticism and conscientiousness but higher on openness to experience. Conditional random forest algorithms confirmed mountainousness as a meaningful predictor of personality when tested against a conservative set of controls. East-west comparisons highlighted potential differences between ecological (driven by physical features) and sociocultural (driven by social norms) effects of mountainous terrain.
Affiliation(s)
- Friedrich M Götz
- Department of Psychology, University of Cambridge, Cambridge, UK.
- Stefan Stieger
- Department of Psychology and Psychodynamics, Karl Landsteiner University of Health Sciences, Krems an der Donau, Austria
- Samuel D Gosling
- Department of Psychology, University of Texas at Austin, Austin, TX, USA; Melbourne School of Psychological Sciences, University of Melbourne, Parkville, Victoria, Australia
- Peter J Rentfrow
- Department of Psychology, University of Cambridge, Cambridge, UK.
|
29
|
Wei G, Zhao J, Feng Y, He A, Yu J. A novel hybrid feature selection method based on dynamic feature importance. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106337] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 01/06/2023]
|
30
|
Use of UAS Multispectral Imagery at Different Physiological Stages for Yield Prediction and Input Resource Optimization in Corn. REMOTE SENSING 2020. [DOI: 10.3390/rs12152392] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/17/2022]
Abstract
Changes in spatial and temporal variability in yield estimation are detectable through plant biophysical characteristics observed at different phenological development stages of corn. A multispectral red-edge sensor mounted on an Unmanned Aerial Systems (UAS) can provide spatial and temporal information with high resolution. Spectral analysis of UAS acquired spatiotemporal images can be used to develop a statistical model to predict yield based on different phenological stages. Identifying critical vegetation indices (VIs) and significant spectral information could lead to increased yield prediction accuracy. The objective of this study was to develop a yield prediction model at specific phenological stages using spectral data obtained from a corn field. The available spectral bands (red, blue, green, near infrared (NIR), and red-edge) were used to analyze 26 different VIs. The spectral information was collected from a cornfield at Mississippi State University using a MicaSense multispectral red-edge sensor, mounted on a UAS. In this research, a new empirical method used to reduce the effects of bare soil pixels in acquired images was introduced. The experimental design was a randomized complete block that consisted of 16 blocks with 12 rows of corn planted in each block. Four treatments of nitrogen (N) including 0, 90, 180, and 270 kg/ha were applied randomly. Random forest was utilized as a feature selection method to choose the best combination of variables for different stages. Multiple linear regression and gradient boosting decision trees were used to develop yield prediction models for each specific phenological stage by utilizing the most effective variables at each stage. 
At the V3 (3 leaves with visible leaf collar) and V4-5 (4-5 leaves with visible leaf collar) stages, the Optimized Soil Adjusted Vegetation Index (OSAVI) and Simplified Canopy Chlorophyll Content Index (SCCCI) were the single dominant variables in the yield predicting models, respectively. A combination of the Green Atmospherically Resistant Index (GARI), Normalized Difference Red-Edge (NDRE), and green Normalized Difference Vegetation Index (GNDVI) at V6-7, SCCCI, and Soil-Adjusted Vegetation Index (SAVI) at V10,11, and SCCCI, Green Leaf Index (GLI), and Visible Atmospherically Resistant Index (VARIgreen) at tasseling stage (VT) were the best indices for predicting grain yield of corn. The prediction models at V10 and VT had the greatest accuracy with a coefficient of determination of 0.90 and 0.93, respectively. Moreover, the SCCCI as a combined index seemed to be the most proper index for predicting yield at most of the phenological stages. As corn development progressed, the models predicted final grain yield more accurately.
|
31
|
Larkin T, McManus D. An analytical toast to wine: Using stacked generalization to predict wine preference. Stat Anal Data Min 2020. [DOI: 10.1002/sam.11474] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Affiliation(s)
- Denise McManus
- Information Systems, Statistics and Management Science Department, The University of Alabama, Tuscaloosa, Alabama, USA
|
32
|
|
33
|
Faerman A, Kaplan KA, Zeitzer JM. Subjective sleep quality is poorly associated with actigraphy and heart rate measures in community-dwelling older men. Sleep Med 2020; 73:154-161. [PMID: 32836083 DOI: 10.1016/j.sleep.2020.04.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/06/2020] [Revised: 03/26/2020] [Accepted: 04/11/2020] [Indexed: 01/19/2023]
Abstract
OBJECTIVES: There has been a proliferation in the use of commercially available accelerometry- and heart rate-based wearable devices to monitor sleep. While the underlying technology is reasonable at detecting sleep quantity, the ability of these devices to predict subjective sleep quality is currently unknown. We tested whether the fundamental signals from such devices are useful in determining subjective sleep quality. METHODS: Older, community-dwelling men (76.5 ± 5.77 years) enrolled in the Osteoporotic Fractures in Men Study (MrOS) participated in an overnight sleep study during which sleep was monitored with actigraphy (wrist-worn accelerometry) and polysomnography (PSG), including electrocardiography (N = 1141). Subjective sleep quality was determined the next morning using 5-point Likert-type scales of sleep depth and restfulness. Lasso and random forest regression models analyzed the relationship between actigraph-determined sleep variables, the shape of the activity patterns during sleep (functional principal component analysis), average heart rate, heart rate variability (HRV), demographics, and self-reported depression, anxiety, habitual sleep, and daytime sleepiness measures. RESULTS: Actigraphy data, in combination with heart rate, HRV, demographic, and psychological variables, do not predict subjective sleep quality well (R² = 0.025 to 0.162). CONCLUSIONS: Findings are consistent with previous studies showing that objective sleep measures are not well correlated with subjective sleep quality. Developing validated biomarkers of subjective sleep quality could improve both existing and novel treatment modalities and advance sleep medicine towards precision healthcare standards.
Affiliation(s)
- Afik Faerman
- Department of Psychology, Palo Alto University, Palo Alto, CA, USA; Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Katherine A Kaplan
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Jamie M Zeitzer
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA; Mental Illness Research, Education, and Clinical Center, VA Palo Alto Health Care System, Palo Alto, CA, USA.
|
34
|
Prediction of Metabolic Syndrome in a Mexican Population Applying Machine Learning Algorithms. Symmetry (Basel) 2020. [DOI: 10.3390/sym12040581] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/01/2023] Open
Abstract
Metabolic syndrome is a health condition that increases the risk of heart disease, diabetes, and stroke. The prognostic variables that identify this syndrome have already been defined by the World Health Organization (WHO), the National Cholesterol Education Program Third Adult Treatment Panel (ATP III), and the International Diabetes Federation. According to these guidelines, there is some symmetry among the anthropometric prognostic variables used to classify abdominal obesity in people with metabolic syndrome. However, some appear to be more sensitive than others, and the proposed definitions have failed to appropriately classify specific populations or ethnic groups. In this work, we used the ATP III criteria as the framework to rank the health parameters (clinical and anthropometric measurements, lifestyle data, and blood tests) from a data set of 2942 participants of the Mexico City Tlalpan 2020 cohort, applying machine learning algorithms. We aimed to find the most appropriate prognostic variables to classify Mexicans with metabolic syndrome. Sensitivity, specificity, and balanced accuracy were used as validation criteria. The ATP III criteria using Waist-to-Height-Ratio (WHtR) as the anthropometric index for the diagnosis of abdominal obesity achieved better classification performance than waist circumference or body mass index. Further work is needed to assess its precision as a classification tool for metabolic syndrome in a Mexican population.
|
35
|
Fu H, Archer KJ. High-dimensional variable selection for ordinal outcomes with error control. Brief Bioinform 2020; 22:334-345. [PMID: 32031572 DOI: 10.1093/bib/bbaa007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/23/2019] [Revised: 01/06/2020] [Indexed: 12/24/2022] Open
Abstract
Many high-throughput genomic applications involve a large set of potential covariates and a response which is frequently measured on an ordinal scale, and it is crucial to identify which variables are truly associated with the response. Effectively controlling the false discovery rate (FDR) without sacrificing power has been a major challenge in variable selection research. This study reviews two existing variable selection frameworks, model-X knockoffs and a modified version of reference distribution variable selection (RDVS), both of which utilize artificial variables as benchmarks for decision making. Model-X knockoffs constructs a 'knockoff' variable for each covariate to mimic the covariance structure, while RDVS generates only one null variable and forms a reference distribution by performing multiple runs of model fitting. Herein, we describe how different importance measures for ordinal responses can be constructed that fit into these two selection frameworks, using either penalized regression or machine learning techniques. We compared these measures in terms of the FDR and power using simulated data. Moreover, we applied these two frameworks to high-throughput methylation data for identifying features associated with the progression from normal liver tissue to hepatocellular carcinoma to further compare and contrast their performances.
|
36
|
Moore A, Cox-Martin M, Dempsey AF, Berenbaum Szanton K, Binswanger IA. HPV Vaccination in Correctional Care: Knowledge, Attitudes, and Barriers Among Incarcerated Women. JOURNAL OF CORRECTIONAL HEALTH CARE 2019; 25:219-230. [PMID: 31242811 DOI: 10.1177/1078345819853286] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
Incarcerated women are at increased risk of developing cervical cancer and have high rates of human papillomavirus (HPV) infection, an important cause of cervical cancer. However, many correctional facilities do not offer HPV vaccination to female inmates. This pilot survey study, conducted with incarcerated women aged 18 to 26 at a facility that does not offer the vaccine, assessed attitudes and knowledge about HPV and the HPV vaccine, acceptability of and barriers to in-prison HPV vaccination, and self-reported HPV vaccination rates. Most participants reported that they had not received the HPV vaccine but had positive attitudes toward it and would be willing to get it in prison. Correctional facilities should consider offering this preventive service to this vulnerable population.
Affiliation(s)
- Alia Moore
- Los Angeles County Department of Health Services, Correctional Health Services, Los Angeles, CA, USA; Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, USA
- Matthew Cox-Martin
- Adult and Child Consortium for Health Outcomes Research and Delivery Science, University of Colorado, Aurora, CO, USA
- Amanda F Dempsey
- Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ingrid A Binswanger
- Division of General Internal Medicine, University of Colorado School of Medicine, Aurora, CO, USA; Institute for Health Research, Kaiser Permanente Colorado, Aurora, CO, USA
|
37
|
Machine Learning to Identify Dialysis Patients at High Death Risk. Kidney Int Rep 2019; 4:1219-1229. [PMID: 31517141 PMCID: PMC6732773 DOI: 10.1016/j.ekir.2019.06.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 02/13/2019] [Revised: 04/30/2019] [Accepted: 06/10/2019] [Indexed: 12/17/2022] Open
Abstract
Introduction: Given the high mortality rate within the first year of dialysis initiation, an accurate estimation of postdialysis mortality could help patients and clinicians in decision making about initiation of dialysis. We aimed to use machine learning (ML) by incorporating complex information from electronic health records to predict patients at risk for postdialysis short-term mortality. Methods: This study was carried out on a contemporary cohort of 27,615 US veterans with incident end-stage renal disease (ESRD). We implemented a random forest method on 49 variables obtained before dialysis transition to predict outcomes of 30-, 90-, 180-, and 365-day all-cause mortality after dialysis initiation. Results: The mean (±SD) age of our cohort was 68.7 ± 11.2 years, 98.1% of patients were men, 29.4% were African American, and 71.4% were diabetic. The final random forest model provided C-statistics (95% confidence intervals) of 0.7185 (0.6994–0.7377), 0.7446 (0.7346–0.7546), 0.7504 (0.7425–0.7583), and 0.7488 (0.7421–0.7554) for predicting risk of death within the 4 different time windows. The models showed good internal validity, replicated well in patients with various demographic and clinical characteristics, and provided similar or better performance compared with other ML algorithms. Results may not be generalizable to non-veterans. Restricting predictors to those available in electronic medical records limited the number of predictors that could be assessed. Conclusion: We implemented an ML-based method to accurately predict short-term postdialysis mortality in patients with incident ESRD. Our models could aid patients and clinicians in better decision making about the best course of action in patients approaching ESRD.
|
38
|
A Brief Review of Random Forests for Water Scientists and Practitioners and Their Recent History in Water Resources. WATER 2019. [DOI: 10.3390/w11050910] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Indexed: 01/25/2023]
Abstract
Random forests (RF) is a supervised machine learning algorithm, which has recently started to gain prominence in water resources applications. However, existing applications are generally restricted to the implementation of Breiman’s original algorithm for regression and classification problems, while numerous developments could be also useful in solving diverse practical problems in the water sector. Here we popularize RF and their variants for the practicing water scientist, and discuss related concepts and techniques, which have received less attention from the water science and hydrologic communities. In doing so, we review RF applications in water resources, highlight the potential of the original algorithm and its variants, and assess the degree of RF exploitation in a diverse range of applications. Relevant implementations of random forests, as well as related concepts and techniques in the R programming language, are also covered.
|
39
|
Richter R, Gabriel D, Rist F, Töpfer R, Zyprian E. Identification of co-located QTLs and genomic regions affecting grapevine cluster architecture. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2019; 132:1159-1177. [PMID: 30569367 DOI: 10.1007/s00122-018-3269-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/23/2018] [Accepted: 12/12/2018] [Indexed: 05/18/2023]
Abstract
Loose cluster architecture is an important aim in grapevine breeding since it has high impact on the phytosanitary status of grapes. This investigation analyzed the contributions of individual cluster sub-traits to the overall trait of cluster architecture. Six sub-traits showed large impact on cluster architecture as major determinants. They explained 57% of the OIV204 descriptor for cluster compactness rating in a highly diverse cross-population of 149 genotypes. Genetic analysis revealed several genomic regions involved in the expression of this trait. Based on the linkage of phenotypic features to molecular markers, QTL calculations shed new light on the genetic determinants of cluster architecture. Eight QTL clusters harbor overlapping confidence intervals of up to four co-located QTLs. A physical projection of the QTL clusters by confidence interval-flanking markers onto the PN40024 reference genome sequence revealed genes enriched in these regions.
Affiliation(s)
- Robert Richter
- Institute for Grapevine Breeding Geilweilerhof, Julius Kuehn Institute, Federal Research Centre of Cultivated Plants, Geilweilerhof, 76833, Siebeldingen, Germany
- Doreen Gabriel
- Institute for Crop and Soil Science, Julius Kuehn Institute, Federal Research Centre of Cultivated Plants, Bundesallee 58, 38116, Brunswick, Germany
- Florian Rist
- Institute for Grapevine Breeding Geilweilerhof, Julius Kuehn Institute, Federal Research Centre of Cultivated Plants, Geilweilerhof, 76833, Siebeldingen, Germany
- Reinhard Töpfer
- Institute for Grapevine Breeding Geilweilerhof, Julius Kuehn Institute, Federal Research Centre of Cultivated Plants, Geilweilerhof, 76833, Siebeldingen, Germany
- Eva Zyprian
- Institute for Grapevine Breeding Geilweilerhof, Julius Kuehn Institute, Federal Research Centre of Cultivated Plants, Geilweilerhof, 76833, Siebeldingen, Germany.
|
40
|
Buechley ER, Santangeli A, Girardello M, Neate‐Clegg MH, Oleyar D, McClure CJ, Şekercioğlu ÇH. Global raptor research and conservation priorities: Tropical raptors fall prey to knowledge gaps. DIVERS DISTRIB 2019. [DOI: 10.1111/ddi.12901] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Indexed: 12/25/2022] Open
Affiliation(s)
- Evan R. Buechley
- HawkWatch International, Salt Lake City, Utah
- Department of Biology, University of Utah, Salt Lake City, Utah
- Smithsonian Migratory Bird Center, Washington, DC
- Andrea Santangeli
- The Helsinki Lab of Ornithology, Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
- Helsinki Institute of Sustainability Science, University of Helsinki, Helsinki, Finland
- Marco Girardello
- cE3c – Centre for Ecology, Evolution and Environmental Changes/Azorean Biodiversity Group, Universidade dos Açores – Depto de Ciências e Engenharia do Ambiente, Angra do Heroísmo, Portugal
- Çagan H. Şekercioğlu
- Department of Biology, University of Utah, Salt Lake City, Utah
- College of Sciences, Koç University, Istanbul, Turkey
|
41
|
Subject-specific and group-based running pattern classification using a single wearable sensor. J Biomech 2019; 84:227-233. [PMID: 30670327 DOI: 10.1016/j.jbiomech.2019.01.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 05/07/2018] [Revised: 11/21/2018] [Accepted: 01/02/2019] [Indexed: 01/08/2023]
Abstract
The objective of this study was to determine whether subject-specific or group-based models provided better classification accuracy to identify changes in biomechanical running gait patterns across different inclination conditions. The classification process was based on measurements from a single wearable sensor, using a total of 41,780 strides from eleven recreational runners running in a real-world, uncontrolled environment. Biomechanical variables, including pelvic drop, ground contact time, braking, vertical oscillation of pelvis, pelvic rotation, and cadence, were recorded during running on three inclination grades: downhill, -2° to -7°; level, -0.2° to +0.2°; and uphill, +2° to +7°. An ensemble, non-linear machine learning algorithm, random forest (RF), was used to classify inclination condition and determine the importance of each of the biomechanical variables. Classification accuracy was determined for subject-specific and group-based RF models. The mean classification accuracy of all subject-specific RF models was 86.29%, while group-based classification accuracy was 76.17%. Braking was identified as the most important variable for all the runners using the group-based model and for most of the runners based on the subject-specific models. In addition, individual runners used different strategies across different inclination conditions, and the ranked order of variable importance was unique for each runner. These results demonstrate that subject-specific models can better characterize changes in gait biomechanical patterns compared to a more traditional group-based approach.
|
42
|
Zhang Y, Li Q, Xin Y, Lv W, Ge C. Association between serum magnesium and common complications of diabetes mellitus. Technol Health Care 2018; 26:379-387. [PMID: 29758962 PMCID: PMC6004978 DOI: 10.3233/thc-174702] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Magnesium, an important cation in the human body, is involved in various enzymatic reactions, glucose transport, and insulin release. Diabetes mellitus and diabetic complications have become important public health problems around the world. OBJECTIVE: This paper explores the association between serum magnesium concentration and common complications and comorbidities of diabetes mellitus, as well as other biochemical indexes. METHODS: A total of 1217 eligible patients were selected from 14,317 patients hospitalized for diabetes from January 2010 to December 2011. A random forest algorithm was applied to assess the importance of various biochemical indexes and to predict diabetic complications. RESULTS: Low serum magnesium concentration was associated with four common diabetic complications (diabetic retinopathy, diabetic nephropathy, diabetic neuropathy, and diabetic macroangiopathy), but showed no obvious correlation with other comorbidities such as hypertension. CONCLUSIONS: The specific factors of four common diabetic complications were selected from the biochemical indexes to provide a reference direction for further research.
Affiliation(s)
- Yi Xin
- Corresponding author: Yi Xin, Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, 5 South Zhongguancun Street, Haidian District, Beijing 100081, China. Tel.: +86 13810028162; E-mail: .
|
43
|
Matin S, Farahzadi L, Makaremi S, Chelgani SC, Sattari G. Variable selection and prediction of uniaxial compressive strength and modulus of elasticity by random forest. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.06.030] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Indexed: 10/19/2022]
|
44
|
Georgoulas G, Karvelis P, Gavrilis D, Stylios CD, Nikolakopoulos G. An ordinal classification approach for CTG categorization. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2018; 2017:2642-2645. [PMID: 29060442 DOI: 10.1109/embc.2017.8037400] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
Evaluation of the cardiotocogram (CTG) is a standard approach employed during pregnancy and delivery. However, its interpretation requires a high level of expertise to decide whether the recording is Normal, Suspicious, or Pathological. Therefore, a number of attempts have been made over the past three decades to develop sophisticated automated systems. These systems are usually (multiclass) classification systems that assign a category to the respective CTG. However, most of these systems do not take into consideration the natural ordering of the categories associated with CTG recordings. In this work, an algorithm that explicitly takes into consideration the ordering of CTG categories, based on a binary decomposition method, is investigated. The results achieved, using the C4.5 decision tree classifier as the base classifier, indicate that for several performance criteria the ordinal classification approach is marginally better than the traditional multiclass classification approach, which utilizes the standard C4.5 algorithm.
|
45
|
Rapid Discrimination Between Authentic and Adulterated Andiroba Oil Using FTIR-HATR Spectroscopy and Random Forest. FOOD ANAL METHOD 2018. [DOI: 10.1007/s12161-017-1142-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/15/2022]
|
46
|
Analyzing Land Cover Change and Urban Growth Trajectories of the Mega-Urban Region of Dhaka Using Remotely Sensed Data and an Ensemble Classifier. SUSTAINABILITY 2017. [DOI: 10.3390/su10010010] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Indexed: 01/23/2023]
|
47
|
López B, Torrent-Fontbona F, Viñas R, Fernández-Real JM. Single Nucleotide Polymorphism relevance learning with Random Forests for Type 2 diabetes risk prediction. Artif Intell Med 2017; 85:43-49. [PMID: 28943335 DOI: 10.1016/j.artmed.2017.09.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 02/10/2017] [Revised: 09/04/2017] [Indexed: 10/18/2022]
Abstract
OBJECTIVE The use of artificial intelligence techniques to find out which Single Nucleotide Polymorphisms (SNPs) promote the development of a disease is one of the features of medical research, as such techniques may potentially aid early diagnosis and help in the prescription of preventive measures. In particular, the aim is to help physicians to identify the relevant SNPs related to Type 2 diabetes, and to build a decision-support tool for risk prediction. METHODS We use the Random Forest (RF) technique in order to search for the most important attributes (SNPs) related to diabetes, giving a weight (degree of importance), ranging between 0 and 1, to each attribute. Support Vector Machines and Logistic Regression have also been used since they are two other machine learning techniques that are well-established in the health community. Their performance has been compared to that achieved by RF. Furthermore, the relevance of the attributes obtained through the use of RF has then been used to perform predictions with k-Nearest Neighbour method weighting attributes in the similarity measure according to the relevance of the attributes with RF. RESULTS Testing is performed on a set of 677 subjects. RF is able to handle the complexity of features' interactions, overfitting, and unknown attribute values, providing the SNPs' relevance with an up to 0.89 area under the ROC curve in terms of risk prediction. RF outperforms all the other tested machine learning techniques in terms of prediction accuracy, and in terms of the stability of the estimated relevance of the attributes. CONCLUSIONS The Random Forest is a useful method for learning predictive models and the relevance of SNPs without any underlying assumption.
Affiliation(s)
- Beatriz López
- University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain.
- Ramón Viñas
- University of Girona, Campus Montilivi, building EPS4, 17071 Girona, Spain.
- José Manuel Fernández-Real
- Biomedical Research Institute of Girona, Avda. de França, s/n, 17007 Girona, Spain; CIBERobn Pathophysiology of Obesity and Nutrition, Instituto de Salud Carlos III, Madrid, Spain.
|
48
|
A computationally fast variable importance test for random forests for high-dimensional data. ADV DATA ANAL CLASSI 2016. [DOI: 10.1007/s11634-016-0276-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 12/12/2022]
|
49
|
Liu Z, Han J, Lv H, Liu J, Liu R. Computational identification of circular RNAs based on conformational and thermodynamic properties in the flanking introns. Comput Biol Chem 2016; 61:221-5. [PMID: 26917277 DOI: 10.1016/j.compbiolchem.2016.02.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/04/2015] [Revised: 02/03/2016] [Accepted: 02/03/2016] [Indexed: 01/08/2023]
Abstract
Circular RNAs (circRNAs) were found more than 30 years ago, but were long treated as molecular flukes. By combining deep sequencing studies with bioinformatics techniques, thousands of endogenous circRNAs have been found in mammalian cells, and some researchers have shown that several circRNAs act as competing endogenous RNAs (ceRNAs) to regulate gene expression. However, the mechanism by which the precursor mRNA is transformed into a circular RNA or a linear mRNA is largely unknown. In this paper, we attempted to bioinformatically identify shared genomic features that might further elucidate the mechanism of formation and proposed an SVM-based model to distinguish circRNAs from non-circularized, expressed exons. Firstly, conformational and thermodynamic dinucleotide properties in the flanking introns were extracted as potential features. Secondly, two feature selection methods were applied to obtain the optimal feature subset. Our 10-fold cross-validation results showed that the model can be used to distinguish circRNAs from non-circularized, expressed exons with an Sn of 0.884, Sp of 0.900, ACC of 0.892, and MCC of 0.784. The identification results suggest that conformational and thermodynamic properties in the flanking introns are closely related to the formation of circRNAs. Datasets and the tool involved in this paper are all available at https://sourceforge.net/projects/predicircrnatool/files/.
Affiliation(s)
- Ze Liu
- School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.
- Jiuqiang Han
- School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.
- Hongqiang Lv
- School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.
- Jun Liu
- School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China; School of Electrical Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.
- Ruiling Liu
- School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, PR China.
|