1
Pireddu A, Bedini A, Lombardi M, Ciribini ALC, Berardi D. A Review of Data Mining Strategies by Data Type, with a Focus on Construction Processes and Health and Safety Management. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2024; 21:831. [PMID: 39063408 PMCID: PMC11277231 DOI: 10.3390/ijerph21070831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 06/14/2024] [Accepted: 06/19/2024] [Indexed: 07/28/2024]
Abstract
Increasingly, information technology facilitates the storage and management of data useful for risk analysis and event prediction. Studies on data extraction related to occupational health and safety are increasingly available; however, due to its variability, the construction sector warrants special attention. This review is conducted under the research programs of the National Institute for Occupational Accident Insurance (Inail). OBJECTIVES The research question focuses on identifying which data mining (DM) methods, among supervised, unsupervised, and others, are most appropriate for certain investigation objectives, types, and sources of data, as defined by the authors. METHODS Scopus and ProQuest were the main sources from which we extracted studies in the field of construction, published between 2014 and 2023. The eligibility criteria applied in the selection of studies were based on the Preferred Reporting Items for Systematic Review and Meta-Analyses (PRISMA). For exploratory purposes, we applied hierarchical clustering, while for in-depth analysis, we used principal component analysis (PCA) and meta-analysis. RESULTS The search strategy based on the PRISMA eligibility criteria provided us with 63 out of 2234 potential articles, 206 observations, 89 methodologies, 4 survey purposes, 3 data sources, 7 data types, and 3 resource types. Cluster analysis and PCA organized the information included in the paper dataset into two dimensions and labels: "supervised methods, institutional dataset, and predictive and classificatory purposes" (correlation 0.97 to 8.18 × 10^-1; p-value 7.67 × 10^-55 to 1.28 × 10^-22) and the second, Dim2 "not-supervised methods; project, simulation, literature, text data; monitoring, decision-making processes; machinery and environment" (corr. 0.84 to 0.47; p-value 5.79 × 10^-25 to 3.59 × 10^-6).
We answered the research question regarding which method, among supervised, unsupervised, or other, is most suitable for application to data in the construction industry. CONCLUSIONS The meta-analysis provided an overall estimate of the better effectiveness of supervised methods (Odds Ratio = 0.71, Confidence Interval 0.53-0.96) compared to not-supervised methods.
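As a reading aid for the odds ratio and confidence interval reported above, the following is a minimal sketch of how an odds ratio and its Wald 95% confidence interval are computed from a single 2x2 table. It is not the authors' pooled meta-analytic procedure: the function name `odds_ratio_ci` and the cell counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: events / non-events in group 1
    c, d: events / non-events in group 2
    z:    normal quantile (1.96 gives a 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not taken from the review)
odds_ratio, lower, upper = odds_ratio_ci(20, 80, 30, 70)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, such as the 0.53-0.96 reported in the abstract, indicates a difference that is statistically significant at the 5% level.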
Affiliation(s)
- Antonella Pireddu
  - Department of Technological Innovations and Safety of Plants, Products and Anthropic Settlements (DIT), Italian National Institute for Insurance against Accidents at Work, Inail, 00144 Rome, Italy
- Angelico Bedini
  - Department of Technological Innovations and Safety of Plants, Products and Anthropic Settlements (DIT), Italian National Institute for Insurance against Accidents at Work, Inail, 00144 Rome, Italy
- Mara Lombardi
  - Department of Chemical Engineering Materials Environment (DICMA), Sapienza-University of Rome, 00184 Rome, Italy
- Angelo L. C. Ciribini
  - Department of Civil Engineering, Architecture, Land, Environment and Mathematics (DICATAM), Brescia University, 25121 Brescia, Italy
- Davide Berardi
  - Department of Chemical Engineering Materials Environment (DICMA), Sapienza-University of Rome, 00184 Rome, Italy
2
Gangadhari RK, Rabiee M, Khanzode V, Murthy S, Kumar Tarei P. From unstructured accident reports to a hybrid decision support system for occupational risk management: The consensus converging approach. JOURNAL OF SAFETY RESEARCH 2024; 89:91-104. [PMID: 38858066 DOI: 10.1016/j.jsr.2024.02.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 11/06/2023] [Accepted: 02/13/2024] [Indexed: 06/12/2024]
Abstract
INTRODUCTION Workplace accidents in the petroleum industry can cause catastrophic damage to people, property, and the environment. Earlier studies in this domain indicate that the majority of the accident report information is available in unstructured text format. Conventional techniques for the analysis of accident data are time-consuming and heavily dependent on experts' subject knowledge, experience, and judgment. There is a need to develop a machine learning-based decision support system to analyze the vast amounts of unstructured text data that are frequently overlooked due to a lack of appropriate methodology. METHOD To address this gap in the literature, we propose a hybrid methodology that uses improved text-mining techniques together with an unbiased group decision-making framework to combine the objective weights (based on text mining) and subjective weights (based on expert opinion) of risk factors and prioritize them. Based on the contextual word embedding models and term frequencies, we extracted five important clusters of risk factors comprising more than 32 risk sub-factors. A heterogeneous group of experts and employees in the petroleum industry were contacted to obtain their opinions on the extracted risk factors, and the best-worst method was used to convert their opinions to weights. CONCLUSIONS AND PRACTICAL APPLICATIONS The applicability of our proposed framework was tested on data compiled from the accident data released by the petroleum industries in India. Our framework can be extended to accident data from any industry, to reduce analysis time and improve the accuracy in classifying and prioritizing risk factors.
Affiliation(s)
- Rajan Kumar Gangadhari
  - Operations and Supply Chain Management, Indian Institute of Management, Mumbai 400087, India.
- Meysam Rabiee
  - Business School, University of Colorado Denver, Denver, CO 80202, USA.
- Vivek Khanzode
  - Operations and Supply Chain Management, Indian Institute of Management, Mumbai 400087, India.
- Shankar Murthy
  - Sustainability Management, Indian Institute of Management, Mumbai 400087, India.
- Pradeep Kumar Tarei
  - Operations & Supply Chain Area, Indian Institute of Management Jammu, Jagti, Jammu & Kashmir, India.
3
Khairuddin MZF, Sankaranarayanan S, Hasikin K, Abd Razak NA, Omar R. Contextualizing injury severity from occupational accident reports using an optimized deep learning prediction model. PeerJ Comput Sci 2024; 10:e1985. [PMID: 38660193 PMCID: PMC11042013 DOI: 10.7717/peerj-cs.1985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 03/21/2024] [Indexed: 04/26/2024]
Abstract
Background This study introduced a novel approach for predicting occupational injury severity by leveraging deep learning-based text classification techniques to analyze unstructured narratives. Unlike conventional methods that rely on structured data, our approach recognizes the richness of information within injury narrative descriptions with the aim of extracting valuable insights for improved occupational injury severity assessment. Methods Natural language processing (NLP) techniques were harnessed to preprocess the occupational injury narratives obtained from the US Occupational Safety and Health Administration (OSHA) from January 2015 to June 2023. The methodology involved meticulous preprocessing of textual narratives to standardize text and eliminate noise, followed by the innovative integration of Term Frequency-Inverse Document Frequency (TF-IDF) and Global Vectors (GloVe) word embeddings for effective text representation. The proposed predictive model adopts a novel Bidirectional Long Short-Term Memory (Bi-LSTM) architecture and is further refined through model optimization, including random search hyperparameter tuning and in-depth feature importance analysis. The optimized Bi-LSTM model was compared and validated against other machine learning classifiers, namely naïve Bayes, support vector machine, random forest, decision tree, and K-nearest neighbor. Results The proposed optimized Bi-LSTM model showed superior predictability, with an accuracy of 0.95 for hospitalization and 0.98 for amputation cases and faster model processing times. Interestingly, the feature importance analysis revealed predictive keywords related to the causal factors of occupational injuries, thereby providing valuable insights to enhance model interpretability. Conclusion Our proposed optimized Bi-LSTM model offers safety and health practitioners an effective tool to support proactive workplace safety measures, thereby contributing to business productivity and sustainability.
This study lays the foundation for further exploration of predictive analytics in the occupational safety and health domain.
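To make the TF-IDF step mentioned in the abstract concrete, here is a minimal standard-library sketch of TF-IDF weighting. It covers only the term-weighting idea, not the GloVe embeddings or the Bi-LSTM model. The weighting variant (relative term frequency times log(N/df)), the function name `tf_idf`, and the sample narratives are assumptions made for illustration; the paper may use a different variant and its data are not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Uses relative term frequency and idf = log(N / df). Other variants
    (smoothed idf, length normalization) are also common.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented narratives, for illustration only
narratives = [
    "employee fell from ladder and fractured wrist",
    "employee caught finger in press resulting in amputation",
    "employee fell on wet floor",
]
weights = tf_idf(narratives)
```

Under this variant, a word that appears in every narrative (here "employee") receives a weight of zero, while words confined to a single narrative (such as "amputation") receive the largest weights, which is why TF-IDF tends to surface discriminative keywords.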
Affiliation(s)
- Suresh Sankaranarayanan
  - Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Hofuf, Kingdom of Saudi Arabia
- Khairunnisa Hasikin
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Kuala Lumpur, Malaysia
- Nasrul Anuar Abd Razak
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur, Kuala Lumpur, Malaysia
- Rosidah Omar
  - Occupational and Environmental Health Unit, Kedah State Health Department, Alor Setar, Kedah, Malaysia
4
Khairuddin MZF, Hasikin K, Razak NAA, Mohshim SA, Ibrahim SS. Harnessing the Multimodal Data Integration and Deep Learning for Occupational Injury Severity Prediction. IEEE ACCESS 2023; 11:85284-85302. [DOI: 10.1109/access.2023.3304328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Affiliation(s)
- Khairunnisa Hasikin
  - Department of Biomedical Engineering, Faculty of Engineering, University Malaya, Kuala Lumpur, Malaysia
- Nasrul Anuar Abd Razak
  - Department of Biomedical Engineering, Faculty of Engineering, University Malaya, Kuala Lumpur, Malaysia
- Siti Afifah Mohshim
  - Medical Engineering Technology Section, British Malaysian Institute, Universiti Kuala Lumpur, Kuala Lumpur, Selangor, Malaysia
- Siti Salwa Ibrahim
  - Negeri Sembilan State Health Department, Ministry of Health, Seremban, Negeri Sembilan, Malaysia
5
Khairuddin MZF, Lu Hui P, Hasikin K, Abd Razak NA, Lai KW, Mohd Saudi AS, Ibrahim SS. Occupational Injury Risk Mitigation: Machine Learning Approach and Feature Optimization for Smart Workplace Surveillance. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:13962. [PMID: 36360843 PMCID: PMC9653932 DOI: 10.3390/ijerph192113962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 10/09/2022] [Accepted: 10/25/2022] [Indexed: 06/16/2023]
Abstract
Forecasting the severity of occupational injuries should be a top priority for all industries. The use of machine learning is theoretically valuable in assisting predictive analysis; this study therefore proposes a feature-optimized predictive model for anticipating occupational injury severity. A public database of 66,405 occupational injury records from OSHA is analyzed using five machine learning models: Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, Decision Tree, and Random Forest. In the model comparison, Random Forest outperformed the other models, with higher accuracy and F1-score, highlighting the potential of ensemble learning as a more accurate prediction approach in the field of occupational injury. In constructing the model, this study also proposed a feature optimization technique that revealed the three most important features for model development: 'nature of injury', 'type of event', and 'affected body part'. The accuracy of the Random Forest model improved by 0.5%, reaching 0.895 and 0.954 for the prediction of hospitalization and amputation, respectively, after the model was redeveloped and optimized with hyperparameter tuning. Feature optimization is essential in providing insight to safety and health practitioners for future corrective and preventive injury strategies. This study has shown promising potential for smart workplace surveillance.
Affiliation(s)
- Mohamed Zul Fadhli Khairuddin
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur 50603, Malaysia
  - Environmental Healthcare Section, Institute of Medical Science Technology, Universiti Kuala Lumpur, Kajang 40300, Selangor, Malaysia
- Puat Lu Hui
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Khairunnisa Hasikin
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur 50603, Malaysia
  - Centre of Intelligent Systems for Emerging Technology (CISET), Faculty of Engineering, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Nasrul Anuar Abd Razak
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Khin Wee Lai
  - Department of Biomedical Engineering, Faculty of Engineering, Universiti Malaya, Kuala Lumpur 50603, Malaysia
- Ahmad Shakir Mohd Saudi
  - Centre of Water Engineering Technology, Water Energy Section, Malaysia France Institute, Universiti Kuala Lumpur, Bangi 43650, Selangor, Malaysia
- Siti Salwa Ibrahim
  - Negeri Sembilan State Health Department, Seremban 70300, Negeri Sembilan, Malaysia
6
Catchpoole J, Nanda G, Vallmuur K, Nand G, Lehto M. Application of a Machine Learning-based Decision Support Tool to Improve an Injury Surveillance System Workflow. Appl Clin Inform 2022; 13:700-710. [PMID: 35644141 DOI: 10.1055/a-1863-7176] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/02/2022]
Abstract
Background
Emergency department (ED)-based injury surveillance systems across many countries face resourcing challenges related to manual validation and coding of data.
Objective
This paper describes the evaluation of a machine learning-based Decision Support Tool (DST) to assist injury surveillance departments in the validation, coding and use of their data, comparing outcomes in coding time and accuracy pre- and post-implementation.
Methods
Manually coded injury surveillance data has been used to develop, train and iteratively refine a machine learning-based classifier to enable semi-automated coding of injury narrative data. This paper describes a trial implementation of the machine learning-based DST in the Queensland Injury Surveillance Unit (QISU) workflow using a major pediatric hospital's emergency department data comparing outcomes in coding time and accuracy pre- and post-implementation.
Results
The study found a 10% reduction in manual coding time after the DST was introduced. The Kappa statistic analysis of both DST-assisted and unassisted data shows increases in accuracy across three data fields: injury intent (85.4% unassisted vs. 94.5% assisted), external cause (88.8% unassisted vs. 91.8% assisted), and injury factor (89.3% unassisted vs. 92.9% assisted). The classifier was also used to produce a timely report monitoring injury patterns during the COVID-19 pandemic. Hence, it has the potential for near real-time surveillance of emerging hazards to inform public health responses.
Conclusions
The integration of the DST into the injury surveillance workflow shows benefits as it facilitates timely reporting and acts as a DST in the manual coding process.
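The abstract refers to a Kappa statistic without naming the variant. As a minimal sketch, assuming Cohen's kappa for two coders, the agreement measure can be computed as below. The function name `cohens_kappa` and the example codes are hypothetical and are not drawn from the QISU data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two coders agree
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each coder's label frequencies
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented injury codes, for illustration only
reference = ["fall", "fall", "cut", "burn", "cut", "fall"]
assisted = ["fall", "cut", "cut", "burn", "cut", "fall"]
kappa = cohens_kappa(reference, assisted)
print(f"kappa = {kappa:.3f}")
```

Unlike raw percentage agreement, kappa discounts the agreement that two coders would reach by chance given how often each uses every code.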
Affiliation(s)
- Jesani Catchpoole
  - Jamieson Trauma Institute, Metro North Hospital and Health Service, Herston, Australia
  - Queensland Injury Surveillance Unit, Metro North Hospital and Health Service, Herston, Australia
  - Queensland University of Technology, Kelvin Grove, Australia
- Gaurav Nanda
  - School of Engineering Technology, Purdue University, West Lafayette, United States
- Kirsten Vallmuur
  - Australian Centre for Health Services Innovation, Queensland University of Technology, Kelvin Grove, Australia
  - Jamieson Trauma Institute, Metro North Hospital and Health Service, Herston, Australia
- Goshad Nand
  - Queensland Injury Surveillance Unit, Metro North Hospital and Health Service, Herston, Australia
- Mark Lehto
  - Industrial Engineering, Purdue University, West Lafayette, United States