1
Cassidy B, McBride C, Kendrick C, Reeves ND, Pappachan JM, Fernandez CJ, Chacko E, Brüngel R, Friedrich CM, Alotaibi M, AlWabel AA, Alderwish M, Lai KY, Yap MH. An enhanced harmonic densely connected hybrid transformer network architecture for chronic wound segmentation utilising multi-colour space tensor merging. Comput Biol Med 2025; 192:110172. PMID: 40318494. DOI: 10.1016/j.compbiomed.2025.110172.
Abstract
Chronic wounds and associated complications present ever-growing burdens for clinics and hospitals worldwide. Venous, arterial, diabetic, and pressure wounds are becoming increasingly common globally. These conditions can result in highly debilitating repercussions for those affected, with limb amputations and increased mortality risk resulting from infection becoming more common. New methods to assist clinicians in chronic wound care are therefore vital to maintain high-quality care standards. This paper presents an improved HarDNet segmentation architecture which integrates a contrast-eliminating component in the initial layers of the network to enhance feature learning. We also utilise a multi-colour space tensor merging process and adjust the harmonic shape of the convolution blocks to facilitate these additional features. We train our proposed model using wound images from light-skinned patients and test the model on two test sets (one set with ground truth, and one without) comprising only darker-skinned cases. Subjective ratings are obtained from clinical wound experts, with the intraclass correlation coefficient used to determine inter-rater reliability. For the dark skin tone test set with ground truth, comparing the baseline results (DSC=0.6389, IoU=0.5350) with the results for the proposed model (DSC=0.7610, IoU=0.6620) demonstrates improvements in terms of Dice similarity coefficient (+0.1221) and intersection over union (+0.1270). Measures from the qualitative analysis also indicate improvements in terms of high expert ratings, with improvements of >3% demonstrated when comparing the baseline model with the proposed model. This paper presents the first study to focus on darker skin tones for chronic wound segmentation using models trained only on wound images exhibiting lighter skin. Diabetes is highly prevalent in countries where patients have darker skin tones, highlighting the need for a greater focus on such cases. Additionally, we conduct the largest qualitative study to date for chronic wound segmentation. All source code for this study is available at: https://github.com/mmu-dermatology-research/hardnet-cws.
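For reference, the two overlap metrics reported above can be computed directly from binary segmentation masks. A minimal NumPy sketch of the standard definitions (illustrative arrays only, not data or code from the cited study):

    import numpy as np

    def dice_and_iou(pred, gt, eps=1e-8):
        # Dice similarity coefficient and intersection over union for two binary masks
        pred, gt = pred.astype(bool), gt.astype(bool)
        intersection = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        dice = 2.0 * intersection / (pred.sum() + gt.sum() + eps)
        iou = intersection / (union + eps)
        return dice, iou

    # Toy 4x4 masks: half of each predicted region overlaps the ground truth
    pred = np.array([[0, 1, 1, 0]] * 4)
    gt = np.array([[0, 0, 1, 1]] * 4)
    print(dice_and_iou(pred, gt))  # approximately (0.5, 0.333)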
Affiliation(s)
- Bill Cassidy
- Department of Computing and Mathematics, Manchester Metropolitan University, Dalton Building, Chester Street, Manchester, M1 5GD, UK.
- Christian McBride
- Department of Computing and Mathematics, Manchester Metropolitan University, Dalton Building, Chester Street, Manchester, M1 5GD, UK
- Connah Kendrick
- Department of Computing and Mathematics, Manchester Metropolitan University, Dalton Building, Chester Street, Manchester, M1 5GD, UK
- Neil D Reeves
- Medical School, Faculty of Health and Medicine, Health Innovation Campus, Lancaster University, LA1 4YW, UK
- Joseph M Pappachan
- Lancashire Teaching Hospitals NHS Foundation Trust, Preston, PR2 9HT, UK
- Elias Chacko
- Jersey General Hospital, St Helier, JE1 3QS, Jersey
- Raphael Brüngel
- Department of Computer Science, University of Applied Sciences and Arts Dortmund (FH Dortmund), Emil-Figge-Str. 42, 44227 Dortmund, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Zweigertstr. 37, 45130 Essen, Germany; Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Girardetstr. 2, 45131 Essen, Germany
- Christoph M Friedrich
- Department of Computer Science, University of Applied Sciences and Arts Dortmund (FH Dortmund), Emil-Figge-Str. 42, 44227 Dortmund, Germany; Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen, Zweigertstr. 37, 45130 Essen, Germany
- Metib Alotaibi
- University Diabetes Center, King Saud University Medical City, Riyadh, Saudi Arabia
- Mohammad Alderwish
- University Diabetes Center, King Saud University Medical City, Riyadh, Saudi Arabia
- Moi Hoon Yap
- Department of Computing and Mathematics, Manchester Metropolitan University, Dalton Building, Chester Street, Manchester, M1 5GD, UK; Lancashire Teaching Hospitals NHS Foundation Trust, Preston, PR2 9HT, UK
2
Chindanuruks T, Jindanil T, Cumpim C, Sinpitaksakul P, Arunjaroensuk S, Mattheos N, Pimkhaokham A. Development and validation of a deep learning algorithm for the classification of the level of surgical difficulty in impacted mandibular third molar surgery. Int J Oral Maxillofac Surg 2025; 54:452-460. PMID: 39632213. DOI: 10.1016/j.ijom.2024.11.008.
Abstract
The aim of this study was to develop and validate a convolutional neural network (CNN) algorithm for the detection of impacted mandibular third molars in panoramic radiographs and the classification of the surgical extraction difficulty level. A dataset of 1730 panoramic radiographs was collected; 1300 images were allocated to training and 430 to testing. The performance of the model was evaluated using the confusion matrix for multiclass classification, and the model's scores were compared to those of two human experts. The area under the precision-recall curve of the YOLOv5 model ranged from 72% to 89% across the variables in the surgical difficulty index. The area under the receiver operating characteristic curve showed promising results for the YOLOv5 model in classifying third molars into three surgical difficulty levels (micro-average AUC 87%). Furthermore, the algorithm scores demonstrated good agreement with the human experts. In conclusion, the YOLOv5 model has the potential to accurately detect and classify the position of mandibular third molars, with high performance for every criterion in radiographic images. The proposed model could serve as an aid in improving clinician performance and could be integrated into a screening system.
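The micro-average AUC reported above pools the three difficulty classes into a single one-vs-rest curve. A hedged scikit-learn sketch of that aggregation (the labels and probabilities below are invented for illustration, not data from the study):

    import numpy as np
    from sklearn.preprocessing import label_binarize
    from sklearn.metrics import roc_auc_score

    # Hypothetical predicted probabilities for three surgical difficulty levels
    y_true = np.array([0, 1, 2, 1, 0, 2])
    y_prob = np.array([[0.7, 0.2, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6],
                       [0.3, 0.5, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.2, 0.6]])

    # Micro-averaging: binarize the labels and pool all class/score pairs into one ROC curve
    y_bin = label_binarize(y_true, classes=[0, 1, 2])
    print(roc_auc_score(y_bin, y_prob, average="micro"))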
Affiliation(s)
- T Chindanuruks
- Department of Oral and Maxillofacial Surgery, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Oral and Maxillofacial Surgery and Digital Implant Surgery Research Unit, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
- T Jindanil
- Department of Radiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
- C Cumpim
- Department of Computer Engineering, Faculty of Engineering, Rajamangala University of Technology Rattanakosin, Nakhon Pathom, Thailand
- P Sinpitaksakul
- Department of Radiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
- S Arunjaroensuk
- Department of Oral and Maxillofacial Surgery, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Oral and Maxillofacial Surgery and Digital Implant Surgery Research Unit, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand.
- N Mattheos
- Department of Oral and Maxillofacial Surgery, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Oral and Maxillofacial Surgery and Digital Implant Surgery Research Unit, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Department of Dental Medicine, Karolinska Institute, Stockholm, Sweden
- A Pimkhaokham
- Department of Oral and Maxillofacial Surgery, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand; Oral and Maxillofacial Surgery and Digital Implant Surgery Research Unit, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
3
Xu X, Luo W, Ren Z, Song X. Intelligent Detection and Recognition of Marine Plankton by Digital Holography and Deep Learning. Sensors (Basel) 2025; 25:2325. PMID: 40218838. PMCID: PMC11991423. DOI: 10.3390/s25072325.
Abstract
The detection, observation, recognition, and statistics of marine plankton are the basis of marine ecological research. In recent years, digital holography has been widely applied to plankton detection and recognition. However, the recording and reconstruction of digital holography require a strictly controlled laboratory environment and time-consuming iterative computation, respectively, which impede its application in marine plankton imaging. In this paper, an intelligent method designed with digital holography and deep learning algorithms is proposed to detect and recognize marine plankton (IDRMP). An accurate integrated A-Unet network is established under the principle of deep learning and trained by digital holograms recorded with publicly available plankton datasets. This method can stably and efficiently reconstruct and recognize a variety of plankton organisms from a single hologram, and a YOLOv5-based system interface is provided that realizes end-to-end detection of plankton from a single frame. The structural similarities of the images reconstructed by IDRMP are all higher than 0.97, and the average accuracy of the detection of four plankton species, namely Appendicularian, Chaetognath, Echinoderm, and Hydromedusae, reaches 91.0% after using YOLOv5. In optical experiments, typical marine plankton collected from Weifang, China, are employed as samples. For randomly selected samples of Copepods, Tunicates and Polychaetes, the results are ideal and acceptable, and a batch detection function is developed for the system. Our test and experiment results demonstrate that this method is efficient and accurate for the detection and recognition of numerous plankton within a certain volume of space after they are recorded by digital holography.
Affiliation(s)
- Xianfeng Xu
- College of Science, China University of Petroleum (East China), Qingdao 266580, China; (W.L.); (Z.R.); (X.S.)
4
Ohara J, Maeda Y, Ogata N, Kuroki T, Misawa M, Kudo SE, Nemoto T, Yamochi T, Iacucci M. Automated Neutrophil Quantification and Histological Score Estimation in Ulcerative Colitis. Clin Gastroenterol Hepatol 2025; 23:846-854.e7. PMID: 39059545. DOI: 10.1016/j.cgh.2024.06.040.
Abstract
BACKGROUND: In the management of ulcerative colitis (UC), histological remission is increasingly recognized as the ultimate goal. The absence of neutrophil infiltration is crucial for assessing remission. This study aimed to develop an artificial intelligence (AI) system capable of accurately quantifying and localizing neutrophils in UC biopsy specimens to facilitate histological assessment. METHODS: Our AI system, which incorporates semantic segmentation and object detection models, was developed to identify neutrophils in hematoxylin and eosin-stained whole slide images. The system assessed the presence and location of neutrophils within either the epithelium or lamina propria and predicted components of the Nancy Histological Index and the PICaSSO Histologic Remission Index. We evaluated the system's performance against that of experienced pathologists and validated its ability to predict future clinical relapse risk in patients with clinically remitted UC. The primary outcome measure was the clinical relapse rate, defined as a partial Mayo score of ≥3. RESULTS: The model accurately identified neutrophils, achieving a performance of 0.77, 0.81, and 0.79 for precision, recall, and F-score, respectively. The system's histological score predictions showed a positive correlation with the pathologists' diagnoses (Spearman's ρ = 0.68-0.80; P < .05). Among patients who relapsed, the mean number of neutrophils in the rectum was higher than in those who did not relapse. Furthermore, the study highlighted that higher AI-based PICaSSO Histologic Remission Index and Nancy Histological Index scores were associated with hazard ratios increasing from 3.2 to 5.0 for evaluating the risk of UC relapse. CONCLUSIONS: The AI system's precise localization and quantification of neutrophils proved valuable for histological assessment and clinical prognosis stratification.
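As a quick sanity check on the detection figures above, the F-score is the harmonic mean of precision and recall, so the reported values are mutually consistent (a short illustration, not code from the study):

    precision, recall = 0.77, 0.81
    f_score = 2 * precision * recall / (precision + recall)
    print(round(f_score, 2))  # 0.79, matching the reported F-score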
Affiliation(s)
- Jun Ohara
- Department of Pathology, Showa University School of Medicine, Tokyo, Japan.
- Yasuharu Maeda
- Digestive Disease Center, Showa University Northern Yokohama Hospital, Kanagawa, Japan; APC Microbiome Ireland, College of Medicine and Health, University College Cork, Cork, Ireland
- Noriyuki Ogata
- Digestive Disease Center, Showa University Northern Yokohama Hospital, Kanagawa, Japan
- Takanori Kuroki
- Digestive Disease Center, Showa University Northern Yokohama Hospital, Kanagawa, Japan
- Masashi Misawa
- Digestive Disease Center, Showa University Northern Yokohama Hospital, Kanagawa, Japan
- Shin-Ei Kudo
- Digestive Disease Center, Showa University Northern Yokohama Hospital, Kanagawa, Japan
- Tetsuo Nemoto
- Department of Diagnostic Pathology, Showa University Northern Yokohama Hospital, Kanagawa, Japan
- Toshiko Yamochi
- Department of Pathology, Showa University School of Medicine, Tokyo, Japan
- Marietta Iacucci
- APC Microbiome Ireland, College of Medicine and Health, University College Cork, Cork, Ireland
5
Verk J, Hernavs J, Klančnik S. Using a Region-Based Convolutional Neural Network (R-CNN) for Potato Segmentation in a Sorting Process. Foods 2025; 14:1131. PMID: 40238279. PMCID: PMC11988819. DOI: 10.3390/foods14071131.
Abstract
This study focuses on the segmentation part in the development of a potato-sorting system that utilizes camera input for the segmentation and classification of potatoes. The key challenge addressed is the need for efficient segmentation to allow the sorter to handle a higher volume of potatoes simultaneously. To achieve this, the study employs a region-based convolutional neural network (R-CNN) approach for the segmentation task, aiming for more precise segmentation than classic CNN-based object detectors provide. Specifically, Mask R-CNN is implemented and evaluated with different parameter settings in order to achieve the best segmentation results. The implementation and methodologies used are thoroughly detailed in this work. The findings reveal that Mask R-CNN models can be utilized in a potato-sorting production process and can improve it.
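Mask R-CNN adds a per-instance mask head to a two-stage detector, which is what yields the pixel-precise outlines sought here. A minimal torchvision inference sketch using generic COCO-pretrained weights (the image path is a placeholder; the study's potato-specific training setup is not reproduced):

    import torch
    from torchvision.models.detection import maskrcnn_resnet50_fpn
    from torchvision.io import read_image
    from torchvision.transforms.functional import convert_image_dtype

    model = maskrcnn_resnet50_fpn(weights="DEFAULT")  # COCO-pretrained; fine-tune on annotated potatoes in practice
    model.eval()

    img = convert_image_dtype(read_image("potatoes.jpg"), torch.float)  # placeholder image
    with torch.no_grad():
        out = model([img])[0]  # dict with 'boxes', 'labels', 'scores', 'masks'

    keep = out["scores"] > 0.5
    masks = out["masks"][keep] > 0.5  # (N, 1, H, W) boolean instance masks
    print(f"{keep.sum().item()} instances above the 0.5 score threshold")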
Affiliation(s)
- Simon Klančnik
- Laboratory for Machining Processes, Faculty of Mechanical Engineering, University of Maribor, Koroška Cesta 46, 2000 Maribor, Slovenia; (J.V.); (J.H.)
6
Ascagorta O, Pollicelli MD, Iaconis FR, Eder E, Vázquez-Sano M, Delrieux C. Large-Scale Coastal Marine Wildlife Monitoring with Aerial Imagery. J Imaging 2025; 11:94. PMID: 40278010. PMCID: PMC12027912. DOI: 10.3390/jimaging11040094.
Abstract
Monitoring coastal marine wildlife is crucial for biodiversity conservation, environmental management, and sustainable utilization of tourism-related natural assets. Conducting in situ censuses and population studies in extensive and remote marine habitats often faces logistical constraints, necessitating the adoption of advanced technologies to enhance the efficiency and accuracy of monitoring efforts. This study investigates the utilization of aerial imagery and deep learning methodologies for the automated detection, classification, and enumeration of marine-coastal species. A comprehensive dataset of high-resolution images, captured by drones and aircraft over southern elephant seal (Mirounga leonina) and South American sea lion (Otaria flavescens) colonies in the Valdés Peninsula, Patagonia, Argentina, was curated and annotated. Using this annotated dataset, a deep learning framework was developed and trained to identify and classify individual animals, which may help produce automated, accurate population metrics that support the analysis of ecological dynamics. The model achieved F1 scores between 0.7 and 0.9, depending on the type of individual. Among its contributions, this methodology provided essential insights into the impacts of emergent threats, such as the outbreak of the highly pathogenic avian influenza virus H5N1 during the 2023 austral spring season, which caused significant mortality in these species.
Affiliation(s)
- Octavio Ascagorta
- Departamento de Ingeniería, Universidad Nacional de la Patagonia San Juan Bosco, Puerto Madryn 9120, Argentina; (O.A.); (M.D.P.)
- María Débora Pollicelli
- Departamento de Ingeniería, Universidad Nacional de la Patagonia San Juan Bosco, Puerto Madryn 9120, Argentina; (O.A.); (M.D.P.)
- Francisco Ramiro Iaconis
- Departamento de Física, Instituto de Física del Sur, Universidad Nacional del Sur (UNS) and CONICET, Bahía Blanca 8000, Argentina
- Elena Eder
- Centro para el Estudio de Sistemas Marinos, Centro Nacional Patagonico, CONICET, Puerto Madryn 9120, Argentina
- Mathías Vázquez-Sano
- Departamento de Biologia, Universidad Nacional de Catamarca, San Fernando del Valle de Catamarca 4700, Argentina
- Claudio Delrieux
- Departamento de Ingeniería Eléctrica y Computadoras, Instituto de Ciencias e Ingeniería de la Computación, Universidad Nacional del Sur and CONICET, Bahía Blanca 8000, Argentina
7
Cheng CT, Ooyang CH, Liao CH, Kang SC. Applications of deep learning in trauma radiology: A narrative review. Biomed J 2025; 48:100743. PMID: 38679199. PMCID: PMC11751421. DOI: 10.1016/j.bj.2024.100743.
Abstract
Diagnostic imaging is essential in modern trauma care for initial evaluation and identifying injuries requiring intervention. Deep learning (DL) has become mainstream in medical image analysis and has shown promising efficacy for classification, segmentation, and lesion detection. This narrative review provides the fundamental concepts for developing DL algorithms in trauma imaging and presents an overview of current progress in each modality. DL has been applied to detect free fluid on Focused Assessment with Sonography for Trauma (FAST), traumatic findings on chest and pelvic X-rays, and computed tomography (CT) scans, identify intracranial hemorrhage on head CT, detect vertebral fractures, and identify injuries to organs like the spleen, liver, and lungs on abdominal and chest CT. Future directions involve expanding dataset size and diversity through federated learning, enhancing model explainability and transparency to build clinician trust, and integrating multimodal data to provide more meaningful insights into traumatic injuries. Though some commercial artificial intelligence products are Food and Drug Administration-approved for clinical use in the trauma field, adoption remains limited, highlighting the need for multi-disciplinary teams to engineer practical, real-world solutions. Overall, DL shows immense potential to improve the efficiency and accuracy of trauma imaging, but thoughtful development and validation are critical to ensure these technologies positively impact patient care.
Affiliation(s)
- Chi-Tung Cheng
- Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan; School of Medicine, Chang Gung University, Taoyuan, Taiwan
- Chun-Hsiang Ooyang
- Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan
- Chien-Hung Liao
- Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan
- Shih-Ching Kang
- Department of Trauma and Emergency Surgery, Chang Gung Memorial Hospital, Linkou, Taoyuan, Taiwan.
8
Butler RM, Frassini E, Vijfvinkel TS, van Riel S, Bachvarov C, Constandse J, van der Elst M, van den Dobbelsteen JJ, Hendriks BHW. Benchmarking 2D human pose estimators and trackers for workflow analysis in the cardiac catheterization laboratory. Med Eng Phys 2025; 136:104289. PMID: 39979009. DOI: 10.1016/j.medengphy.2025.104289.
Abstract
Workflow insights can improve efficiency and safety in the Cardiac Catheterization Laboratory (Cath Lab). As manual analysis is labor-intensive, we aim for automation through camera monitoring. Literature shows that human poses are indicative of activities and therefore workflow. As a first exploration, we evaluate how marker-less multi-human pose estimators perform in the Cath Lab. We annotated poses in 2040 frames from ten multi-view coronary angiogram (CAG) recordings. The pose estimators AlphaPose, OpenPifPaf and OpenPose were run on the footage. Detection and tracking were evaluated separately for the Head, Arms, and Legs with Average Precision (AP), head-guided Percentage of Correct Keypoints (PCKh), Association Accuracy (AA), and Higher-Order Tracking Accuracy (HOTA). We give qualitative examples of results for situations common in the Cath Lab, such as reflections in the monitor or occlusion of personnel. AlphaPose performed best on most mean Full-pose metrics, with an AP from 0.56 to 0.82, AA from 0.55 to 0.71, and HOTA from 0.58 to 0.73. On PCKh, OpenPifPaf scored highest, from 0.53 to 0.64. Arms, Legs, and the Head were detected best, in that order, from the views with the least occlusion. During tracking in the Cath Lab, AlphaPose tended to swap identities and OpenPifPaf merged different individuals. The results suggest that AlphaPose yields the most accurate confidence scores and limbs, and OpenPifPaf more accurate keypoint locations in the Cath Lab. Occlusions and reflections complicate pose tracking. The AP of up to 0.82 suggests that AlphaPose is a suitable pose detector for workflow analysis in the Cath Lab, whereas its HOTA of up to 0.73 calls for another tracking solution.
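PCKh, one of the metrics above, counts a predicted keypoint as correct when it lies within a fraction of the annotated head-segment length of its ground-truth location. A small NumPy sketch of that computation (the 0.5 threshold is the conventional choice; shapes and values are illustrative, not taken from the paper):

    import numpy as np

    def pckh(pred, gt, head_sizes, threshold=0.5):
        # pred, gt: (N, K, 2) keypoint coordinates; head_sizes: (N,) per-pose head-segment lengths
        dists = np.linalg.norm(pred - gt, axis=-1)   # (N, K) pixel distances
        normalised = dists / head_sizes[:, None]     # normalise per pose
        return float((normalised < threshold).mean())

    # Toy example: two poses with three keypoints each, every joint off by 5 px
    gt = np.zeros((2, 3, 2))
    pred = gt + np.array([5.0, 0.0])
    print(pckh(pred, gt, head_sizes=np.array([20.0, 8.0])))  # 0.5: only the first pose passes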
Affiliation(s)
- Rick M Butler
- Delft University of Technology, Delft, the Netherlands.
- Maarten van der Elst
- Delft University of Technology, Delft, the Netherlands; Reinier de Graaf Gasthuis, Delft, the Netherlands
- Benno H W Hendriks
- Delft University of Technology, Delft, the Netherlands; Philips Healthcare, Best, the Netherlands
9
Albuquerque C, Henriques R, Castelli M. Deep learning-based object detection algorithms in medical imaging: Systematic review. Heliyon 2025; 11:e41137. PMID: 39758372. PMCID: PMC11699422. DOI: 10.1016/j.heliyon.2024.e41137.
Abstract
Over the past decade, Deep Learning (DL) techniques have demonstrated remarkable advancements across various domains, driving their widespread adoption. Particularly in medical image analysis, DL has received considerable attention for tasks like image segmentation, object detection, and classification. This paper provides an overview of DL-based object recognition in medical images, exploring recent methods and emphasizing different imaging techniques and anatomical applications. Utilizing a meticulous quantitative and qualitative analysis following PRISMA guidelines, we examined publications based on citation rates to explore the utilization of DL-based object detectors across imaging modalities and anatomical domains. Our findings reveal a consistent rise in the utilization of DL-based object detection models, indicating unexploited potential in medical image analysis. Predominantly within the Medicine and Computer Science domains, research in this area is most active in the US, China, and Japan. Notably, DL-based object detection methods have attracted significant interest across diverse medical imaging modalities and anatomical domains. These methods have been applied to a range of modalities including CR scans, pathology images, and endoscopic imaging, showcasing their adaptability. Moreover, diverse anatomical applications, particularly in digital pathology and microscopy, have been explored. The analysis underscores the presence of varied datasets, often with significant discrepancies in size, with a notable percentage being labeled as private or internal, and with prospective studies in this field remaining scarce. Our review of existing trends in DL-based object detection in medical images offers insights for future research directions. The continuous evolution of DL algorithms highlighted in the literature underscores the dynamic nature of this field, emphasizing the need for ongoing research and tailored optimization for specific applications.
10
Trigka M, Dritsas E. A Comprehensive Survey of Machine Learning Techniques and Models for Object Detection. Sensors (Basel) 2025; 25:214. PMID: 39797004. PMCID: PMC11723456. DOI: 10.3390/s25010214.
Abstract
Object detection is a pivotal research domain within computer vision, with applications spanning from autonomous vehicles to medical diagnostics. This comprehensive survey presents an in-depth analysis of the evolution and significant advancements in object detection, emphasizing the critical role of machine learning (ML) and deep learning (DL) techniques. We explore a wide spectrum of methodologies, ranging from traditional approaches to the latest DL models, thoroughly evaluating their performance, strengths, and limitations. Additionally, the survey delves into various metrics for assessing model effectiveness, including precision, recall, and intersection over union (IoU), while addressing ongoing challenges in the field, such as managing occlusions, varying object scales, and improving real-time processing capabilities. Furthermore, we critically examine recent breakthroughs, including advanced architectures like Transformers, and discuss challenges and future research directions aimed at overcoming existing barriers. By synthesizing current advancements, this survey provides valuable insights for enhancing the robustness, accuracy, and efficiency of object detection systems across diverse and challenging applications.
Affiliation(s)
- Elias Dritsas
- Industrial Systems Institute, Athena Research and Innovation Center, 26504 Patras, Greece
11
Butler RM, Vijfvinkel TS, Frassini E, van Riel S, Bachvarov C, Constandse J, van der Elst M, van den Dobbelsteen JJ, Hendriks BHW. 2D human pose tracking in the cardiac catheterisation laboratory with BYTE. Med Eng Phys 2025; 135:104270. PMID: 39922649. DOI: 10.1016/j.medengphy.2024.104270.
Abstract
Workflow insights can enable safety and efficiency improvements in the Cardiac Catheterisation Laboratory (Cath Lab). Human pose tracklets from video footage can provide a source of workflow information. However, occlusions and visual similarity between personnel make the Cath Lab a challenging environment for the re-identification of individuals. We propose a human pose tracker that addresses these problems specifically, and test it on recordings of real coronary angiograms. This tracker uses no visual information for re-identification, and instead employs object keypoint similarity between detections and predictions from a third-order motion model. Algorithm performance is measured on Cath Lab footage using Higher-Order Tracking Accuracy (HOTA). To evaluate its stability during procedures, this is done separately for five different surgical steps of the procedure. We achieve up to 0.71 HOTA, where tested state-of-the-art pose trackers score up to 0.65 on the same dataset. We observe that the pose tracker's HOTA performance varies by up to 10 percentage points (pp) between workflow phases, where tested state-of-the-art trackers show differences of up to 23 pp. In addition, the tracker achieves up to 22.5 frames per second, which is 9 frames per second faster than the current state-of-the-art on our setup in the Cath Lab. The fast and consistent short-term performance of the provided algorithm makes it suitable for workflow analysis in the Cath Lab and opens the door to real-time use cases. Our code is publicly available at https://github.com/RM-8vt13r/PoseBYTE.
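Object keypoint similarity (OKS) plays the role for keypoints that IoU plays for boxes: it scores how closely two keypoint sets coincide, down-weighted by object scale and a per-joint tolerance. A hedged sketch of the standard COCO-style formula (the constants and the exact matching rule used by the paper's tracker are not reproduced here):

    import numpy as np

    def oks(pred, gt, area, kappa):
        # pred, gt: (K, 2) keypoints; area: ground-truth object scale; kappa: (K,) per-joint falloffs
        d2 = np.sum((pred - gt) ** 2, axis=-1)
        return float(np.mean(np.exp(-d2 / (2 * area * kappa ** 2))))

    gt = np.array([[10.0, 10.0], [20.0, 15.0], [30.0, 40.0]])
    pred = gt + 1.0                                           # small localisation errors
    print(oks(pred, gt, area=900.0, kappa=np.full(3, 0.1)))   # about 0.89; approaches 1.0 as errors shrink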
Affiliation(s)
- Rick M Butler
- Delft University of Technology, Delft, the Netherlands.
- Maarten van der Elst
- Delft University of Technology, Delft, the Netherlands; Reinier de Graaf Gasthuis, Delft, the Netherlands
- Benno H W Hendriks
- Delft University of Technology, Delft, the Netherlands; Philips Healthcare, Best, the Netherlands
12
Dennis D, Suebnukarn S, Vicharueang S, Limprasert W. Development and evaluation of a deep learning segmentation model for assessing non-surgical endodontic treatment outcomes on periapical radiographs: A retrospective study. PLoS One 2024; 19:e0310925. PMID: 39739891. DOI: 10.1371/journal.pone.0310925.
Abstract
This study aimed to evaluate the performance of a deep learning-based segmentation model for predicting outcomes of non-surgical endodontic treatment. Preoperative and 3-year postoperative periapical radiographic images of each tooth from routine root canal treatments performed by endodontists from 2015 to 2021 were obtained retrospectively from Thammasat University hospital. Preoperative radiographic images of 1200 teeth with 3-year follow-up results (440 healed, 400 healing, and 360 disease) were collected. A Mask Region-based Convolutional Neural Network (Mask R-CNN) was used to segment the root pixel-wise from other structures in the image and was trained to predict the class label as healed, healing, or disease. Three endodontists annotated 1080 images used for model training, validation, and testing. The performance of the model was evaluated on a test set and also by comparing the performance of clinicians (general practitioners and endodontists) with and without the help of the model on an independent set of 120 images. The performance of the Mask R-CNN prediction model was high, with a mean average precision (mAP) of 0.88 (95% CI 0.83-0.93) and areas under the precision-recall curve of 0.91 (95% CI 0.88-0.94), 0.83 (95% CI 0.81-0.85), and 0.91 (95% CI 0.90-0.92) for healed, healing, and disease, respectively. The prediction metrics of general practitioners and endodontists improved significantly with the help of Mask R-CNN, outperforming clinicians alone, with mAP increasing from 0.75 (95% CI 0.72-0.78) to 0.84 (95% CI 0.81-0.87) and from 0.88 (95% CI 0.85-0.91) to 0.92 (95% CI 0.89-0.95), respectively. In conclusion, the deep learning-based segmentation model has the potential to predict non-surgical endodontic treatment outcomes from periapical radiographic images and is expected to aid in endodontic treatment.
Affiliation(s)
- Dennis Dennis
- Faculty of Dentistry, Universitas Sumatera Utara, Medan, Indonesia
- Wasit Limprasert
- College of Interdisciplinary Studies, Thammasat University, Pathum Thani, Thailand
13
Scaillierez AJ, Izquierdo García-Faria T, Broers H, van Nieuwamerongen-de Koning SE, van der Tol RPPJ, Bokkers EAM, Boumans IJMM. Determining the posture and location of pigs using an object detection model under different lighting conditions. Transl Anim Sci 2024; 8:txae167. PMID: 39669266. PMCID: PMC11635830. DOI: 10.1093/tas/txae167.
Abstract
Computer vision techniques are becoming increasingly popular for monitoring pig behavior. For instance, object detection models allow us to detect the presence of pigs, their location, and their posture. The performance of object detection models can be affected by variations in lighting conditions (e.g., intensity, spectrum, and uniformity). Furthermore, lighting conditions can influence pigs' active and resting behavior. In the context of experiments testing different lighting conditions, a detection model was developed to detect the location and postures of group-housed growing-finishing pigs. The objective of this paper is to validate the model, developed using YOLOv8, for detecting standing, sitting, sternal lying, and lateral lying pigs. The training, validation, and test datasets included annotations of pigs from 10 to 24 wk of age in 10 different light settings, varying in intensity, spectrum, and uniformity. Pig detection was comparable across the different lighting conditions, despite a slightly lower posture agreement for warm light and uneven light distribution, likely due to a less clear contrast between pigs and their background and the presence of shadows. The detection reached a mean average precision (mAP) of 89.4%. Standing was the best-detected posture with the highest precision, sensitivity, and F1 score, while the sensitivity and F1 score of sitting were the lowest. This lower performance resulted from confusion of sitting with sternal lying and standing, as a consequence of the top camera view and a low occurrence of sitting pigs in the annotated dataset. This issue is inherent to pig behavior and could be tackled using data augmentation. Some confusion was reported between the two types of lying due to occlusion by pen mates or the pigs' own bodies, and grouping both lying postures resulted in an improvement in the detection (mAP = 97.0%). Therefore, comparing resting postures (both lying types) to active postures could lead to a more reliable interpretation of pigs' behavior. Some detection errors were observed, e.g., two detections generated for the same pig due to posture uncertainty, dirt on cameras detected as a pig, and undetected pigs due to occlusion. The localization accuracy measured by the intersection over union was higher than 95.5% for 75% of the dataset, meaning that the locations of predicted pigs were very close to those of the annotated pigs. Tracking individual pigs revealed challenges with ID changes and switches between pen mates, requiring further work.
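For orientation, training a posture detector with YOLOv8 through the Ultralytics API takes only a few lines; the dataset file, class list and hyperparameters below are placeholders, not the settings used in this study:

    from ultralytics import YOLO

    # 'pig_postures.yaml' is a hypothetical dataset config listing image paths and the
    # classes standing / sitting / sternal lying / lateral lying.
    model = YOLO("yolov8n.pt")                      # start from a pretrained checkpoint
    model.train(data="pig_postures.yaml", epochs=100, imgsz=640)

    metrics = model.val()                           # evaluate on the held-out split
    print(metrics.box.map)                          # mAP50-95
    results = model("barn_frame.jpg")               # hypothetical test image
    results[0].show()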
Affiliation(s)
- Alice J Scaillierez
- Animal Production Systems group, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Tomás Izquierdo García-Faria
- Wageningen Livestock Research, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Harry Broers
- Signify Research, Signify, High Tech Campus 7, 5656 AE Eindhoven, The Netherlands
- Rik P P J van der Tol
- Agricultural Biosystems Engineering group, Wageningen University & Research, P.O. Box 16, 6700 AA Wageningen, The Netherlands
- Eddie A M Bokkers
- Animal Production Systems group, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
- Iris J M M Boumans
- Animal Production Systems group, Wageningen University & Research, P.O. Box 338, 6700 AH Wageningen, The Netherlands
14
Kim YS, Kim JG, Choi HY, Lee D, Kong JW, Kang GH, Jang YS, Kim W, Lee Y, Kim J, Shin DG, Park JK, Lee G, Kim B. Detection of Aortic Dissection and Intramural Hematoma in Non-Contrast Chest Computed Tomography Using a You Only Look Once-Based Deep Learning Model. J Clin Med 2024; 13:6868. PMID: 39598012. PMCID: PMC11594775. DOI: 10.3390/jcm13226868.
Abstract
Background/Objectives: Aortic dissection (AD) and aortic intramural hematoma (IMH) are fatal diseases with similar clinical characteristics. Immediate computed tomography (CT) with a contrast medium is required to confirm the presence of AD or IMH. This retrospective study aimed to use CT images to differentiate AD and IMH from normal aorta (NA) using a deep learning algorithm. Methods: A 6-year retrospective study of non-contrast chest CT images was conducted at a university hospital in Seoul, Republic of Korea, from January 2016 to July 2021. The position of the aorta was analyzed in each CT image and categorized as NA, AD, or IMH. The images were divided into training, validation, and test sets in an 8:1:1 ratio. A deep learning model that can differentiate between AD and IMH from NA using non-contrast CT images alone, called YOLO (You Only Look Once) v4, was developed. The YOLOv4 model was used to analyze 8881 non-contrast CT images from 121 patients. Results: The YOLOv4 model can distinguish AD, IMH, and NA from each other simultaneously with a probability of over 92% using non-contrast CT images. Conclusions: This model can help distinguish AD and IMH from NA when applying a contrast agent is challenging.
Affiliation(s)
- Yu-Seop Kim
- Department of Convergence Software, Hallym University, Chuncheon 24252, Republic of Korea; (Y.-S.K.); (D.L.); (J.-W.K.)
- Jae Guk Kim
- Department of Emergency Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (J.G.K.); (G.H.K.); (Y.S.J.); (W.K.); (Y.L.)
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
- Hyun Young Choi
- Department of Emergency Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (J.G.K.); (G.H.K.); (Y.S.J.); (W.K.); (Y.L.)
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
- Dain Lee
- Department of Convergence Software, Hallym University, Chuncheon 24252, Republic of Korea; (Y.-S.K.); (D.L.); (J.-W.K.)
- Jin-Woo Kong
- Department of Convergence Software, Hallym University, Chuncheon 24252, Republic of Korea; (Y.-S.K.); (D.L.); (J.-W.K.)
- Gu Hyun Kang
- Department of Emergency Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (J.G.K.); (G.H.K.); (Y.S.J.); (W.K.); (Y.L.)
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
- Yong Soo Jang
- Department of Emergency Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (J.G.K.); (G.H.K.); (Y.S.J.); (W.K.); (Y.L.)
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
- Wonhee Kim
- Department of Emergency Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (J.G.K.); (G.H.K.); (Y.S.J.); (W.K.); (Y.L.)
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
- Yoonje Lee
- Department of Emergency Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (J.G.K.); (G.H.K.); (Y.S.J.); (W.K.); (Y.L.)
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
- Jihoon Kim
- Department of Thoracic and Cardiovascular Surgery, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea
- Dong Geum Shin
- Division of Cardiology, Department of Internal Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea
- Jae Keun Park
- Department of Internal Medicine, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea
- Gayoung Lee
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
- Department of Health Policy and Management, Ewha Womans University Graduate School of Clinical Biohealth, Seoul 03760, Republic of Korea
- Bitnarae Kim
- Hallym Biomedical Informatics Convergence Research Center, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul 07441, Republic of Korea; (G.L.); (B.K.)
15
Horikawa H, Tanese K, Nonaka N, Seita J, Amagai M, Saito M. Reliable and easy-to-use calculating tool for the Nail Psoriasis Severity Index using deep learning. NPJ Syst Biol Appl 2024; 10:130. PMID: 39511184. PMCID: PMC11544089. DOI: 10.1038/s41540-024-00458-x.
Abstract
Since nail psoriasis restricts the patient's daily activities, therapeutic intervention based on reliable and reproducible evaluation is critical. The Nail Psoriasis Severity Index (NAPSI) is a validated scoring tool, but its usefulness is limited by interobserver variability. This study aimed to develop a reliable and accurate NAPSI scoring tool using deep learning. The tool, the "NAPSI calculator", includes two parts: nail detection from images and NAPSI scoring. NAPSI was annotated by nine nail experts, board-certified dermatologists with sufficient experience in a specialized clinic for nail diseases. In the final test set, the "NAPSI calculator" correctly located 137/138 nails and scored NAPSI with higher accuracy than six non-board-certified residents (83.9% vs 65.7%; P = 0.008) and four board-certified non-nail-expert dermatologists (83.9% vs 73.0%; P = 0.005). The "NAPSI calculator" can be readily used in clinical practice, contributing to raising the level of medical practice for nail psoriasis.
Affiliation(s)
- Hiroto Horikawa
- Department of Dermatology, Keio University School of Medicine. 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Keiji Tanese
- Department of Dermatology, Keio University School of Medicine. 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Naoki Nonaka
- Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Jun Seita
- Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters, RIKEN, 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Masayuki Amagai
- Department of Dermatology, Keio University School of Medicine. 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Masataka Saito
- Department of Dermatology, Keio University School of Medicine. 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan.
16
Neupane N, Goswami R, Harrison K, Oberhauser K, Ries L, McCormick C. Artificial intelligence correctly classifies developmental stages of monarch caterpillars enabling better conservation through the use of community science photographs. Sci Rep 2024; 14:27039. PMID: 39511275. PMCID: PMC11544161. DOI: 10.1038/s41598-024-78509-w.
Abstract
Rapid technological advances and growing participation from amateur naturalists have made countless images of insects in their natural habitats available on global web portals. Despite advances in automated species identification, traits like developmental stage or health remain underexplored or manually annotated, with limited focus on automating these features. As a proof-of-concept, we developed a computer vision model utilizing the YOLOv5 algorithm to accurately detect monarch butterfly caterpillars in photographs and classify them into their five developmental stages (instars). The training data were obtained from the iNaturalist portal, and the photographs were first classified and annotated by experts to allow supervised training of models. Our best trained model demonstrates excellent performance on object detection, achieving a mean average precision score of 95% across all five instars. In terms of classification, the YOLOv5l version yielded the best performance, reaching 87% instar classification accuracy for all classes in the test set. Our approach and model show promise in developing detection and classification models for developmental stages for insects, a resource that can be used for large-scale mechanistic studies. These photos hold valuable untapped information, and we've released our annotated collection as an open dataset to support replication and expansion of our methods.
17
Herve Q, Ipek N, Verwaeren J, De Beer T. Automated particle inspection of continuously freeze-dried products using computer vision. Int J Pharm 2024; 664:124629. PMID: 39181173. DOI: 10.1016/j.ijpharm.2024.124629.
Abstract
The pharmaceutical industry is progressing towards more continuous manufacturing techniques. For drying biopharmaceuticals, continuous freeze-drying has several advantages over batch freeze-drying in terms of manufacturing and process analytical control, including better potential for visual inspection. Visual inspection of every freeze-dried product is a key quality assessment after the lyophilization process, to ensure that freeze-dried products are free from foreign particles and defects. This quality assessment is labor-intensive for operators, who need to assess thousands of samples over extended periods, leading to certain drawbacks. Applying artificial intelligence, specifically computer vision, to high-resolution images of every freeze-dried product can quantitatively and qualitatively outperform human visual inspection. For this study, continuously freeze-dried samples were prepared based on a real-world pharmaceutical product using manually induced particles of different sizes and subsequently imaged using a tailor-made setup to develop an image dataset (with particle sizes from 50 μm to 1 mm) used to train multiple object detection models. You Only Look Once version 7 (YOLOv7) outperforms human inspection by a large margin, obtaining a particle detection precision of up to 88.9% while controlling the recall at 81.2%, thus detecting most of the objects present in the images, with an inference time of less than 1 s per vial.
Affiliation(s)
- Quentin Herve
- Laboratory of Pharmaceutical Process Analytical Technology, Department of Pharmaceutical Analysis, Ghent University, 9000 Gent, Belgium.
- Nusret Ipek
- Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653 B-9000 Gent, Belgium
- Jan Verwaeren
- Department of Data Analysis and Mathematical Modelling, Ghent University, Coupure Links 653 B-9000 Gent, Belgium
- Thomas De Beer
- Laboratory of Pharmaceutical Process Analytical Technology, Department of Pharmaceutical Analysis, Ghent University, 9000 Gent, Belgium.
18
Corral-Sanz P, Barreiro-Garrido A, Moreno AB, Sanchez A. On the influence of artificially distorted images in firearm detection performance using deep learning. PeerJ Comput Sci 2024; 10:e2381. PMID: 39650342. PMCID: PMC11622868. DOI: 10.7717/peerj-cs.2381.
Abstract
Detecting people carrying firearms in outdoor or indoor scenes helps to identify (or avoid) potentially dangerous situations. Nevertheless, the automatic detection of these weapons can be greatly affected by the scene conditions. Commonly, in real scenes these firearms can be seen from different perspectives. They may also have different real and apparent sizes. Moreover, the images containing these targets are usually cluttered, and firearms can appear partially occluded. It is also common that the images are affected by several types of distortions such as impulse noise, image darkening or blurring. All these perceived variabilities could significantly degrade the accuracy of firearm detection. Current deep detection networks offer good classification accuracy, with high efficiency and under constrained computational resources. However, the influence of the practical conditions in which the objects are to be detected has not been sufficiently analyzed. Our article describes an experimental study of how a set of selected image distortions quantitatively degrades detection performance on test images when the detection networks have only been trained with images that do not present these alterations. The analyzed test image distortions include impulse noise, blurring (or defocus), image darkening, image shrinking, and occlusions. In order to quantify the impact of each individual distortion on the firearm detection problem, we used a standard YOLOv5 network. Our experimental results show that increasing the amount of impulse salt-and-pepper noise is by far the distortion that most affects the performance of the detection network.
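The distortions studied are straightforward to generate synthetically, which is how such robustness tests are usually built. A rough NumPy/OpenCV sketch of three of them (parameter values are illustrative only, not those used in the article):

    import numpy as np
    import cv2

    rng = np.random.default_rng(0)

    def salt_and_pepper(img, amount=0.05):
        # Flip a fraction of pixels to pure black or white (impulse noise)
        noisy = img.copy()
        mask = rng.random(img.shape[:2])
        noisy[mask < amount / 2] = 0
        noisy[mask > 1 - amount / 2] = 255
        return noisy

    def darken(img, factor=0.4):
        # Scale intensities down to simulate poor lighting
        return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

    def blur(img, ksize=9):
        # Gaussian blur to simulate defocus
        return cv2.GaussianBlur(img, (ksize, ksize), 0)

    img = cv2.imread("scene.jpg")  # placeholder test image
    for name, distorted in [("noise", salt_and_pepper(img)), ("dark", darken(img)), ("blur", blur(img))]:
        cv2.imwrite(f"scene_{name}.jpg", distorted)  # feed these to the trained detector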
Affiliation(s)
- Patricia Corral-Sanz
- Department of Computer Science and Statistics, Universidad Rey Juan Carlos, Mostoles, Madrid, Spain
- Alvaro Barreiro-Garrido
- Department of Computer Science and Statistics, Universidad Rey Juan Carlos, Mostoles, Madrid, Spain
- A. Belen Moreno
- Department of Computer Science and Statistics, Universidad Rey Juan Carlos, Mostoles, Madrid, Spain
- Angel Sanchez
- Department of Computer Science and Statistics, Universidad Rey Juan Carlos, Mostoles, Madrid, Spain
19
Aulia U, Hasanuddin I, Dirhamsyah M, Nasaruddin N. A new CNN-based object detection system for autonomous mobile robots based on real-world vehicle datasets. Heliyon 2024; 10:e35247. PMID: 39166079. PMCID: PMC11334655. DOI: 10.1016/j.heliyon.2024.e35247.
Abstract
Recently, autonomous mobile robots (AMRs) have begun to be used in the delivery of goods, but one of the biggest challenges faced in this field is the navigation system that guides a robot to its destination. The navigation system must be able to identify objects in the robot's path and take evasive actions to avoid them. Developing an object detection system for an AMR requires a deep learning model that is able to achieve a high level of accuracy, with fast inference times, and a model with a compact size that can be run on embedded control systems. Consequently, object recognition requires a convolutional neural network (CNN)-based model that can yield high object classification accuracy and process data quickly. This paper introduces a new CNN-based object detection system for an AMR that employs real-world vehicle datasets. First, we create original real-world datasets of images from Banda Aceh city. We then develop a new CNN-based object identification system that is capable of identifying cars, motorcycles, people, and rickshaws under morning, afternoon, and evening lighting conditions. An SSD Mobilenetv2 FPN Lite 320 × 320 architecture is employed for retraining using these real-world datasets. Quantitative and qualitative performance indicators are then applied to evaluate the CNN model. Training the pre-trained SSD Mobilenetv2 FPN Lite 320 × 320 model improves its classification and detection accuracy, as indicated by its performance results. We conclude that the proposed CNN-based object detection system has the potential for use in an AMR.
Affiliation(s)
- Udink Aulia
- Doctoral Program, School of Engineering, Post Graduate Program, Universitas Syiah Kuala, Banda Aceh, 23111, Indonesia
- Dept. of Mechanical and Industrial Engineering, Universitas Syiah Kuala, Banda Aceh, 23111, Indonesia
- Iskandar Hasanuddin
- Dept. of Mechanical and Industrial Engineering, Universitas Syiah Kuala, Banda Aceh, 23111, Indonesia
- Muhammad Dirhamsyah
- Dept. of Mechanical and Industrial Engineering, Universitas Syiah Kuala, Banda Aceh, 23111, Indonesia
- Nasaruddin Nasaruddin
- Dept. of Electrical and Computer Engineering, Universitas Syiah Kuala, Banda Aceh, 23111, Indonesia
20
Hanson NN, Ounsley JP, Henry J, Terzić K, Caneco B. Automatic detection of fish scale circuli using deep learning. Biol Methods Protoc 2024; 9:bpae056. PMID: 39155982. PMCID: PMC11330318. DOI: 10.1093/biomethods/bpae056.
Abstract
Teleost fish scales form distinct growth rings deposited in proportion to somatic growth in length, and are routinely used in fish ageing and growth analyses. Extraction of incremental growth data from scales is labour intensive. We present a fully automated method to retrieve these data from fish scale images using Convolutional Neural Networks (CNNs). Our pipeline of two CNNs automatically detects the centre of the scale and individual growth rings (circuli) along multiple radial transects emanating from the centre. The focus detector was trained on 725 scale images and achieved an average precision of 99%; the circuli detector was trained on 40 678 circuli annotations and achieved an average precision of 95.1%. Circuli detections were made with less confidence in the freshwater zone of the scale image, where the growth bands are most narrowly spaced. However, the performance of the circuli detector was similar to that of another human labeller, highlighting the inherent ambiguity of the labelling process. The system predicts the location of scale growth rings rapidly and with high accuracy, enabling the calculation of spacings and thereby growth inferences from salmon scales. The success of our method suggests its potential for expansion to other species.
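Once circuli positions along a radial transect have been detected, incremental growth reduces to the spacing between consecutive detections. A minimal sketch of that final calculation follows; the positions and the millimetre-per-pixel scale are made-up values for illustration, not data from the study.

```python
import numpy as np

# Hypothetical circuli positions (pixels from the scale focus) along one transect.
circuli_px = np.array([35.0, 61.5, 90.2, 121.8, 150.3, 182.7])
mm_per_pixel = 0.004  # assumed image scale, not taken from the paper

# Inter-circuli spacings are the first differences of the sorted positions.
spacings_mm = np.diff(np.sort(circuli_px)) * mm_per_pixel

print("spacings (mm):", np.round(spacings_mm, 4))
print("mean spacing (mm):", round(float(spacings_mm.mean()), 4))
```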
Collapse
Affiliation(s)
- Nora N Hanson
- Freshwater Fisheries Laboratory, Marine Directorate, Scottish Government, Pitlochry PH16 5LB, United Kingdom
| | - James P Ounsley
- Freshwater Fisheries Laboratory, Marine Directorate, Scottish Government, Pitlochry PH16 5LB, United Kingdom
| | - Jason Henry
- Freshwater Fisheries Laboratory, Marine Directorate, Scottish Government, Pitlochry PH16 5LB, United Kingdom
| | - Kasim Terzić
- School of Computer Science, University of St Andrews, St Andrews KY16 9SX, United Kingdom
| | - Bruno Caneco
- Freshwater Fisheries Laboratory, Marine Directorate, Scottish Government, Pitlochry PH16 5LB, United Kingdom
| |
Collapse
|
21
|
Huang S, Deng G, Kang Y, Li J, Li J, Lyu M. Exploring deep learning strategies for intervertebral disc herniation detection on veterinary MRI. Sci Rep 2024; 14:16705. [PMID: 39030338 PMCID: PMC11271534 DOI: 10.1038/s41598-024-67749-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2024] [Accepted: 07/15/2024] [Indexed: 07/21/2024] Open
Abstract
Intervertebral Disc Herniation (IVDH) is a common spinal disease in dogs, significantly impacting their health, mobility, and overall well-being. This study initiates an effort to automate the detection and localization of IVDH lesions in veterinary MRI scans, utilizing advanced artificial intelligence (AI) methods. A comprehensive canine IVDH dataset, comprising T2-weighted sagittal MRI images from 213 pet dogs of various breeds, ages, and sizes, was compiled and utilized to train and test the IVDH detection models. The experimental results showed that traditional two-stage detection models reliably outperformed one-stage models, including the recent You Only Look Once X (YOLOX) detector. In terms of methodology, this study introduced a novel spinal localization module, successfully integrated into different object detection models to enhance IVDH detection, achieving an average precision (AP) of up to 75.32%. Additionally, transfer learning was explored to adapt the IVDH detection model for a smaller feline dataset. Overall, this study provides insights into advancing AI for veterinary care, identifying challenges and exploring potential strategies for future development in veterinary radiology.
Collapse
Affiliation(s)
| | | | - Yan Kang
- Shenzhen Technology University, Shenzhen, China
| | - Jianzhong Li
- Shenzhen GoldenStone Medical Technology Co., Ltd., Shenzhen, China
| | - Jingyu Li
- Shenzhen Technology University, Shenzhen, China.
| | - Mengye Lyu
- Shenzhen Technology University, Shenzhen, China.
| |
Collapse
|
22
|
Yu K, Wu J, Wang M, Cai Y, Zhu M, Yao S, Zhou Y. Using UAV images and deep learning in investigating potential breeding sites of Aedes albopictus. Acta Trop 2024; 255:107234. [PMID: 38688444 DOI: 10.1016/j.actatropica.2024.107234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2024] [Revised: 04/27/2024] [Accepted: 04/27/2024] [Indexed: 05/02/2024]
Abstract
Aedes albopictus (Diptera: Culicidae) plays a crucial role as a vector for mosquito-borne diseases like dengue and Zika. Given the limited availability of effective vaccines, the prevention of Aedes-borne diseases mainly relies on extensive efforts in vector surveillance and control. Among the various mosquito control methods, the identification and elimination of potential breeding sites (PBS) for Aedes are recognized as effective means of population control. Previous studies utilizing unmanned aerial vehicles (UAVs) and deep learning to identify PBS have primarily focused on large, regularly-shaped containers, and there has been little empirical research into their practical application in the field. We therefore constructed a PBS dataset specifically tailored for Ae. albopictus, including items such as buckets, bowls, bins, aquatic plants, jars, lids, pots, boxes, and sinks that are common in the Yangtze River Basin in China. A YOLO v7 model for identifying these PBS was then developed. Finally, we recognized and labeled the area with the highest PBS density, as well as the subarea with the most urgent need for source reduction in the empirical region, by calculating kernel density values. Based on the above research, we proposed a UAV-AI-based methodological framework to locate the spatial distribution of PBS, and conducted empirical research on Jinhulu New Village, a typical model community. The results revealed that the YOLO v7 model achieved an excellent result on the F1 score and mAP (both above 0.99), with 97% of PBS correctly located. The predicted distribution of different PBS categories in each subarea was completely consistent with the true distribution; the five houses with the most PBS were correctly located. The kernel density map indicates that subarea 4 had the highest density of PBS, where PBS need to be removed or destroyed with immediate effect. These results demonstrate the reliability of the predictions and the feasibility of the UAV-AI-based methodological framework. It can minimize repetitive labor, enhance efficiency, and provide guidance for the removal and destruction of PBS. This research can inform the investigation of mosquito PBS both methodologically and practically.
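The kernel-density step described above can be reproduced in outline by fitting a Gaussian kernel density estimate to the centroids of detected PBS and locating the densest grid cell. The coordinates below are synthetic placeholders, not survey data from Jinhulu New Village.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical (x, y) centroids of PBS detections in a local projected CRS (metres).
rng = np.random.default_rng(0)
points = rng.uniform(0, 500, size=(200, 2))

# gaussian_kde expects an array of shape (n_dims, n_points).
kde = gaussian_kde(points.T)

# Evaluate the density on a regular grid to find the highest-density subarea.
xs, ys = np.meshgrid(np.linspace(0, 500, 100), np.linspace(0, 500, 100))
density = kde(np.vstack([xs.ravel(), ys.ravel()])).reshape(xs.shape)

iy, ix = np.unravel_index(np.argmax(density), density.shape)
print("highest-density cell centre (m):", xs[iy, ix], ys[iy, ix])
```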
Collapse
Affiliation(s)
- Keyi Yu
- Key Laboratory of Geographic Information Science, Ministry of Education, East China Normal University, Shanghai, 200241, China; School of Geographic Sciences, East China Normal University, Shanghai, 200241, China; Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, Shanghai, 200241, China
| | - Jianping Wu
- Key Laboratory of Geographic Information Science, Ministry of Education, East China Normal University, Shanghai, 200241, China; School of Geographic Sciences, East China Normal University, Shanghai, 200241, China; Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, Shanghai, 200241, China
| | - Minghao Wang
- Key Laboratory of Geographic Information Science, Ministry of Education, East China Normal University, Shanghai, 200241, China; School of Geographic Sciences, East China Normal University, Shanghai, 200241, China; Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, Shanghai, 200241, China
| | - Yizhou Cai
- Minhang District Centre for Disease Control and Prevention, Shanghai, 201011, China
| | - Minhui Zhu
- Minhang District Centre for Disease Control and Prevention, Shanghai, 201011, China
| | - Shenjun Yao
- Key Laboratory of Geographic Information Science, Ministry of Education, East China Normal University, Shanghai, 200241, China; School of Geographic Sciences, East China Normal University, Shanghai, 200241, China; Key Laboratory of Spatial-temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, Shanghai, 200241, China.
| | - Yibin Zhou
- Minhang District Centre for Disease Control and Prevention, Shanghai, 201011, China.
| |
Collapse
|
23
|
Lee HS, Yang S, Han JY, Kang JH, Kim JE, Huh KH, Yi WJ, Heo MS, Lee SS. Automatic detection and classification of nasopalatine duct cyst and periapical cyst on panoramic radiographs using deep convolutional neural networks. Oral Surg Oral Med Oral Pathol Oral Radiol 2024; 138:184-195. [PMID: 38158267 DOI: 10.1016/j.oooo.2023.09.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Revised: 08/01/2023] [Accepted: 09/15/2023] [Indexed: 01/03/2024]
Abstract
OBJECTIVE The aim of this study was to evaluate a deep convolutional neural network (DCNN) method for the detection and classification of nasopalatine duct cysts (NPDC) and periapical cysts (PAC) on panoramic radiographs. STUDY DESIGN A total of 1,209 panoramic radiographs with 606 NPDC and 603 PAC were labeled with a bounding box and divided into training, validation, and test sets with an 8:1:1 ratio. The networks used were EfficientDet-D3, Faster R-CNN, YOLO v5, RetinaNet, and SSD. Mean average precision (mAP) was used to assess performance. Sixty images with no lesion in the anterior maxilla were added to the previous test set, which was then evaluated by 2 dentists with no training in radiology (GPs) and by EfficientDet-D3. The performances were comparatively examined. RESULTS The mAP for each DCNN was EfficientDet-D3 93.8%, Faster R-CNN 90.8%, YOLO v5 89.5%, RetinaNet 79.4%, and SSD 60.9%. The classification performance of EfficientDet-D3 was higher than that of the GPs, with accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of 94.4%, 94.4%, 97.2%, 94.6%, and 97.2%, respectively. CONCLUSIONS The proposed method achieved high performance for the detection and classification of NPDC and PAC compared with the GPs and presented promising prospects for clinical application.
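The diagnostic metrics reported for EfficientDet-D3 and the general practitioners all follow directly from a confusion matrix. A short worked sketch with made-up counts (the paper does not report the raw confusion matrix here):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, PPV and NPV from raw counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall of the positive class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return accuracy, sensitivity, specificity, ppv, npv

# Illustrative counts only, chosen to give values in the same range as the abstract.
print(diagnostic_metrics(tp=57, fp=3, tn=58, fn=2))
```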
Collapse
Affiliation(s)
- Han-Sol Lee
- Department of Oral and Maxillofacial Radiology and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, South Korea
| | - Su Yang
- Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea
| | - Ji-Yong Han
- Interdisciplinary Program in Bioengineering, College of Engineering, Seoul National University, Seoul, South Korea
| | - Ju-Hee Kang
- Department of Oral and Maxillofacial Radiology, Seoul National University Dental Hospital, Seoul, South Korea
| | - Jo-Eun Kim
- Department of Oral and Maxillofacial Radiology and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, South Korea
| | - Kyung-Hoe Huh
- Department of Oral and Maxillofacial Radiology and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, South Korea
| | - Won-Jin Yi
- Department of Oral and Maxillofacial Radiology and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, South Korea; Department of Applied Bioengineering, Graduate School of Convergence Science and Technology, Seoul National University, Seoul, South Korea; Interdisciplinary Program in Bioengineering, College of Engineering, Seoul National University, Seoul, South Korea.
| | - Min-Suk Heo
- Department of Oral and Maxillofacial Radiology and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, South Korea.
| | - Sam-Sun Lee
- Department of Oral and Maxillofacial Radiology and Dental Research Institute, School of Dentistry, Seoul National University, Seoul, South Korea
| |
Collapse
|
24
|
Ataguba G, Orji R. Toward the design of persuasive systems for a healthy workplace: a real-time posture detection. Front Big Data 2024; 7:1359906. [PMID: 38953011 PMCID: PMC11215059 DOI: 10.3389/fdata.2024.1359906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2023] [Accepted: 05/10/2024] [Indexed: 07/03/2024] Open
Abstract
Persuasive technologies, in connection with human factor engineering requirements for healthy workplaces, have played a significant role in ensuring a change in human behavior. Healthy workplaces suggest different best practices applicable to body posture, proximity to the computer system, movement, lighting conditions, computer system layout, and other significant psychological and cognitive aspects. Most importantly, body posture suggests how users should sit or stand in workplaces in line with best and healthy practices. In this study, we conducted two study phases (pilot and main) using two deep learning models: convolutional neural networks (CNN) and YOLO-V3. To train the two models, we collected posture datasets from Creative Commons-licensed YouTube videos and Kaggle, and classified the dataset into comfortable and uncomfortable postures. Results show that our YOLO-V3 model outperformed the CNN model with a mean average precision of 92%. Based on this finding, we recommend that the YOLO-V3 model be integrated into the design of persuasive technologies for a healthy workplace. Additionally, we discuss future implications for integrating proximity detection, taking into consideration the ideal distance (in centimeters) users should maintain from the computer system in a healthy workplace.
Collapse
Affiliation(s)
- Grace Ataguba
- Department of Computer Science, Dalhousie University, Halifax, NS, Canada
| | | |
Collapse
|
25
|
Divasón J, Romero A, Martinez-de-Pison FJ, Casalongue M, Silvestre MA, Santolaria P, Yániz JL. Analysis of Varroa Mite Colony Infestation Level Using New Open Software Based on Deep Learning Techniques. SENSORS (BASEL, SWITZERLAND) 2024; 24:3828. [PMID: 38931612 PMCID: PMC11207890 DOI: 10.3390/s24123828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Revised: 06/07/2024] [Accepted: 06/11/2024] [Indexed: 06/28/2024]
Abstract
Varroa mites, scientifically identified as Varroa destructor, pose a significant threat to beekeeping and cause one of the most destructive diseases affecting honey bee populations. These parasites attach to bees, feeding on their fat tissue, weakening their immune systems, reducing their lifespans, and even causing colony collapse. They also feed during the pre-imaginal stages of the honey bee in brood cells. Given the critical role of honey bees in pollination and the global food supply, controlling Varroa mites is imperative. One of the most common methods used to evaluate the level of Varroa mite infestation in a bee colony is to count all the mites that fall onto sticky boards placed at the bottom of a colony. However, this is usually a manual process that takes a considerable amount of time. This work proposes a deep learning approach for locating and counting Varroa mites using images of the sticky boards taken by smartphone cameras. To this end, a new realistic dataset has been built: it includes images containing numerous artifacts and blurred parts, which makes the task challenging. After testing various architectures (mainly two-stage detectors with feature pyramid networks), combinations of hyperparameters, and several image enhancement techniques, we obtained a system that achieves a mean average precision (mAP) of 0.9073 on the validation set.
Collapse
Affiliation(s)
- Jose Divasón
- Departament of Mathematics and Computer Science, University of La Rioja, 26006 Logroño, Spain;
| | - Ana Romero
- Departament of Mathematics and Computer Science, University of La Rioja, 26006 Logroño, Spain;
| | | | - Matías Casalongue
- BIOFITER Research Group, Environmental Sciences Institute (IUCA), Department of Animal Production and Food Sciences, University of Zaragoza, 22071 Huesca, Spain; (M.C.); (P.S.)
| | - Miguel A. Silvestre
- Department of Cell Biology, Functional Biology and Physical Anthropology, University of Valencia, 46100 Burjassot, Spain;
| | - Pilar Santolaria
- BIOFITER Research Group, Environmental Sciences Institute (IUCA), Department of Animal Production and Food Sciences, University of Zaragoza, 22071 Huesca, Spain; (M.C.); (P.S.)
| | - Jesús L. Yániz
- BIOFITER Research Group, Environmental Sciences Institute (IUCA), Department of Animal Production and Food Sciences, University of Zaragoza, 22071 Huesca, Spain; (M.C.); (P.S.)
| |
Collapse
|
26
|
Laranjeira C, Pereira M, Oliveira R, Barbosa G, Fernandes C, Bermudi P, Resende E, Fernandes E, Nogueira K, Andrade V, Quintanilha JA, dos Santos JA, Chiaravalloti-Neto F. Automatic mapping of high-risk urban areas for Aedes aegypti infestation based on building facade image analysis. PLoS Negl Trop Dis 2024; 18:e0011811. [PMID: 38829905 PMCID: PMC11192312 DOI: 10.1371/journal.pntd.0011811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2023] [Revised: 06/21/2024] [Accepted: 05/17/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND Dengue, Zika, and chikungunya, whose viruses are transmitted mainly by Aedes aegypti, significantly impact human health worldwide. Despite the recent development of promising vaccines against the dengue virus, controlling these arbovirus diseases still depends on mosquito surveillance and control. Nonetheless, several studies have shown that these measures are not sufficiently effective or are even ineffective. Identifying higher-risk areas in a municipality and directing control efforts towards them could improve their effectiveness. One tool for this is the premise condition index (PCI); however, measuring it requires visiting all buildings. We propose a novel approach capable of predicting the PCI based on facade street-level images, which we call PCINet. METHODOLOGY Our study was conducted in Campinas, a one million-inhabitant city in São Paulo, Brazil. We surveyed 200 blocks, visited their buildings, and measured the three traditional PCI components (building and backyard conditions and shading), the facade conditions (taking pictures of them), and other characteristics. We trained a deep neural network on the pictures taken, creating a computational model that can predict a building's condition based on the view of its facade. We evaluated PCINet in a scenario emulating a real large-scale situation, in which the model could be deployed to automatically monitor four regions of Campinas and identify risk areas. PRINCIPAL FINDINGS PCINet produced reasonable results in differentiating the facade condition into three levels, and it is a scalable strategy to triage large areas. The entire process can be automated through data collection from facade data sources and inference through PCINet. The facade conditions correlated highly with the building and backyard conditions and reasonably well with shading and backyard conditions. The use of street-level images and PCINet could help to optimize Ae. aegypti surveillance and control, reducing the number of in-person visits necessary to identify buildings, blocks, and neighborhoods at higher risk from mosquito and arbovirus diseases.
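At its core, predicting a facade condition level from a street-level photograph is an ordinal image classification problem. A minimal transfer-learning sketch of that idea follows; the backbone, learning rate, and three-level head are assumptions for illustration, not the authors' PCINet implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_LEVELS = 3  # facade condition levels, as described in the abstract

# Assumed backbone (recent torchvision); the paper's exact architecture is not specified here.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_LEVELS)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimisation step on a batch of facade images (tensors N x 3 x H x W)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```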
Collapse
Affiliation(s)
- Camila Laranjeira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Matheus Pereira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Raul Oliveira
- Department of Epidemiology, School of Public Health of University of São Paulo, São Paulo, Brazil
| | - Gerson Barbosa
- Pasteur Institute, Secretary of Health of the State of São Paulo, São Paulo, Brazil
| | - Camila Fernandes
- Department of Epidemiology, School of Public Health of University of São Paulo, São Paulo, Brazil
| | - Patricia Bermudi
- Department of Epidemiology, School of Public Health of University of São Paulo, São Paulo, Brazil
| | - Ester Resende
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Eduardo Fernandes
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
| | - Keiller Nogueira
- Computer Science and Mathematics, University of Stirling, Stirling, United Kingdom
| | - Valmir Andrade
- Epidemiologic Surveillance Center, Secretary of Health of the State of São Paulo, São Paulo, Brazil
| | | | - Jefersson A. dos Santos
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
| | | |
Collapse
|
27
|
Aneja P, Kinna T, Newman J, Sami S, Cassidy J, McCarthy J, Tiwari M, Kumar A, Spencer JP. Leveraging technological advances to assess dyadic visual cognition during infancy in high- and low-resource settings. Front Psychol 2024; 15:1376552. [PMID: 38873529 PMCID: PMC11169819 DOI: 10.3389/fpsyg.2024.1376552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2024] [Accepted: 05/08/2024] [Indexed: 06/15/2024] Open
Abstract
Caregiver-infant interactions shape infants' early visual experience; however, there is limited work from low-and middle-income countries (LMIC) in characterizing the visual cognitive dynamics of these interactions. Here, we present an innovative dyadic visual cognition pipeline using machine learning methods which captures, processes, and analyses the visual dynamics of caregiver-infant interactions across cultures. We undertake two studies to examine its application in both low (rural India) and high (urban UK) resource settings. Study 1 develops and validates the pipeline to process caregiver-infant interaction data captured using head-mounted cameras and eye-trackers. We use face detection and object recognition networks and validate these tools using 12 caregiver-infant dyads (4 dyads from a 6-month-old UK cohort, 4 dyads from a 6-month-old India cohort, and 4 dyads from a 9-month-old India cohort). Results show robust and accurate face and toy detection, as well as a high percent agreement between processed and manually coded dyadic interactions. Study 2 applied the pipeline to a larger data set (25 6-month-olds from the UK, 31 6-month-olds from India, and 37 9-month-olds from India) with the aim of comparing the visual dynamics of caregiver-infant interaction across the two cultural settings. Results show remarkable correspondence between key measures of visual exploration across cultures, including longer mean look durations during infant-led joint attention episodes. In addition, we found several differences across cultures. Most notably, infants in the UK had a higher proportion of infant-led joint attention episodes consistent with a child-centered view of parenting common in western middle-class families. In summary, the pipeline we report provides an objective assessment tool to quantify the visual dynamics of caregiver-infant interaction across high- and low-resource settings.
Collapse
Affiliation(s)
- Prerna Aneja
- School of Psychology, University of East Anglia, Norwich, United Kingdom
| | - Thomas Kinna
- School of Medicine, University of East Anglia, Norwich, United Kingdom
- School of Pharmacy, University of East Anglia, Norwich, United Kingdom
| | - Jacob Newman
- IT and Computing, University of East Anglia, Norwich, United Kingdom
| | - Saber Sami
- School of Medicine, University of East Anglia, Norwich, United Kingdom
| | - Joe Cassidy
- School of Psychology, University of East Anglia, Norwich, United Kingdom
| | - Jordan McCarthy
- School of Psychology, University of East Anglia, Norwich, United Kingdom
| | | | | | - John P. Spencer
- School of Psychology, University of East Anglia, Norwich, United Kingdom
| |
Collapse
|
28
|
Yıldız M, Sarpdağı Y, Okuyar M, Yildiz M, Çiftci N, Elkoca A, Yildirim MS, Aydin MA, Parlak M, Bingöl B. Segmentation and classification of skin burn images with artificial intelligence: Development of a mobile application. Burns 2024; 50:966-979. [PMID: 38331663 DOI: 10.1016/j.burns.2024.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 12/26/2023] [Accepted: 01/10/2024] [Indexed: 02/10/2024]
Abstract
AIM This study was conducted to perform segmentation, classification, and object detection on skin burn images with artificial intelligence and to determine their accuracy through a mobile application. With this study, individuals were able to determine the degree of burns and see how to intervene through the mobile application. METHODS This research was conducted between 26.10.2021 and 01.09.2023. The dataset was handled in two stages. In the first stage, an open-access dataset was taken from https://universe.roboflow.com/ and the burn images dataset was created. In the second stage, in order to determine the accuracy of the developed system and artificial intelligence model, patients admitted to the hospital were assessed with the Burn Wound Detection Android application of our own design. RESULTS The YOLO V7 architecture was used for segmentation, classification, and object detection. The study comprises 21,018 images, of which 80% were used as training data and 20% as test data. The YOLO V7 model achieved a success rate of 75.12% on the test data. The Burn Wound Detection Android mobile application developed in the study was used to accurately detect burns in images of individuals. CONCLUSION In this study, skin burn images were segmented, classified, and subjected to object detection with artificial intelligence, and a mobile application was developed. First aid is crucial in burn cases, and it is an important development for public health that people living in the periphery can quickly determine the degree of burn through the mobile application and provide first aid according to its instructions.
Collapse
Affiliation(s)
- Metin Yıldız
- Department of Nursing, Sakarya University, Sakarya, Turkey.
| | - Yakup Sarpdağı
- Department of Nursing, Van Yuzuncu Yil University, Turkey
| | - Mehmet Okuyar
- Sakarya University of Applied Sciences Biomedical Engineering, Sakarya, Turkey
| | - Mehmet Yildiz
- Sakarya University of Applied Sciences, Distance Education Research and Application Center, Sakarya, Turkey
| | - Necmettin Çiftci
- Muş Alparslan University, Faculty of Health Sciences, Department of Nursing, 49100 Muş, Turkey
| | - Ayşe Elkoca
- Gaziantep Islamic University of Science and Technology Faculty of Health Sciences, Midwifery, Turkey
| | - Mehmet Salih Yildirim
- Vocational School of Health Services, Agri Ibrahim Cecen University School of Health, Agri, Turkey
| | | | - Mehmet Parlak
- Ataturk University, Department of Nursing, Erzurum, Turkey
| | - Bünyamin Bingöl
- Sakarya University, Electrical and Electronics Engineering, Sakarya, Turkey
| |
Collapse
|
29
|
Ashraf AR, Somogyi-Végh A, Merczel S, Gyimesi N, Fittler A. Leveraging code-free deep learning for pill recognition in clinical settings: A multicenter, real-world study of performance across multiple platforms. Artif Intell Med 2024; 150:102844. [PMID: 38553153 DOI: 10.1016/j.artmed.2024.102844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 03/11/2024] [Accepted: 03/11/2024] [Indexed: 04/02/2024]
Abstract
BACKGROUND Preventable patient harm, particularly medication errors, represent significant challenges in healthcare settings. Dispensing the wrong medication is often associated with mix-up of lookalike and soundalike drugs in high workload environments. Replacing manual dispensing with automated unit dose and medication dispensing systems to reduce medication errors is not always feasible in clinical facilities experiencing high patient turn-around or frequent dose changes. Artificial intelligence (AI) based pill recognition tools and smartphone applications could potentially aid healthcare workers in identifying pills in situations where more advanced dispensing systems are not implemented. OBJECTIVE Most of the published research on pill recognition focuses on theoretical aspects of model development using traditional coding and deep learning methods. The use of code-free deep learning (CFDL) as a practical alternative for accessible model development, and implementation of such models in tools intended to aid decision making in clinical settings, remains largely unexplored. In this study, we sought to address this gap in existing literature by investigating whether CFDL is a viable approach for developing pill recognition models using a custom dataset, followed by a thorough evaluation of the model across various deployment scenarios, and in multicenter clinical settings. Furthermore, we aimed to highlight challenges and propose solutions to achieve optimal performance and real-world applicability of pill recognition models, including when deployed on smartphone applications. METHODS A pill recognition model was developed utilizing Microsoft Azure Custom Vision platform and a large custom training dataset of 26,880 images captured from the top 30 most dispensed solid oral dosage forms (SODFs) at the three participating hospitals. A comprehensive internal and external testing strategy was devised, model's performance was investigated through the online API, and offline using exported TensorFlow Lite model running on a Windows PC and on Android, using a tailor-made testing smartphone application. Additionally, model's calibration, degree of reliance on color features and device dependency was thoroughly evaluated. Real-world performance was assessed using images captured by hospital pharmacists at three participating clinical centers. RESULTS The pill recognition model showed high performance in Microsoft Azure Custom Vision platform with 98.7 % precision, 95.1 % recall, and 98.2 % mean average precision (mAP), with thresholds set to 50 %. During internal testing utilizing the online API, the model reached 93.7 % precision, 88.96 % recall, 90.81 % F1-score and 87.35 % mAP. Testing the offline TensorFlow Lite model on Windows PC showed a slight performance reduction, with 91.16 % precision, 83.82 % recall, 86.18 % F1-score and 82.55 % mAP. Performance of the model running offline on the Android application was further reduced to 86.50 % precision, 75.00 % recall, 77.83 % F1-score and 69.24 % mAP. During external clinical testing through the online API an overall precision of 83.10 %, recall of 71.39 %, and F1-score of 75.76 % was achieved. CONCLUSION Our study demonstrates that using a CFDL approach is a feasible and cost-effective method for developing AI-based pill recognition systems. Despite the limitations encountered, our model performed well, particularly when accessed through the online API. 
The use of CFDL facilitates interdisciplinary collaboration, resulting in human-centered AI models with enhanced real-world applicability. We suggest that rather than striving to build a universally applicable pill recognition system, models should be tailored to the medications in a regional formulary or needs of a specific clinic, which can in turn lead to improved performance in real-world deployment in these locations. Parallel to focusing on model development, it is crucial to employ a human centered approach by training the end users on how to properly interact with the AI based system to maximize benefits. Future research is needed on refining pill recognition models for broader adaptability. This includes investigating image pre-processing and optimization techniques to enhance offline performance and operation on handheld devices. Moreover, future studies should explore methods to overcome limitations of CFDL development to enhance the robustness of models and reduce overfitting. Collaborative efforts between researchers in this domain and sharing of best practices are vital to improve pill recognition systems, ultimately enhancing patient safety and healthcare outcomes.
Collapse
Affiliation(s)
- Amir Reza Ashraf
- Department of Pharmaceutics and Central Clinical Pharmacy, Faculty of Pharmacy, University of Pécs, Pécs, Hungary.
| | - Anna Somogyi-Végh
- Department of Pharmaceutics and Central Clinical Pharmacy, Faculty of Pharmacy, University of Pécs, Pécs, Hungary
| | - Sára Merczel
- Department of Pharmacy, Somogy County Kaposi Mór Teaching Hospital, Kaposvár, Hungary
| | - Nóra Gyimesi
- Péterfy Hospital and Jenő Manninger Traumatology Center, Budapest, Hungary
| | - András Fittler
- Department of Pharmaceutics and Central Clinical Pharmacy, Faculty of Pharmacy, University of Pécs, Pécs, Hungary
| |
Collapse
|
30
|
Pérez de Frutos J, Holden Helland R, Desai S, Nymoen LC, Langø T, Remman T, Sen A. AI-Dentify: deep learning for proximal caries detection on bitewing x-ray - HUNT4 Oral Health Study. BMC Oral Health 2024; 24:344. [PMID: 38494481 PMCID: PMC10946166 DOI: 10.1186/s12903-024-04120-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Accepted: 03/07/2024] [Indexed: 03/19/2024] Open
Abstract
BACKGROUND Dental caries diagnosis requires the manual inspection of diagnostic bitewing images of the patient, followed by a visual inspection and probing of the identified dental pieces with potential lesions. Yet the use of artificial intelligence, and in particular deep learning, has the potential to aid in the diagnosis by providing a quick and informative analysis of the bitewing images. METHODS A dataset of 13,887 bitewings from the HUNT4 Oral Health Study was annotated individually by six different experts and used to train three different object detection deep-learning architectures: RetinaNet (ResNet50), YOLOv5 (M size), and EfficientDet (D0 and D1 sizes). A consensus dataset of 197 images, annotated jointly by the same six dental clinicians, was used for evaluation. A five-fold cross validation scheme was used to evaluate the performance of the AI models. RESULTS The trained models show an increase in average precision and F1-score, and a decrease in false negative rate, with respect to the dental clinicians. When compared against the dental clinicians, the YOLOv5 model shows the largest improvement, reporting a 0.647 mean average precision, 0.548 mean F1-score, and 0.149 mean false negative rate, whereas the best annotators on each of these metrics reported 0.299, 0.495, and 0.164, respectively. CONCLUSION Deep-learning models have shown the potential to assist dental professionals in the diagnosis of caries. Yet, the task remains challenging due to the artifacts inherent to bitewing images.
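The five-fold cross-validation scheme mentioned above can be organised as in the sketch below; the image identifiers are placeholders, and the training and evaluation calls are indicated only as comments.

```python
from sklearn.model_selection import KFold

image_ids = [f"bitewing_{i:05d}" for i in range(13887)]  # placeholder identifiers

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(image_ids)):
    train_ids = [image_ids[i] for i in train_idx]
    val_ids = [image_ids[i] for i in val_idx]
    # A detector would be trained on train_ids and evaluated on val_ids here,
    # with the jointly annotated consensus set held out entirely for final testing.
    print(f"fold {fold}: {len(train_ids)} training images, {len(val_ids)} validation images")
```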
Collapse
Affiliation(s)
- Javier Pérez de Frutos
- Department of Health Research, SINTEF Digital, Professor Brochs gate 2, Trondheim, 7030, Norway.
| | - Ragnhild Holden Helland
- Department of Health Research, SINTEF Digital, Professor Brochs gate 2, Trondheim, 7030, Norway
| | | | - Line Cathrine Nymoen
- Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
- Kompetansesenteret Tannhelse Midt (TkMidt), Trondheim, Norway
| | - Thomas Langø
- Department of Health Research, SINTEF Digital, Professor Brochs gate 2, Trondheim, 7030, Norway
| | | | - Abhijit Sen
- Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
- Kompetansesenteret Tannhelse Midt (TkMidt), Trondheim, Norway
| |
Collapse
|
31
|
Tournois L, Hatsch D, Ludes B, Delabarde T. Automatic detection and identification of diatoms in complex background for suspected drowning cases through object detection models. Int J Legal Med 2024; 138:659-670. [PMID: 37804333 DOI: 10.1007/s00414-023-03096-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Accepted: 09/14/2023] [Indexed: 10/09/2023]
Abstract
The diagnosis of drowning is one of the most difficult tasks in forensic medicine. The diatom test is a complementary analysis method that may help the forensic pathologist in the diagnosis of drowning and the localization of the drowning site. This test consists of detecting or identifying diatoms, unicellular algae, in tissue and water samples. In order to observe diatoms under light microscopy, those samples may be digested by enzymes such as proteinase K. However, this digestion method may leave high amounts of debris, making the detection and identification of diatoms difficult. To the best of our knowledge, no model has been shown to accurately detect and identify diatom species observed in highly complex backgrounds under light microscopy. Therefore, a novel method of model development for diatom detection and identification in a forensic context, based on sequential transfer learning of object detection models, is proposed in this article. The best resulting models are able to detect and identify up to 50 species of forensically relevant diatoms with an average precision and an average recall ranging from 0.7 to 1, depending on the species concerned. The models were developed by sequential transfer learning and overall outperformed those developed by traditional transfer learning. The best diatom species identification model is expected to be used routinely at the Medicolegal Institute of Paris.
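Sequential transfer learning, as used here, amounts to fine-tuning in stages, with each stage initialised from the weights produced by the previous one. The toy sketch below illustrates only that weight hand-off; the tiny model, synthetic tensors, epoch counts, and learning rates are assumptions, not the diatom models themselves.

```python
import torch
import torch.nn as nn

def fine_tune(model, dataset, epochs=5, lr=1e-3):
    """Fine-tune `model` on one stage's (inputs, labels) tensors and return it."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    inputs, labels = dataset
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
    return model

# Toy stand-in for a detector backbone with a 50-class head (one per diatom species).
model = nn.Sequential(nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 50))

# Stage 1: easier source-domain data; Stage 2: harder debris-rich target data (synthetic tensors here).
stage1 = (torch.randn(128, 1, 8, 8), torch.randint(0, 50, (128,)))
stage2 = (torch.randn(64, 1, 8, 8), torch.randint(0, 50, (64,)))

model = fine_tune(model, stage1, epochs=10)            # learn general features first
model = fine_tune(model, stage2, epochs=10, lr=1e-4)   # then adapt to the harder domain
```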
Collapse
Affiliation(s)
- Laurent Tournois
- UMR 8045 BABEL, Université Paris Cité, CNRS, 75012, Paris, France.
- BioSilicium, Riom, France.
| | | | - Bertrand Ludes
- UMR 8045 BABEL, Université Paris Cité, CNRS, 75012, Paris, France
- Institut Médico-Légal de Paris, Paris, France
| | - Tania Delabarde
- UMR 8045 BABEL, Université Paris Cité, CNRS, 75012, Paris, France
- Institut Médico-Légal de Paris, Paris, France
| |
Collapse
|
32
|
Bergman N, Yitzhaky Y, Halachmi I. Biometric identification of dairy cows via real-time facial recognition. Animal 2024; 18:101079. [PMID: 38377806 DOI: 10.1016/j.animal.2024.101079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 01/02/2024] [Accepted: 01/11/2024] [Indexed: 02/22/2024] Open
Abstract
Biometric methods currently used to identify humans can potentially identify dairy cows. Given that animal movements cannot be easily controlled, identification accuracy and system robustness are challenging when deploying an animal biometrics recognition system on a real farm. Our proposed method performs multiple-cow face detection and face classification from videos by adapting recent state-of-the-art deep-learning methods. As part of this study, a system was designed and installed four meters above a feeding zone at the Volcani Institute's dairy farm. Two datasets were acquired and annotated, one for facial detection and the second for facial classification of 77 cows. We achieved a mean average precision (at an Intersection over Union of 0.5) of 97.8% for facial detection using the YOLOv5 algorithm, and a facial classification accuracy of 96.3% using a Vision-Transformer model with a unique loss function borrowed from human facial recognition. Our combined system can process video frames with 10 cows' faces, localize their faces, and correctly classify their identities in less than 20 ms per frame. Thus, video files of up to 50 frames per second can be processed with our system in real time at a dairy farm. Our method efficiently performs real-time facial detection and recognition on multiple cow faces using deep neural networks, achieving high precision in real-time operation. These qualities can make the proposed system a valuable tool for automatic biometric cow recognition on farms.
Collapse
Affiliation(s)
- N Bergman
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, 1 Ben Gurion Avenue, P.O.B. 653, Be'er Sheva 8410501, Israel; Precision Livestock Farming (PLF) Laboratory, Institute of Agricultural Engineering, Agricultural Research Organization (A.R.O.) - The Volcani Center, 68 Hamaccabim Road, P.O.B 15159, Rishon Lezion 7505101, Israel
| | - Y Yitzhaky
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, 1 Ben Gurion Avenue, P.O.B. 653, Be'er Sheva 8410501, Israel
| | - I Halachmi
- Precision Livestock Farming (PLF) Laboratory, Institute of Agricultural Engineering, Agricultural Research Organization (A.R.O.) - The Volcani Center, 68 Hamaccabim Road, P.O.B 15159, Rishon Lezion 7505101, Israel.
| |
Collapse
|
33
|
Cheng R, Lucyszyn S. Few-shot concealed object detection in sub-THz security images using improved pseudo-annotations. Sci Rep 2024; 14:3150. [PMID: 38326507 PMCID: PMC10850053 DOI: 10.1038/s41598-024-53045-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2023] [Accepted: 01/27/2024] [Indexed: 02/09/2024] Open
Abstract
In this research, we explore the few-shot object detection application for identifying concealed objects in sub-terahertz security images, using fine-tuning based frameworks. To adapt these machine learning frameworks for the (sub-)terahertz domain, we propose an innovative pseudo-annotation method to augment the object detector by sourcing high-quality training samples from unlabeled images. This approach employs multiple one-class detectors coupled with a fine-grained classifier, trained on supporting thermal-infrared images, to prevent overfitting. Consequently, our approach enhances the model's ability to detect challenging objects (e.g., 3D-printed guns and ceramic knives) when few-shot training examples are available, especially in the real-world scenario where images of concealed dangerous items are scarce.
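The pseudo-annotation idea described above, mining confident detections from unlabeled images and keeping only those that a separate fine-grained classifier also accepts, can be sketched as follows. The detector and classifier interfaces and the two thresholds are assumptions for illustration, not the paper's exact procedure.

```python
def mine_pseudo_annotations(unlabeled_images, one_class_detectors, classifier,
                            det_threshold=0.8, cls_threshold=0.9):
    """Collect high-quality pseudo-boxes from unlabeled sub-THz images.

    Each one-class detector proposes boxes for its own category; a fine-grained
    classifier (trained on supporting thermal-infrared images) must agree before
    a box is kept, which guards against overfitting to noisy proposals.
    """
    pseudo_annotations = []
    for image in unlabeled_images:
        for category, detector in one_class_detectors.items():
            for box, score in detector(image):           # assumed callable interface
                if score < det_threshold:
                    continue
                crop = image.crop(box)                    # assumed PIL-like crop
                label, confidence = classifier(crop)      # assumed callable interface
                if label == category and confidence >= cls_threshold:
                    pseudo_annotations.append((image, box, category))
    return pseudo_annotations
```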
Collapse
Affiliation(s)
- Ran Cheng
- Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, UK
| | - Stepan Lucyszyn
- Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, UK.
| |
Collapse
|
34
|
Abu Awwad Y, Rana O, Perera C. Anomaly Detection on the Edge Using Smart Cameras under Low-Light Conditions. SENSORS (BASEL, SWITZERLAND) 2024; 24:772. [PMID: 38339490 PMCID: PMC10857634 DOI: 10.3390/s24030772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/27/2023] [Revised: 01/10/2024] [Accepted: 01/17/2024] [Indexed: 02/12/2024]
Abstract
The number of cameras utilised in smart city domains is growing; they are used to monitor outdoor urban and rural areas such as farms and forests, to deter thefts of farming machinery and livestock, and to monitor workers and guarantee their safety. However, anomaly detection tasks become much more challenging in environments with low-light conditions, and achieving efficient outcomes in recognising surrounding behaviours and events becomes difficult. Therefore, this research has developed a technique to enhance images captured in poor visibility. This enhancement aims to boost object detection accuracy and mitigate false positive detections. The proposed technique consists of several stages. In the first stage, features are extracted from input images. Subsequently, a classifier assigns a unique label to indicate the optimum model among multiple enhancement networks; in addition, it can distinguish scenes captured with sufficient light from low-light ones. Finally, a detection algorithm is applied to identify objects. Each task was implemented on a separate IoT-edge device, improving detection performance on the ExDark database with a nearly one-second response time across all stages.
Collapse
Affiliation(s)
- Yaser Abu Awwad
- Department of Computer Science and Informatics, Cardiff University, Cardiff CF24 4AG, UK; (O.R.); (C.P.)
| | | | | |
Collapse
|
35
|
Olar A, Tyler T, Hoppa P, Frank E, Csabai I, Adorjan I, Pollner P. Annotated dataset for training deep learning models to detect astrocytes in human brain tissue. Sci Data 2024; 11:96. [PMID: 38242926 PMCID: PMC10798998 DOI: 10.1038/s41597-024-02908-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2023] [Accepted: 12/29/2023] [Indexed: 01/21/2024] Open
Abstract
Astrocytes, a type of glial cell, significantly influence neuronal function, with variations in morphology and density linked to neurological disorders. Traditional methods for their accurate detection and density measurement are laborious and unsuited for large-scale operations. We introduce a dataset from human brain tissues stained with aldehyde dehydrogenase 1 family member L1 (ALDH1L1) and glial fibrillary acidic protein (GFAP). The digital whole slide images of these tissues were partitioned into 8730 patches of 500 × 500 pixels, comprising 2323 ALDH1L1 and 4714 GFAP patches at a pixel size of 0.5019/pixel, as well as 1382 ALDH1L1 and 311 GFAP patches at 0.3557/pixel. Sourced from 16 slides and 8 patients, our dataset promotes the development of tools for glial cell detection and quantification, offering insights into their density distribution in various brain areas, thereby broadening neuropathological study horizons. These samples hold value for automating detection methods, including deep learning. Derived from human samples, our dataset provides a platform for exploring astrocyte functionality, potentially guiding new diagnostic and treatment strategies for neurological disorders.
Collapse
Affiliation(s)
- Alex Olar
- Eötvös Loránd University, Department of Physics of Complex Systems, Budapest, Hungary
- Eötvös Loránd University, Doctoral School of Informatics, Budapest, Hungary
| | - Teadora Tyler
- Semmelweis University, Department of Anatomy, Histology and Embryology, Budapest, Hungary
| | - Paulina Hoppa
- Semmelweis University, Department of Anatomy, Histology and Embryology, Budapest, Hungary
| | - Erzsébet Frank
- Semmelweis University, Department of Anatomy, Histology and Embryology, Budapest, Hungary
| | - István Csabai
- Eötvös Loránd University, Department of Physics of Complex Systems, Budapest, Hungary
| | - Istvan Adorjan
- Semmelweis University, Department of Anatomy, Histology and Embryology, Budapest, Hungary.
| | - Péter Pollner
- Semmelweis University, Data-Driven Health Division of National Laboratory for Health Security, Health Services Management Training Centre, Budapest, Hungary.
| |
Collapse
|
36
|
Eversberg L, Lambrecht J. Combining Synthetic Images and Deep Active Learning: Data-Efficient Training of an Industrial Object Detection Model. J Imaging 2024; 10:16. [PMID: 38249001 PMCID: PMC11154516 DOI: 10.3390/jimaging10010016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Revised: 12/29/2023] [Accepted: 01/04/2024] [Indexed: 01/23/2024] Open
Abstract
Generating synthetic data is a promising solution to the challenge of limited training data for industrial deep learning applications. However, training on synthetic data and testing on real-world data creates a sim-to-real domain gap. Research has shown that combining synthetic and real images leads to better results than using only one source of data. In this work, the generation of synthetic training images via physics-based rendering is combined with deep active learning for an industrial object detection task to iteratively improve model performance over time. Our experimental results show that synthetic images improve model performance, especially at the beginning of the model's life cycle when training data are limited. Furthermore, our implemented hybrid query strategy selects diverse and informative new training images in each active learning cycle, outperforming random sampling. In conclusion, this work presents a workflow to train and iteratively improve object detection models with a small number of real-world images, leading to data-efficient and cost-effective computer vision models.
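An active learning cycle of the kind described can be summarised as a query loop that repeatedly picks informative real images for labeling. The scoring functions below are generic stand-ins for uncertainty and diversity, not the paper's exact hybrid strategy.

```python
def active_learning_round(model, labeled, unlabeled, budget, uncertainty, diversity):
    """Select `budget` unlabeled images that score as both uncertain and diverse."""
    # Rank every unlabeled image by a combined informativeness score.
    scored = sorted(
        unlabeled,
        key=lambda img: uncertainty(model, img) + diversity(img, labeled),
        reverse=True,
    )
    queried = scored[:budget]
    remaining = [img for img in unlabeled if img not in queried]
    return queried, remaining

# Sketch of the overall workflow (hypothetical helpers, shown as comments):
# model starts from training on synthetic renders, then each round adds real images.
# for round_id in range(n_rounds):
#     queried, unlabeled = active_learning_round(model, labeled, unlabeled, budget=20,
#                                                uncertainty=mean_entropy,
#                                                diversity=feature_distance)
#     labeled += annotate(queried)        # human labeling step
#     model = retrain(model, labeled)     # retraining step
```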
Collapse
Affiliation(s)
- Leon Eversberg
- Industry Grade Networks and Clouds, Faculty IV Electrical Engineering and Computer Science, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany;
| | | |
Collapse
|
37
|
Yan W, Chiu B, Shen Z, Yang Q, Syer T, Min Z, Punwani S, Emberton M, Atkinson D, Barratt DC, Hu Y. Combiner and HyperCombiner networks: Rules to combine multimodality MR images for prostate cancer localisation. Med Image Anal 2024; 91:103030. [PMID: 37995627 DOI: 10.1016/j.media.2023.103030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2022] [Revised: 09/22/2023] [Accepted: 11/13/2023] [Indexed: 11/25/2023]
Abstract
One of the distinct characteristics of radiologists reading multiparametric prostate MR scans, using reporting systems like PI-RADS v2.1, is to score individual types of MR modalities, including T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant cancer. This work aims to demonstrate that it is feasible for low-dimensional parametric models to model such decision rules in the proposed Combiner networks, without compromising the accuracy of predicting radiologic labels. First, we demonstrate that either a linear mixture model or a nonlinear stacking model is sufficient to model PI-RADS decision rules for localising prostate cancer. Second, parameters of these combining models are proposed as hyperparameters, weighing independent representations of individual image modalities in the Combiner network training, as opposed to end-to-end modality ensemble. A HyperCombiner network is developed to train a single image segmentation network that can be conditioned on these hyperparameters during inference for much-improved efficiency. Experimental results based on 751 cases from 651 patients compare the proposed rule-modelling approaches with other commonly-adopted end-to-end networks, in this downstream application of automating radiologist labelling on multiparametric MR. By acquiring and interpreting the modality combining rules, specifically the linear-weights or odds ratios associated with individual image modalities, three clinical applications are quantitatively presented and contextualised in the prostate cancer segmentation application, including modality availability assessment, importance quantification and rule discovery.
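The linear mixture rule referred to above is, in essence, a convex weighting of modality-specific probability maps. A toy sketch of that combining step follows; the maps and weights are illustrative values, not parameters learned by the Combiner or HyperCombiner networks.

```python
import numpy as np

def linear_combiner(prob_maps, weights):
    """Combine per-modality cancer probability maps with a linear mixture rule.

    prob_maps: dict of modality name -> HxW array of voxel-wise probabilities.
    weights:   dict of modality name -> non-negative weight, normalised to sum to 1.
    """
    total = sum(weights.values())
    return sum((weights[m] / total) * prob_maps[m] for m in prob_maps)

# Illustrative 4x4 maps and weights (not values from the paper).
rng = np.random.default_rng(1)
maps = {"T2w": rng.random((4, 4)), "DWI": rng.random((4, 4)), "DCE": rng.random((4, 4))}
weights = {"T2w": 0.4, "DWI": 0.4, "DCE": 0.2}

print(np.round(linear_combiner(maps, weights), 3))
```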
Collapse
Affiliation(s)
- Wen Yan
- Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Hong Kong China; Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Bernard Chiu
- Department of Electrical Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Hong Kong China; Department of Physics & Computer Science, Wilfrid Laurier University, 75 University Avenue West Waterloo, Ontario N2L 3C5, Canada.
| | - Ziyi Shen
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Qianye Yang
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Tom Syer
- Centre for Medical Imaging, Division of Medicine, University College London, London W1 W 7TS, UK.
| | - Zhe Min
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Shonit Punwani
- Centre for Medical Imaging, Division of Medicine, University College London, London W1 W 7TS, UK.
| | - Mark Emberton
- Division of Surgery & Interventional Science, University College London, Gower St, WC1E 6BT, London, UK.
| | - David Atkinson
- Centre for Medical Imaging, Division of Medicine, University College London, London W1 W 7TS, UK.
| | - Dean C Barratt
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| | - Yipeng Hu
- Centre for Medical Image Computing; Department of Medical Physics & Biomedical Engineering; Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, Gower St, WC1E 6BT, London, UK.
| |
Collapse
|
38
|
Kunt L, Kybic J, Nagyová V, Tichý A. Automatic caries detection in bitewing radiographs: part I-deep learning. Clin Oral Investig 2023; 27:7463-7471. [PMID: 37968358 DOI: 10.1007/s00784-023-05335-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Accepted: 10/11/2023] [Indexed: 11/17/2023]
Abstract
OBJECTIVE The aim of this work was to assemble a large annotated dataset of bitewing radiographs and to use convolutional neural networks to automate the detection of dental caries in bitewing radiographs with human-level performance. MATERIALS AND METHODS A dataset of 3989 bitewing radiographs was created, and 7257 carious lesions were annotated using minimal bounding boxes. The dataset was then divided into 3 parts for the training (70%), validation (15%), and testing (15%) of multiple object detection convolutional neural networks (CNN). The tested CNN architectures included YOLOv5, Faster R-CNN, RetinaNet, and EfficientDet. To further improve the detection performance, model ensembling was used, and nested predictions were removed during post-processing. The models were compared in terms of the F1-score and average precision (AP) with various thresholds of the intersection over union (IoU). RESULTS The twelve tested architectures had F1-scores of 0.72-0.76. Their performance was improved by ensembling, which increased the F1-score to 0.79-0.80. The best-performing ensemble detected caries with a precision of 0.83, recall of 0.77, F1-score of 0.80, and AP of 0.86 at IoU=0.5. Small carious lesions were predicted with slightly lower accuracy (AP 0.82) than medium or large lesions (AP 0.88). CONCLUSIONS The trained ensemble of object detection CNNs detected caries with satisfactory accuracy and performed at least as well as experienced dentists (see companion paper, Part II). The performance on small lesions was likely limited by inconsistencies in the training dataset. CLINICAL SIGNIFICANCE Caries can be automatically detected using convolutional neural networks. However, detecting incipient carious lesions remains challenging.
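Two post-processing details mentioned above, scoring box overlap with IoU and removing nested predictions from the ensemble output, can be illustrated compactly. The containment threshold and detection format below are assumptions, not the paper's exact values.

```python
def intersection_area(box_a, box_b):
    """Overlap area of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def iou(box_a, box_b):
    """Intersection over union, used e.g. for matching at IoU=0.5."""
    inter = intersection_area(box_a, box_b)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def remove_nested(detections, containment=0.95):
    """Drop boxes lying almost entirely inside a higher-scoring box (assumed rule)."""
    detections = sorted(detections, key=lambda d: d["score"], reverse=True)
    kept = []
    for det in detections:
        area = max((det["box"][2] - det["box"][0]) * (det["box"][3] - det["box"][1]), 1e-9)
        if not any(intersection_area(det["box"], k["box"]) / area >= containment for k in kept):
            kept.append(det)
    return kept
```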
Affiliation(s)
- Lukáš Kunt
- Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
| | - Jan Kybic
- Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic.
| | - Valéria Nagyová
- Institute of Dental Medicine, First Faculty of Medicine of the Charles University and General University Hospital, Prague, Czech Republic
| | - Antonín Tichý
- Institute of Dental Medicine, First Faculty of Medicine of the Charles University and General University Hospital, Prague, Czech Republic
39
Panyarak W, Wantanajittikul K, Charuakkra A, Prapayasatok S, Suttapak W. Enhancing Caries Detection in Bitewing Radiographs Using YOLOv7. J Digit Imaging 2023; 36:2635-2647. [PMID: 37640971 PMCID: PMC10584768 DOI: 10.1007/s10278-023-00871-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2023] [Revised: 06/09/2023] [Accepted: 06/09/2023] [Indexed: 08/31/2023] Open
Abstract
The study aimed to evaluate the impact of image size, intersection over union (IoU) thresholds and confidence thresholds on the performance of the YOLO models in the detection of dental caries in bitewing radiographs. A total of 2575 bitewing radiographs were annotated with seven classes according to the ICCMS™ radiographic scoring system. YOLOv3 and YOLOv7 models were employed with different configurations, and their performances were evaluated based on precision, recall, F1-score and mean average precision (mAP). Results showed that YOLOv7 with 640 × 640 pixel images exhibited significantly superior performance compared to YOLOv3 in terms of precision (0.557 vs. 0.268), F1-score (0.555 vs. 0.375) and mAP (0.562 vs. 0.458), while the recall was significantly lower (0.552 vs. 0.697). A subsequent experiment found that the overall mAPs did not significantly differ between 640 × 640 pixel and 1280 × 1280 pixel images for YOLOv7 with an IoU threshold of 50% and a confidence threshold of 0.001 (p = 0.866). The last experiment revealed that the precision significantly increased from 0.570 to 0.593 for YOLOv7 with an IoU threshold of 75% and a confidence threshold of 0.5, but the mean recall significantly decreased, leading to lower mAPs at both IoU thresholds. In conclusion, YOLOv7 outperformed YOLOv3 in caries detection, and increasing the image size did not enhance the model's performance. Elevating the IoU threshold from 50% to 75% and the confidence threshold from 0.001 to 0.5 reduced the model's overall performance, while improving precision and reducing recall (fewer false positives at the cost of more missed lesions) for carious lesion detection in bitewing radiographs.
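The reported trade-off, where higher confidence and IoU thresholds raise precision but lower recall, follows directly from how detections are matched to ground truth. The following sketch, assuming a simple greedy one-to-one matching of axis-aligned boxes, shows how single-image precision and recall depend on those two thresholds; it is an illustration only, not the evaluation code used in the study.

```python
def iou(a, b):
    # a, b = (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(preds, gts, iou_thr=0.5, conf_thr=0.001):
    """preds: list of (box, score); gts: list of ground-truth boxes."""
    preds = sorted((p for p in preds if p[1] >= conf_thr), key=lambda p: -p[1])
    matched, tp = set(), 0
    for box, _ in preds:
        # Best still-unmatched ground-truth box for this prediction.
        candidates = [(iou(box, g), i) for i, g in enumerate(gts) if i not in matched]
        best = max(candidates, default=(0.0, None))
        if best[0] >= iou_thr:
            matched.add(best[1])
            tp += 1
    fp, fn = len(preds) - tp, len(gts) - tp
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)
```

Raising `conf_thr` removes low-confidence predictions (fewer false positives, more misses), and raising `iou_thr` disqualifies loosely localised matches, which reproduces the precision/recall behaviour described in the abstract.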
Affiliation(s)
- Wannakamon Panyarak
- Division of Oral and Maxillofacial Radiology, Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep, Mueang Chiang Mai District, Chiang Mai, 50200, Thailand
| | - Kittichai Wantanajittikul
- Department of Radiologic Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Suthep Road, Suthep, Mueang Chiang Mai District, Chiang Mai, 50200, Thailand
| | - Arnon Charuakkra
- Division of Oral and Maxillofacial Radiology, Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep, Mueang Chiang Mai District, Chiang Mai, 50200, Thailand
| | - Sangsom Prapayasatok
- Division of Oral and Maxillofacial Radiology, Department of Oral Biology and Diagnostic Sciences, Faculty of Dentistry, Chiang Mai University, Suthep Road, Suthep, Mueang Chiang Mai District, Chiang Mai, 50200, Thailand
| | - Wattanapong Suttapak
- Division of Computer Engineering, School of Information and Communication Technology, University of Phayao, Phahon Yothin Road, Mae Ka, Mueang Phayao District, Phayao, 56000, Thailand.
40
Dong G, Wang N, Xu T, Liang J, Qiao R, Yin D, Lin S. Deep Learning-Enabled Morphometric Analysis for Toxicity Screening Using Zebrafish Larvae. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:18127-18138. [PMID: 36971266 DOI: 10.1021/acs.est.3c00593] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
Toxicology studies heavily rely on morphometric analysis to detect abnormalities and diagnose disease processes. The emergence of ever-increasing varieties of environmental pollutants makes it difficult to perform timely assessments, especially using in vivo models. Herein, we propose a deep learning-based morphometric analysis (DLMA) to quantitatively identify eight abnormal phenotypes (head hemorrhage, jaw malformation, uninflated swim bladder, pericardial edema, yolk edema, bent spine, dead, unhatched) and eight vital organ features (eye, head, jaw, heart, yolk, swim bladder, body length, and curvature) of zebrafish larvae. A data set composed of 2532 bright-field micrographs of zebrafish larvae at 120 h post fertilization was generated from toxicity screening of three categories of chemicals, i.e., endocrine disruptors (perfluorooctanesulfonate and bisphenol A), heavy metals (CdCl2 and PbI2), and emerging organic pollutants (acetaminophen, 2,7-dibromocarbazole, 3-monobromocarbazole, 3,6-dibromocarbazole, and 1,3,6,8-tetrabromocarbazole). Two typical deep learning models, a one-stage model (TensorMask) and a two-stage model (Mask R-CNN), were trained to implement phenotypic feature classification and segmentation. The accuracy was statistically validated with a mean average precision >0.93 in unlabeled data sets and a mean accuracy >0.86 in previously published data sets. Such a method effectively enables objective morphometric analysis of zebrafish larvae to achieve efficient hazard identification of both chemicals and environmental pollutants.
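As an illustration of how morphometric features can be derived once a body or organ mask has been segmented, the sketch below computes two simple measurements (area, and length along the principal axis) from a binary mask. The choice of measurements and the pixel-size parameter are assumptions for illustration; this is not the DLMA implementation.

```python
import numpy as np

def mask_morphometrics(mask, pixel_size_um=1.0):
    """Basic morphometrics from a binary instance mask (2-D array of 0/1 or bool)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    area = len(pts) * pixel_size_um ** 2
    centred = pts - pts.mean(axis=0)
    # Principal axis from the covariance eigenvectors; length is the extent along it.
    _, vecs = np.linalg.eigh(np.cov(centred.T))
    proj = centred @ vecs[:, -1]
    length = (proj.max() - proj.min()) * pixel_size_um
    return {"area_um2": area, "length_um": length}
```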
Affiliation(s)
- Gongqing Dong
- College of Environmental Science and Engineering, Biomedical Multidisciplinary Innovation Research Institute, Shanghai East Hospital, Tongji University, Shanghai 200092, China
- Key Laboratory of Yangtze River Water Environment, Shanghai Institute of Pollution Control and Ecological Security, Tongji University, Shanghai 200092, China
| | - Nan Wang
- College of Environmental Science and Engineering, Biomedical Multidisciplinary Innovation Research Institute, Shanghai East Hospital, Tongji University, Shanghai 200092, China
- Key Laboratory of Yangtze River Water Environment, Shanghai Institute of Pollution Control and Ecological Security, Tongji University, Shanghai 200092, China
| | - Ting Xu
- College of Environmental Science and Engineering, Biomedical Multidisciplinary Innovation Research Institute, Shanghai East Hospital, Tongji University, Shanghai 200092, China
- Key Laboratory of Yangtze River Water Environment, Shanghai Institute of Pollution Control and Ecological Security, Tongji University, Shanghai 200092, China
| | - Jingyu Liang
- College of Environmental Science and Engineering, Biomedical Multidisciplinary Innovation Research Institute, Shanghai East Hospital, Tongji University, Shanghai 200092, China
- Key Laboratory of Yangtze River Water Environment, Shanghai Institute of Pollution Control and Ecological Security, Tongji University, Shanghai 200092, China
| | - Ruxia Qiao
- College of Environmental Science and Engineering, Biomedical Multidisciplinary Innovation Research Institute, Shanghai East Hospital, Tongji University, Shanghai 200092, China
- Key Laboratory of Yangtze River Water Environment, Shanghai Institute of Pollution Control and Ecological Security, Tongji University, Shanghai 200092, China
| | - Daqiang Yin
- College of Environmental Science and Engineering, Biomedical Multidisciplinary Innovation Research Institute, Shanghai East Hospital, Tongji University, Shanghai 200092, China
- Key Laboratory of Yangtze River Water Environment, Shanghai Institute of Pollution Control and Ecological Security, Tongji University, Shanghai 200092, China
| | - Sijie Lin
- College of Environmental Science and Engineering, Biomedical Multidisciplinary Innovation Research Institute, Shanghai East Hospital, Tongji University, Shanghai 200092, China
- Key Laboratory of Yangtze River Water Environment, Shanghai Institute of Pollution Control and Ecological Security, Tongji University, Shanghai 200092, China
41
Shan C, Liu H, Yu Y. Research on improved algorithm for helmet detection based on YOLOv5. Sci Rep 2023; 13:18056. [PMID: 37872253 PMCID: PMC10593779 DOI: 10.1038/s41598-023-45383-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Accepted: 10/19/2023] [Indexed: 10/25/2023] Open
Abstract
The continuous development of smart industrial parks has imposed increasingly stringent requirements on safety helmet detection in environments such as factories, construction sites, rail transit, and fire protection. Current models often suffer from issues like false alarms or missed detections, especially when dealing with small and densely packed targets. This study aims to enhance the YOLOv5 target detection method to provide real-time alerts for individuals not wearing safety helmets in complex scenarios. Our approach involves incorporating the ECA channel attention mechanism into the YOLOv5 backbone network, allowing for efficient feature extraction while reducing computational load. We adopt a weighted bi-directional feature pyramid network structure (BiFPN) to facilitate effective feature fusion and cross-scale information transmission. Additionally, the introduction of a decoupling head in YOLOv5 improves detection performance and convergence rate. The experimental results demonstrate a substantial improvement in the YOLOv5 model's performance. The enhanced YOLOv5 model achieved an average accuracy of 95.9% on a custom-made helmet dataset, a 3.0 percentage point increase compared to the original YOLOv5 model. This study holds significant implications for enhancing the accuracy and robustness of helmet-wearing detection in various settings.
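For readers unfamiliar with the ECA mechanism mentioned here, the snippet below is a generic PyTorch sketch of an ECA block: global average pooling followed by a lightweight 1-D convolution across channels. It follows the published ECA formulation rather than the authors' exact YOLOv5 integration, and the kernel-size heuristic parameters are assumptions.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D convolution over per-channel descriptors,
    avoiding the dimensionality reduction used in SE blocks."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Kernel size adapted to the channel count (heuristic from the ECA paper).
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                   # x: (B, C, H, W)
        y = self.pool(x)                                     # (B, C, 1, 1)
        y = y.squeeze(-1).transpose(1, 2)                    # (B, 1, C)
        y = self.sigmoid(self.conv(y))                       # (B, 1, C)
        y = y.transpose(1, 2).unsqueeze(-1)                  # (B, C, 1, 1)
        return x * y                                         # channel-wise re-weighting

# Example: re-weight a 64-channel feature map.
out = ECA(64)(torch.rand(1, 64, 32, 32))
```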
Affiliation(s)
- Chun Shan
- Guangdong Polytechnic Normal University, Guangzhou, China.
- Guangzhou University, Guangzhou, China.
| | - HongMing Liu
- Guangdong Polytechnic Normal University, Guangzhou, China
| | - Yu Yu
- Guangdong Polytechnic Normal University, Guangzhou, China
42
Vayssade JA, Arquet R, Troupe W, Bonneau M. CherryChèvre: A fine-grained dataset for goat detection in natural environments. Sci Data 2023; 10:689. [PMID: 37821512 PMCID: PMC10567779 DOI: 10.1038/s41597-023-02555-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Accepted: 09/08/2023] [Indexed: 10/13/2023] Open
Abstract
We introduce a new dataset for goat detection that contains 6160 annotated images captured under varying environmental conditions. The dataset is intended for developing machine learning algorithms for goat detection, with applications in precision agriculture, animal welfare, behaviour analysis, and animal husbandry. The annotations were performed by an expert in computer vision, ensuring high accuracy and consistency. The dataset is publicly available and can be used as a benchmark for evaluating existing algorithms. This dataset advances research in computer vision for agriculture.
Affiliation(s)
| | - Rémy Arquet
- INRAe - UE PTEA, 97170 Petit-Bourg, Guadeloupe
| | | | - Mathieu Bonneau
- INRAe - ASSET, Animal Genetic, 97170 Petit-Bourg, Guadeloupe.
43
García-Garví A, Layana-Castro PE, Puchalt JC, Sánchez-Salmerón AJ. Automation of Caenorhabditis elegans lifespan assay using a simplified domain synthetic image-based neural network training strategy. Comput Struct Biotechnol J 2023; 21:5049-5065. [PMID: 37867965 PMCID: PMC10589381 DOI: 10.1016/j.csbj.2023.10.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2023] [Revised: 10/04/2023] [Accepted: 10/04/2023] [Indexed: 10/24/2023] Open
Abstract
Performing lifespan assays with Caenorhabditis elegans (C. elegans) nematodes manually is a time-consuming and laborious task. Therefore, automation is necessary to increase productivity. In this paper, we propose a method to automate the counting of live C. elegans using deep learning. The survival curves of the experiment are obtained using a sequence formed by one image taken on each day of the assay. Solving this problem would require a very large labeled dataset; thus, to facilitate its generation, we propose a simplified image-based strategy. This simplification consists of transforming the real images of the nematodes in the Petri dish into a synthetic image, in which circular blobs are drawn on a constant background to mark the position of the C. elegans. The simplification method is applied in two steps. First, a Faster R-CNN network detects the C. elegans, allowing its transformation into a synthetic image. Second, using the simplified image sequence as input, a regression neural network predicts the count of live nematodes on each day of the experiment. In this way, the counting network was trained using a simple simulator, avoiding the need to label a very large real dataset or to develop a realistic simulator. Results showed that the differences between the curves obtained by the proposed method and the manual curves are not statistically significant for either the short-lived N2 (log-rank test p-value 0.45) or the long-lived daf-2 (log-rank test p-value 0.83) strain.
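The central simplification, replacing each real frame with a synthetic image of blobs marking detected worm positions, can be illustrated with a few lines of NumPy. This sketch assumes box-style detector output and a fixed blob radius; it is meant only to convey the idea, not to reproduce the authors' pipeline.

```python
import numpy as np

def detections_to_synthetic(detections, image_shape=(512, 512), radius=6):
    """Render detected worm positions as circular blobs on a constant background.

    detections: list of (x1, y1, x2, y2) boxes from the detector (e.g. Faster R-CNN).
    Returns a uint8 image with 255-valued blobs at the box centres.
    """
    canvas = np.zeros(image_shape, dtype=np.uint8)
    yy, xx = np.mgrid[0:image_shape[0], 0:image_shape[1]]
    for x1, y1, x2, y2 in detections:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        canvas[(xx - cx) ** 2 + (yy - cy) ** 2 <= radius ** 2] = 255
    return canvas
```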
Affiliation(s)
- Antonio García-Garví
- Instituto de Automática e Informática Industrial, Universitat Politècnica de València, Camino de Vera S/N, Valencia, 46022, Spain
| | - Pablo E. Layana-Castro
- Instituto de Automática e Informática Industrial, Universitat Politècnica de València, Camino de Vera S/N, Valencia, 46022, Spain
| | - Joan Carles Puchalt
- Instituto de Automática e Informática Industrial, Universitat Politècnica de València, Camino de Vera S/N, Valencia, 46022, Spain
| | - Antonio-José Sánchez-Salmerón
- Instituto de Automática e Informática Industrial, Universitat Politècnica de València, Camino de Vera S/N, Valencia, 46022, Spain
44
Ali MA, Fujita D, Kobashi S. Teeth and prostheses detection in dental panoramic X-rays using CNN-based object detector and a priori knowledge-based algorithm. Sci Rep 2023; 13:16542. [PMID: 37783773 PMCID: PMC10545749 DOI: 10.1038/s41598-023-43591-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Accepted: 09/26/2023] [Indexed: 10/04/2023] Open
Abstract
Deep learning techniques for automatically detecting teeth in dental X-rays have gained popularity, providing valuable assistance to healthcare professionals. However, teeth detection in X-ray images is often hindered by alterations in tooth appearance caused by dental prostheses. To address this challenge, our paper proposes a novel method for teeth detection and numbering in dental panoramic X-rays, leveraging two separate YOLOv7-based object detectors, one for teeth and one for prostheses, alongside an optimization algorithm to refine the outcomes. The study utilizes a dataset of 3138 radiographs, of which 2553 images contain prostheses, to build a robust model. The tooth and prosthesis detection algorithms perform excellently, achieving mean average precisions of 0.982 and 0.983, respectively. Additionally, the trained tooth detection model is verified using an external dataset, and six-fold cross-validation is conducted to demonstrate the proposed method's feasibility and robustness. Moreover, including prosthesis information in the teeth detection process yields a marginal increase in the average F1-score, from 0.985 to 0.987, compared with teeth detection alone. The proposed method is unique in its approach to numbering teeth, as it incorporates prosthesis information and considers complete restorations such as dental implants and dentures of fixed bridges during the teeth enumeration process, which follows the universal tooth numbering system. These advancements hold promise for automating dental charting processes.
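To give a flavour of how a priori knowledge can turn tooth detections into universal tooth numbers, the sketch below sorts detected tooth boxes into upper and lower arches and numbers them by horizontal order. It assumes a full dentition, a crude arch split at mid-image height, and the patient's right appearing on the left of the panoramic image; it ignores missing teeth and the other cases the paper's optimization algorithm is designed to handle, so it is an assumption-laden illustration rather than the proposed method.

```python
def assign_universal_numbers(boxes, image_height):
    """boxes: list of (x1, y1, x2, y2) tooth detections in a panoramic X-ray.

    Returns (box, number) pairs following the universal numbering convention:
    1-16 across the upper arch, then 17-32 back across the lower arch.
    """
    centres = [((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0, b) for b in boxes]
    upper = sorted((c for c in centres if c[1] < image_height / 2), key=lambda c: c[0])
    lower = sorted((c for c in centres if c[1] >= image_height / 2), key=lambda c: -c[0])
    numbered = [(c[2], i + 1) for i, c in enumerate(upper)]
    numbered += [(c[2], 17 + i) for i, c in enumerate(lower)]
    return numbered
```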
Affiliation(s)
- Md Anas Ali
- Graduate School of Engineering, University of Hyogo, Himeji, Japan.
| | - Daisuke Fujita
- Graduate School of Engineering, University of Hyogo, Himeji, Japan
| | - Syoji Kobashi
- Graduate School of Engineering, University of Hyogo, Himeji, Japan
45
Badeka E, Karapatzak E, Karampatea A, Bouloumpasi E, Kalathas I, Lytridis C, Tziolas E, Tsakalidou VN, Kaburlasos VG. A Deep Learning Approach for Precision Viticulture, Assessing Grape Maturity via YOLOv7. SENSORS (BASEL, SWITZERLAND) 2023; 23:8126. [PMID: 37836956 PMCID: PMC10575379 DOI: 10.3390/s23198126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/30/2023] [Revised: 09/19/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023]
Abstract
In the viticulture sector, robots are being employed more frequently to increase productivity and accuracy in operations such as vineyard mapping, pruning, and harvesting, especially in locations where human labor is in short supply or expensive. This paper presents the development of an algorithm for grape maturity estimation in the framework of vineyard management. An object detection algorithm based on You Only Look Once (YOLO) v7 and its extensions is proposed to detect the maturity stage of a white grape variety (Assyrtiko). The proposed algorithm was trained using images received over a period of six weeks from grapevines in Drama, Greece. Tests on high-quality images have demonstrated that the detection of five grape maturity stages is possible. Furthermore, the proposed approach has been compared against alternative object detection algorithms. The results showed that YOLO v7 outperforms other architectures in both precision and accuracy. This work paves the way for the development of an autonomous robot for grapevine management.
Affiliation(s)
- Eftichia Badeka
- Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic University (IHU), 65404 Kavala, Greece; (E.B.); (I.K.); (C.L.); (E.T.); (V.N.T.)
| | - Eleftherios Karapatzak
- Department of Agricultural Biotechnology and Oenology, International Hellenic University, 66100 Drama, Greece; (E.K.); (A.K.); (E.B.)
| | - Aikaterini Karampatea
- Department of Agricultural Biotechnology and Oenology, International Hellenic University, 66100 Drama, Greece; (E.K.); (A.K.); (E.B.)
| | - Elisavet Bouloumpasi
- Department of Agricultural Biotechnology and Oenology, International Hellenic University, 66100 Drama, Greece; (E.K.); (A.K.); (E.B.)
| | - Ioannis Kalathas
- Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic University (IHU), 65404 Kavala, Greece; (E.B.); (I.K.); (C.L.); (E.T.); (V.N.T.)
| | - Chris Lytridis
- Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic University (IHU), 65404 Kavala, Greece; (E.B.); (I.K.); (C.L.); (E.T.); (V.N.T.)
| | - Emmanouil Tziolas
- Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic University (IHU), 65404 Kavala, Greece; (E.B.); (I.K.); (C.L.); (E.T.); (V.N.T.)
| | - Viktoria Nikoleta Tsakalidou
- Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic University (IHU), 65404 Kavala, Greece; (E.B.); (I.K.); (C.L.); (E.T.); (V.N.T.)
| | - Vassilis G. Kaburlasos
- Human-Machines Interaction Laboratory (HUMAIN-Lab), Department of Computer Science, International Hellenic University (IHU), 65404 Kavala, Greece; (E.B.); (I.K.); (C.L.); (E.T.); (V.N.T.)
46
Gregory Dal Toé S, Neal M, Hold N, Heney C, Turner R, McCoy E, Iftikhar M, Tiddeman B. Automated Video-Based Capture of Crustacean Fisheries Data Using Low-Power Hardware. SENSORS (BASEL, SWITZERLAND) 2023; 23:7897. [PMID: 37765954 PMCID: PMC10535158 DOI: 10.3390/s23187897] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Revised: 09/03/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023]
Abstract
This work investigates the application of Computer Vision to the problem of the automated counting and measuring of crabs and lobsters onboard fishing boats. The aim is to provide catch count and measurement data for these key commercial crustacean species. This can provide vital input data for stock assessment models, to enable the sustainable management of these species. The hardware system is required to be low-cost, low-power, waterproof, readily available (given current chip shortages), and able to avoid over-heating. The selected hardware is based on a Raspberry Pi 3A+ contained in a custom waterproof housing. This hardware places challenging limitations on the options for processing the incoming video, with many popular deep learning frameworks (even light-weight versions) unable to load or run given the limited computational resources. The problem can be broken into several steps: (1) Identifying the portions of the video that contain each individual animal; (2) Selecting a set of representative frames for each animal, e.g., lobsters must be viewed from the top and underside; (3) Detecting the animal within the frame so that the image can be cropped to the region of interest; (4) Detecting keypoints on each animal; and (5) Inferring measurements from the keypoint data. In this work, we develop a pipeline that addresses these steps, including a key novel solution to frame selection in video streams that uses classification, temporal segmentation, smoothing techniques and frame quality estimation. The developed pipeline is able to operate on the target low-power hardware, and the experiments show that, given sufficient training data, reasonable performance is achieved.
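The frame-selection idea described here (per-frame classification, temporal smoothing, segmentation into contiguous runs, and a quality-based pick within each run) can be sketched compactly. The snippet below is one lightweight interpretation suitable for low-power hardware; the window size, probability threshold, and minimum run length are assumptions rather than the values used in the paper.

```python
import numpy as np

def select_frames(frame_scores, frame_quality, min_len=5, window=7):
    """Pick one representative frame per contiguous run of detections.

    frame_scores: (T,) per-frame classifier probability that an animal is present.
    frame_quality: (T,) per-frame quality estimate (e.g. sharpness).
    """
    # Temporal smoothing of the classifier output with a moving average.
    kernel = np.ones(window) / window
    present = np.convolve(frame_scores, kernel, mode="same") > 0.5

    # Temporal segmentation into runs of consecutive "present" frames.
    segments, start = [], None
    for t, flag in enumerate(present):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_len:
                segments.append((start, t))
            start = None
    if start is not None and len(present) - start >= min_len:
        segments.append((start, len(present)))

    # Within each segment, keep the frame with the best quality estimate.
    return [s + int(np.argmax(frame_quality[s:e])) for s, e in segments]
```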
Affiliation(s)
- Sebastian Gregory Dal Toé
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Ceredigion, UK; (S.G.D.T.); (M.I.)
| | - Marie Neal
- Ystumtec Ltd., Pant-Y-Chwarel, Ystumtuen, Aberystwyth SY23 3AF, Ceredigion, UK;
| | - Natalie Hold
- School of Ocean Sciences, Bangor University, Bangor LL57 2DG, Gwynedd, UK; (N.H.); (C.H.); (R.T.); (E.M.)
| | - Charlotte Heney
- School of Ocean Sciences, Bangor University, Bangor LL57 2DG, Gwynedd, UK; (N.H.); (C.H.); (R.T.); (E.M.)
| | - Rebecca Turner
- School of Ocean Sciences, Bangor University, Bangor LL57 2DG, Gwynedd, UK; (N.H.); (C.H.); (R.T.); (E.M.)
| | - Emer McCoy
- School of Ocean Sciences, Bangor University, Bangor LL57 2DG, Gwynedd, UK; (N.H.); (C.H.); (R.T.); (E.M.)
| | - Muhammad Iftikhar
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Ceredigion, UK; (S.G.D.T.); (M.I.)
| | - Bernard Tiddeman
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, Ceredigion, UK; (S.G.D.T.); (M.I.)
47
Gosiewska A, Baran Z, Baran M, Rutkowski T. Seeking a Sufficient Data Volume for Railway Infrastructure Component Detection with Computer Vision Models. SENSORS (BASEL, SWITZERLAND) 2023; 23:7776. [PMID: 37765832 PMCID: PMC10538059 DOI: 10.3390/s23187776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/20/2023] [Revised: 08/29/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023]
Abstract
Railway infrastructure monitoring is crucial for transportation reliability and travelers' safety. However, it requires substantial human resources, generates high costs, and is limited by the efficiency of the human eye. Integrating machine learning into the railway monitoring process can overcome these problems. Since advanced algorithms perform on par with humans in many tasks, they can provide a faster, cost-effective, and reproducible evaluation of the infrastructure. The main issue with this approach is that training machine learning models involves acquiring a large amount of labeled data, which is unavailable for rail infrastructure. We trained YOLOv5 and MobileNet architectures to meet this challenge in low-data-volume scenarios. We established that 120 observations are enough to train an accurate model for the object-detection task for railway infrastructure. Moreover, we proposed a novel method for extracting background images from railway images. To test our method, we compared the performance of YOLOv5 and MobileNet on small datasets with and without background extraction. The results of the experiments show that background extraction reduces the sufficient data volume to 90 observations.
48
Jiang M, Yuan B, Kou W, Yan W, Marshall H, Yang Q, Syer T, Punwani S, Emberton M, Barratt DC, Cho CCM, Hu Y, Chiu B. Prostate cancer segmentation from MRI by a multistream fusion encoder. Med Phys 2023; 50:5489-5504. [PMID: 36938883 DOI: 10.1002/mp.16374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2022] [Revised: 02/15/2023] [Accepted: 03/03/2023] [Indexed: 03/21/2023] Open
Abstract
BACKGROUND Targeted prostate biopsy guided by multiparametric magnetic resonance imaging (mpMRI) detects more clinically significant lesions than conventional systematic biopsy. Lesion segmentation is required for planning MRI-targeted biopsies. The requirement for integrating image features available in T2-weighted and diffusion-weighted images poses a challenge in prostate lesion segmentation from mpMRI. PURPOSE A flexible and efficient multistream fusion encoder is proposed in this work to facilitate the multiscale fusion of features from multiple imaging streams. A patch-based loss function is introduced to improve the accuracy in segmenting small lesions. METHODS The proposed multistream encoder fuses features extracted in the three imaging streams at each layer of the network, thereby allowing improved feature maps to propagate downstream and benefit segmentation performance. The fusion is achieved through a spatial attention map generated by optimally weighting the contribution of the convolution outputs from each stream. This design provides flexibility for the network to highlight image modalities according to their relative influence on the segmentation performance. The encoder also performs multiscale integration by highlighting the input feature maps (low-level features) with the spatial attention maps generated from convolution outputs (high-level features). The Dice similarity coefficient (DSC), when used as a cost function, is relatively insensitive to incorrect segmentation of small lesions. We address this issue by introducing a patch-based loss function that provides an average of the DSCs obtained from local image patches. This local average DSC is equally sensitive to large and small lesions, as the patch-based DSCs associated with small and large lesions have equal weights in this average DSC. RESULTS The framework was evaluated in 931 sets of images acquired in several clinical studies at two centers in Hong Kong and the United Kingdom. In particular, the training, validation, and test sets contain 615, 144, and 172 sets of images, respectively. The proposed framework outperformed single-stream networks and three recently proposed multistream networks, attaining F1 scores of 82.2% and 87.6% at the lesion and patient levels, respectively. The average inference time for an axial image was 11.8 ms. CONCLUSION The accuracy and efficiency afforded by the proposed framework would accelerate the MRI interpretation workflow of MRI-targeted biopsy and focal therapies.
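The patch-based loss described here averages local Dice scores so that small lesions are not swamped by large ones. The PyTorch sketch below conveys that concept under several assumptions (non-overlapping square patches, probabilistic predictions in [0, 1], spatial dimensions divisible by the patch size, and empty patches included in the average); it is an illustration of the idea, not the authors' implementation.

```python
import torch

def patch_dice_loss(pred, target, patch=32, eps=1e-6):
    """pred, target: tensors of shape (B, 1, H, W); H and W divisible by `patch`."""
    p = pred.unfold(2, patch, patch).unfold(3, patch, patch)   # (B, 1, H/p, W/p, p, p)
    t = target.unfold(2, patch, patch).unfold(3, patch, patch)
    inter = (p * t).sum(dim=(-1, -2))
    denom = p.sum(dim=(-1, -2)) + t.sum(dim=(-1, -2))
    dsc = (2.0 * inter + eps) / (denom + eps)                  # per-patch Dice
    return 1.0 - dsc.mean()                                    # average over all patches

# Example usage with random tensors standing in for network output and labels.
pred = torch.rand(2, 1, 128, 128)
target = (torch.rand(2, 1, 128, 128) > 0.95).float()
loss = patch_dice_loss(pred, target)
```

Because every patch contributes equally to the mean, a mis-segmented small lesion lowers the score of its patch as much as a mis-segmented large lesion lowers the scores of its patches, which is the behaviour the abstract motivates.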
Affiliation(s)
- Mingjie Jiang
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
| | - Baohua Yuan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
- Aliyun School of Big Data, Changzhou University, Changzhou, China
| | - Weixuan Kou
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
| | - Wen Yan
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
- Centre for Medical Image Computing, Wellcome/EPSRC Centre for Interventional & Surgical Sciences, Department of Medical Physics & Biomedical Engineering, University College London, London, UK
| | - Harry Marshall
- Schulich School of Medicine & Dentistry, Western University, Ontario, Canada
| | - Qianye Yang
- Centre for Medical Image Computing, Wellcome/EPSRC Centre for Interventional & Surgical Sciences, Department of Medical Physics & Biomedical Engineering, University College London, London, UK
| | - Tom Syer
- Centre for Medical Imaging, University College London, London, UK
| | - Shonit Punwani
- Centre for Medical Imaging, University College London, London, UK
| | - Mark Emberton
- Division of Surgery & Interventional Science, University College London, London, UK
| | - Dean C Barratt
- Centre for Medical Image Computing, Wellcome/EPSRC Centre for Interventional & Surgical Sciences, Department of Medical Physics & Biomedical Engineering, University College London, London, UK
| | - Carmen C M Cho
- Prince of Wales Hospital and Department of Imaging and Intervention Radiology, Chinese University of Hong Kong, Hong Kong SAR, China
| | - Yipeng Hu
- Centre for Medical Image Computing, Wellcome/EPSRC Centre for Interventional & Surgical Sciences, Department of Medical Physics & Biomedical Engineering, University College London, London, UK
| | - Bernard Chiu
- Department of Electrical Engineering, City University of Hong Kong, Hong Kong SAR, China
49
Zhu H, Wu H, Wang X, He D, Liu Z, Pan X. DPACFuse: Dual-Branch Progressive Learning for Infrared and Visible Image Fusion with Complementary Self-Attention and Convolution. SENSORS (BASEL, SWITZERLAND) 2023; 23:7205. [PMID: 37631742 PMCID: PMC10458385 DOI: 10.3390/s23167205] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/06/2023] [Revised: 08/03/2023] [Accepted: 08/04/2023] [Indexed: 08/27/2023]
Abstract
Infrared and visible image fusion aims to generate a single fused image that not only contains rich texture details and salient objects, but also facilitates downstream tasks. However, existing works mainly focus on learning different modality-specific or shared features, and ignore the importance of modeling cross-modality features. To address these challenges, we propose Dual-branch Progressive learning for infrared and visible image fusion with a complementary self-Attention and Convolution (DPACFuse) network. On the one hand, we propose Cross-Modality Feature Extraction (CMEF) to enhance information interaction and the extraction of common features across modalities. In addition, we introduce a high-frequency gradient convolution operation to extract fine-grained information and suppress high-frequency information loss. On the other hand, to alleviate the limited global-information extraction of CNNs and the computational overhead of self-attention, we introduce ACmix, which can fully extract local and global information from the source image with a smaller computational overhead than pure convolution or pure self-attention. Extensive experiments demonstrated that the fused images generated by DPACFuse not only contain rich texture information, but also effectively highlight salient objects. Additionally, our method achieved an approximately 3% improvement over state-of-the-art methods on the MI, Qabf, SF, and AG evaluation indicators. More importantly, our fused images enhanced object detection and semantic segmentation by approximately 10%, compared to using infrared and visible images separately.
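The high-frequency gradient convolution is described only at a high level. The sketch below shows one common way such an operation can be realised in PyTorch: a fixed Laplacian high-pass filter applied depthwise to every channel. The specific kernel and its depthwise application are assumptions for illustration, not the DPACFuse definition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientConv(nn.Module):
    """Fixed Laplacian-style high-pass filter applied per channel, a simple way to
    pull out fine-grained, high-frequency detail alongside learned features."""
    def __init__(self, channels):
        super().__init__()
        k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        # One copy of the kernel per channel for a depthwise (grouped) convolution.
        self.register_buffer("kernel", k.expand(channels, 1, 3, 3).clone())
        self.channels = channels

    def forward(self, x):  # x: (B, channels, H, W)
        return F.conv2d(x, self.kernel, padding=1, groups=self.channels)

# Example: extract high-frequency detail from a 16-channel feature map.
detail = GradientConv(16)(torch.rand(1, 16, 64, 64))
```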
Affiliation(s)
- Huayi Zhu
- School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China
| | - Heshan Wu
- School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China
| | - Xiaolong Wang
- School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China
| | - Dongmei He
- School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China
| | - Zhenbing Liu
- School of Artificial Intelligence, Guilin University of Electronic Science and Technology, Guilin 541004, China
| | - Xipeng Pan
- School of Computer and Information Security, Guilin University of Electronic Science and Technology, Guilin 541004, China
50
Campos RL, Yoon SC, Chung S, Bhandarkar SM. Semisupervised Deep Learning for the Detection of Foreign Materials on Poultry Meat with Near-Infrared Hyperspectral Imaging. SENSORS (BASEL, SWITZERLAND) 2023; 23:7014. [PMID: 37631551 PMCID: PMC10459470 DOI: 10.3390/s23167014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/04/2023] [Revised: 07/28/2023] [Accepted: 08/04/2023] [Indexed: 08/27/2023]
Abstract
A novel semisupervised hyperspectral imaging technique was developed to detect foreign materials (FMs) on raw poultry meat. Combining hyperspectral imaging and deep learning has shown promise in identifying food safety and quality attributes. However, the challenge lies in acquiring a large amount of accurately annotated/labeled data for model training. This paper proposes a novel semisupervised hyperspectral deep learning model based on a generative adversarial network, utilizing an improved 1D U-Net as its discriminator, to detect FMs on raw chicken breast fillets. The model was trained using approximately 879,000 spectral responses from hyperspectral images of clean chicken breast fillets in the near-infrared wavelength range of 1000-1700 nm. Testing involved 30 different types of FMs commonly found in processing plants, prepared in two nominal sizes: 2 × 2 mm² and 5 × 5 mm². The FM-detection technique achieved impressive results at both the spectral pixel level and the foreign material object level. At the spectral pixel level, the model achieved a precision of 100%, a recall of over 93%, an F1 score of 96.8%, and a balanced accuracy of 96.9%. When combining the rich 1D spectral data with 2D spatial information, the FM-detection accuracy at the object level reached 96.5%. In summary, the results obtained in this study demonstrate the technique's effectiveness at accurately identifying and localizing FMs. Furthermore, the technique's potential for generalization and application to other agriculture and food-related domains highlights its broader significance.
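The step of combining per-pixel spectral decisions with 2-D spatial information to obtain object-level detections can be illustrated with connected-component labelling. The snippet below is a generic sketch: the 0.5 threshold and the minimum blob size are assumptions, and the study's actual aggregation rule may differ.

```python
import numpy as np
from scipy import ndimage

def objects_from_pixel_map(fm_prob, threshold=0.5, min_pixels=4):
    """Turn a per-pixel foreign-material probability map (H, W) into object
    detections by thresholding and connected-component labelling."""
    mask = fm_prob >= threshold
    labels, n = ndimage.label(mask)
    objects = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if len(xs) >= min_pixels:
            # Bounding box (x_min, y_min, x_max, y_max) of the connected blob.
            objects.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return objects
```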
Affiliation(s)
| | - Seung-Chul Yoon
- U.S. National Poultry Research Center, Agricultural Research Service, U.S. Department of Agriculture, Athens, GA 30605, USA
| | - Soo Chung
- Department of Biosystems Engineering, Integrated Major in Global Smart Farm, Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul 08826, Republic of Korea;