1
Watabe H, Yu PKN, Krstic D, Nikezic D, Kim KM, Yamaya T, Kawachi N, Tanaka H, Jovanovic Z, Haque AKF, Islam MR, Tse G, Lee Q, Beni MS. RAPTOR-AI: An open-source AI powered radiation protection toolkit for radioisotopes. Appl Radiat Isot 2025; 221:111797. [PMID: 40184907] [DOI: 10.1016/j.apradiso.2025.111797]
Abstract
Artificial intelligence (AI) has gained significant attention in various scientific fields due to its ability to process large datasets. In nuclear radiation physics, while AI presents exciting opportunities, it cannot replace physics-based models essential for explaining radiation interactions with matter. To combine the strengths of both, we have developed and open-sourced the Radiation Protection Toolkit for Radioisotopes with Artificial Intelligence (RAPTOR-AI). This toolkit integrates AI with the Particle and Heavy Ion Transport code System (PHITS) Monte Carlo package, enabling rapid radiation protection analysis for radioisotopes and structural shielding. RAPTOR-AI is particularly valuable for emergency scenarios, allowing quick dose dispersion assessments when a facility's structural map is available, enhancing safety and response efficiency.
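The rapid assessments such a toolkit automates ultimately rest on elementary point-source shielding physics. As a rough illustration of the kind of calculation involved — not RAPTOR-AI's actual code, and with hypothetical placeholder constants — a minimal sketch:

```python
import math

def shielded_dose_rate(activity_MBq, gamma_const, distance_m, mu_per_cm, shield_cm):
    """Point-source gamma dose rate behind a slab shield, combining the
    inverse-square law with exponential attenuation (buildup ignored):
        D = (Gamma * A / r^2) * exp(-mu * x)
    gamma_const is a dose-rate constant in uSv*m^2/(MBq*h)."""
    unshielded = gamma_const * activity_MBq / distance_m ** 2
    return unshielded * math.exp(-mu_per_cm * shield_cm)

# Hypothetical scenario: a 500 MBq source, 2 m away, behind 5 cm of concrete.
print(f"{shielded_dose_rate(500, 0.1, 2.0, 0.15, 5.0):.2f} uSv/h")
```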
Affiliation(s)
- Hiroshi Watabe
- Division of Radiation Protection and Nuclear Safety Research Center for Accelerator and Radioisotope Science, Tohoku University, 6-3 Aoba, Aramaki, Aoba, Sendai, 980-8578 Miyagi, Japan
- Peter K N Yu
- Department of Physics, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong, China
- Dragana Krstic
- Faculty of Science, University of Kragujevac, R. Domanovica 12, 34000, Kragujevac, Serbia
- Dragoslav Nikezic
- Faculty of Science, University of Kragujevac, R. Domanovica 12, 34000, Kragujevac, Serbia
- Kyeong Min Kim
- Korea Institute of Radiological & Medical Sciences, 75, Nowon-ro, Nowon-gu, Seoul, Republic of Korea
- Taiga Yamaya
- National Institutes for Quantum Science and Technology, Anagawa 4-9-1, Inage-ku, Chiba-shi, Chiba 263-8555, Japan
- Naoki Kawachi
- National Institutes for Quantum Science and Technology, 1233 Watanuki, Takasaki, Gunma 370-1292, Japan
- Hiroki Tanaka
- Institute for Integrated Radiation and Nuclear Science, Kyoto University, 2-1010 Asashiro-Nishi, Kumatori-cho, Sennan-gun, Osaka 590-0494, Japan
- Zoran Jovanovic
- Department of Natural Science and Mathematics, State University of Novi Pazar, Vuka Karadzica 9, 36300, Novi Pazar, Serbia
- A K F Haque
- Atomic and Molecular Physics Laboratory, Department of Physics, University of Rajshahi, Rajshahi 6205, Bangladesh
- M Rafiqul Islam
- Institute of Nuclear Medical Physics, AERE, Bangladesh Atomic Energy Commission, Dhaka, 1349, Bangladesh
- Gary Tse
- School of Nursing and Health Studies, Hong Kong Metropolitan University, Homantin, Kowloon, Hong Kong, China
- Quinncy Lee
- The Institute of Applied Health Sciences, The School of Medicine, Medical Sciences, and Nutrition, University of Aberdeen, Aberdeen, United Kingdom; Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Mehrdad Shahmohammadi Beni
- Division of Radiation Protection and Nuclear Safety Research Center for Accelerator and Radioisotope Science, Tohoku University, 6-3 Aoba, Aramaki, Aoba, Sendai, 980-8578 Miyagi, Japan; Department of Physics, City University of Hong Kong, Tat Chee Avenue, Kowloon Tong, Hong Kong, China; School of Nursing and Health Studies, Hong Kong Metropolitan University, Homantin, Kowloon, Hong Kong, China
2
Wang XZ, Yang DH, Yan ZP, You XD, Yin XY, Chen Y, Wang T, Wu HL, Yu RQ. Ultrafast on-site adulteration detection and quantification in Asian black truffle using smartphone-based computer vision. Talanta 2025; 288:127743. [PMID: 39965382] [DOI: 10.1016/j.talanta.2025.127743]
Abstract
Asian black truffle Tuber sinense (BT) is a premium edible fungus with medicinal value, but it is often subject to adulteration. This study aimed to develop a fast, non-destructive, automatic, and intelligent method for identifying BT. A novel lightweight convolutional neural network incorporating knowledge distillation (FastBTNet) was developed to improve efficiency on smartphones while maintaining high performance. The well-trained model, coupled with a fast object-location technique, was further employed for the absolute quantification of adulteration in BT. FastBTNet achieved 99.0% classification accuracy, an 8.5% root mean squared error in predicting adulteration levels, and a prediction time of 5.3 s for 1024 samples. Additionally, Grad-CAM was used to investigate the model's recognition mechanism, and the strategy received a perfect score in a greenness assessment. These methods were deployed in a smartphone app, "Truffle Identifier," which enables ultrafast on-site identification of a batch of samples and assists in predicting adulteration levels.
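For readers unfamiliar with knowledge distillation, the core idea FastBTNet builds on is a loss that blends temperature-softened teacher targets with the usual hard labels. A minimal PyTorch sketch of that standard loss (the temperature and weighting below are illustrative choices, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Classic soft-target distillation: KL divergence between the
    temperature-softened teacher and student distributions, blended with
    ordinary cross-entropy. The T^2 factor keeps soft-target gradients
    on the same scale as the hard-label term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```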
Affiliation(s)
- Xiao-Zhi Wang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- De-Huan Yang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Zhan-Peng Yan
- College of Artificial Intelligence, Changsha NanFang Professional College, Changsha, 410208, China
- Xu-Dong You
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Xiao-Yue Yin
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Yao Chen
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China; Hunan Key Lab of Biomedical Materials and Devices, College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, 412007, China
- Tong Wang
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Hai-Long Wu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
- Ru-Qin Yu
- State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha, 410082, China
3
Sarmet M, Kaczmarek E, Fauveau A, Steer K, Velasco AA, Smith A, Kennedy M, Shideler H, Wallace S, Stroud T, Blilie M, Mayerl CJ. A Machine Learning Pipeline for Automated Bolus Segmentation and Area Measurement in Swallowing Videofluoroscopy Images of an Infant Pig Model. Dysphagia 2025. [PMID: 40293507] [DOI: 10.1007/s00455-025-10829-z]
Abstract
Feeding efficiency and safety are often driven by bolus volume, one of the most common clinical measures for assessing swallow performance. However, manual measurement of bolus area is time-consuming and suffers from high levels of inter-rater variability. This study proposes a machine learning (ML) pipeline using ilastik, an accessible bioimage analysis tool, to automate the measurement of bolus area during swallowing. The pipeline was tested on 336 swallows from videofluoroscopic recordings of 8 infant pigs during bottle feeding. Eight trained raters manually measured bolus area in ImageJ and also used ilastik's autocontext pixel-level labeling and object classification tools to train ML models for automated bolus segmentation and area calculation. The ML pipeline trained in 1 h 42 min and processed the dataset in 2 min 48 s, a 97% time saving compared to manual methods. The model exhibited strong performance, achieving a high Dice Similarity Coefficient (0.84), Intersection over Union (0.76), and inter-rater reliability (intraclass correlation coefficient = 0.79). The bolus areas from the two methods were highly correlated (R² = 0.74 overall, 0.78 without bubbles, 0.67 with bubbles), with no significant difference in measured bolus area between the methods. Our ML pipeline, requiring no ML expertise, offers a reliable and efficient method for automatically measuring bolus area. While human confirmation remains valuable, this pipeline accelerates analysis and improves reproducibility compared to manual methods. Future refinements can further enhance precision and broaden its application in dysphagia research.
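The two overlap metrics reported here are straightforward to compute from binary segmentation masks; a short NumPy sketch for reference:

```python
import numpy as np

def dice_and_iou(pred, truth):
    """Dice Similarity Coefficient and Intersection over Union for
    binary masks (boolean or 0/1 arrays of the same shape)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    dice = 2.0 * inter / (pred.sum() + truth.sum())
    iou = inter / union
    return dice, iou

pred = np.array([[0, 1, 1], [0, 1, 0]])
truth = np.array([[0, 1, 1], [1, 1, 0]])
print(dice_and_iou(pred, truth))  # -> (0.857..., 0.75)
```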
Affiliation(s)
- Max Sarmet
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA.
- Graduate Department of Health Science and Technology, University of Brasilia, Brasilia, 70910-900, Brazil.
- Elska Kaczmarek
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Alexane Fauveau
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Kendall Steer
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Alex-Ann Velasco
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Ani Smith
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Maressa Kennedy
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Hannah Shideler
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Skyler Wallace
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Thomas Stroud
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Morgan Blilie
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
- Christopher J Mayerl
- Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, 86011, USA
4
Zheng C, Lu J, Hu K, Xiang Q, Miao L. A ternary encoding network fusing scale awareness and large kernel attention for camouflaged object detection. Sci Rep 2025; 15:14345. [PMID: 40274958] [PMCID: PMC12022351] [DOI: 10.1038/s41598-025-97857-9]
Abstract
To address the structural information loss and object occlusion that existing camouflaged object detection methods suffer from in complex scenes, we propose a novel network that integrates scale awareness and enhanced large kernel attention (SALK-Net). Specifically, our network takes ternary images as input to mine the additional information contained at different scales. First, we use a shared feature encoder to extract features and align channels across multi-scale input images. Second, enhanced large kernel attention is introduced to guide the fusion of scale features, aiming to fully perceive global semantic information and minimize the loss of valuable clues. Third, in the designed mixed-scale decoder, we adopt a progressive structure to explore and gradually accumulate the clue information contained in the feature channels. Finally, a dynamic weighting strategy for boundary and structure is introduced into the loss constraints, together with prior knowledge, to help the model predict challenging pixels. We compared the proposed model with 12 state-of-the-art methods on 4 public datasets and assessed the results on 4 metrics. The structural similarity measure and enhanced alignment measure reached 0.861 and 0.927, respectively, on a large dataset seen during training, and 0.872 and 0.926 on a large dataset unseen during training, demonstrating the competitiveness of our method against state-of-the-art approaches.
Affiliation(s)
- Chaoquan Zheng
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Jinzheng Lu
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Kun Hu
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Qiang Xiang
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
- Ling Miao
- School of Information and Control Engineering, Southwest University of Science and Technology, Mianyang, 621010, China
5
Xia H, Zhou H, Zhang M, Zhang Q, Fan C, Yang Y, Xi S, Liu Y. Surface Defect Detection for Small Samples of Particleboard Based on Improved Proximal Policy Optimization. Sensors (Basel) 2025; 25:2541. [PMID: 40285229] [PMCID: PMC12031556] [DOI: 10.3390/s25082541]
Abstract
Particleboard is an important forest product that can be made from wood-processing by-products, offering significant conservation of forest resources and contributing to the protection of forest ecology. Most current detection models require a large number of labeled samples for training. However, as industrial technology advances, surface defects in particleboard are becoming rarer, making sample data difficult to acquire and significantly limiting the effectiveness of model training. Deep reinforcement learning-based detection methods have been shown to exhibit strong generalization ability and sample efficiency when samples are limited. This paper focuses on the potential application of deep reinforcement learning to particleboard defect detection and proposes a novel detection method, PPOBoardNet, for identifying five typical defects: dust spot, glue spot, scratch, sand leak, and indentation. The proposed method is based on the proximal policy optimization (PPO) algorithm within the Actor-Critic framework, and defect detection is achieved by performing a series of scaling and translation operations on a mask. The method integrates a variable action space and a composite reward function, balancing detection performance across defect types by adjusting the scaling and translation amplitude of the detection region. In addition, this paper proposes a multi-scale feature-fusion state representation, which integrates global features, local features, and historical action sequences of the defect image, providing reliable guidance for action selection. On a particleboard defect dataset with limited images, PPOBoardNet achieves a mean average precision (mAP) of 79.0%, a 5.3% improvement over the best YOLO-series detection models. This result provides a novel technical approach to limited-sample defect detection in the particleboard domain, with significant practical application value.
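The PPO core referenced above is the clipped surrogate objective, which limits how far each policy update can move from the previous policy. A generic sketch of that objective (not PPOBoardNet's implementation; the batch values are toy numbers):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO's clipped surrogate objective (to be maximized):
        L = E[ min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t) ]
    where r_t = pi_theta(a|s) / pi_theta_old(a|s). Clipping keeps each
    policy update close to the old policy."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(ratio * advantage, clipped).mean()

# Toy batch of probability ratios and advantage estimates.
ratios = np.array([0.9, 1.3, 1.05])
advs = np.array([0.5, -0.2, 1.0])
print(ppo_clip_objective(ratios, advs))
```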
Affiliation(s)
- Haifei Xia
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
- Haiyan Zhou
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
- Mingao Zhang
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
- Qingyi Zhang
- School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
- Chenlong Fan
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
- Yutu Yang
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
- Shuang Xi
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
- Ying Liu
- Jiangsu Co-Innovation Center of Efficient Processing and Utilization of Forest Resources, College of Mechanical and Electronic Engineering, Nanjing Forestry University, Nanjing 210037, China
6
Liang C, Liu D, Ge W, Huang W, Lan Y, Long Y. Detection of litchi fruit maturity states based on unmanned aerial vehicle remote sensing and improved YOLOv8 model. Front Plant Sci 2025; 16:1568237. [PMID: 40308298] [PMCID: PMC12042761] [DOI: 10.3389/fpls.2025.1568237]
Abstract
Rapid and accurate detection of the maturity state of litchi fruits is crucial for orchard management and picking period prediction. However, existing studies are largely limited to the binary classification of immature and mature fruits, lacking dynamic evaluation and precise prediction of maturity states. To address these limitations, this study proposed a method for detecting litchi maturity states based on UAV remote sensing and YOLOv8-FPDW. The YOLOv8-FPDW model integrated FasterNet, ParNetAttention, DADet, and Wiou modules, achieving a mean average precision (mAP) of 87.7%. The weight, parameter count, and computational load of the model were reduced by 17.5%, 19.0%, and 9.9%, respectively. The improved model demonstrated robust performance in different application scenarios. The proposed target quantity differential strategy effectively reduced the detection error for semi-mature fruits by 12.58%. The results showed significant stage-based changes in the maturity states of litchi fruits: during the rapid growth phase, the fruit count increased by 18.28%; during the maturity differentiation phase, semi-mature fruits accounted for approximately 53%; and during the peak maturity phase, mature fruits exceeded 50%, with a fruit drop rate of 11.46%. In addition, YOLOv8-FPDW was more competitive than mainstream object detection algorithms. The study predicted the optimal harvest period for litchis, providing scientific support for orchard batch harvesting and fine management.
Affiliation(s)
- Changjiang Liang
- College of Electronic Engineering/College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou, China
- Dandan Liu
- College of Electronic Engineering/College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou, China
- Weiyi Ge
- College of Electronic Engineering/College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou, China
- Wenzhong Huang
- College of Electronic Engineering/College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou, China
- Yubin Lan
- College of Electronic Engineering/College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou, China
- Guangdong Laboratory of Lingnan Modern Agriculture, Guangzhou, China
- Yongbing Long
- College of Electronic Engineering/College of Artificial Intelligence, South China Agricultural University, Guangzhou, China
- National Center for International Collaboration Research on Precision Agricultural Aviation Pesticides Spraying Technology, Guangzhou, China
- Guangdong Laboratory of Lingnan Modern Agriculture, Guangzhou, China
- South China Smart Agriculture Public Research & Development Center, Ministry of Agriculture and Rural Affairs, Guangzhou, China
7
Li P, Ru J, Fei Q, Chen Z, Wang B. Interpretable capsule networks via self attention routing on spatially invariant feature surfaces. Sci Rep 2025; 15:13026. [PMID: 40234510] [PMCID: PMC12000548] [DOI: 10.1038/s41598-025-96903-w]
Abstract
The accurate and efficient evaluation and classification of situational images is fundamental to making informed and effective decisions. However, current classification approaches based on convolutional neural networks often suffer from limited generalization and robustness, particularly when processing data characterized by abstract class features and pronounced spatial attributes. Additionally, the "black-box" nature of deep neural network architectures poses significant challenges to their application in fields with stringent security requirements. To address these limitations, this paper introduces a novel Spatially Invariant Self-Attention Capsule Network (SISA-CapsNet), designed to encode interpretable spatial features for classification tasks. SISA-CapsNet employs capsules to encode spatial features from specific image regions and classifies these features through a self-attention routing mechanism. Specifically, spatially invariant feature surfaces with dimensions identical to the input image are generated and stacked to form feature capsules, each encoding spatial features from distinct regions. The self-attention mechanism calculates coupling coefficients, clustering feature capsules into class capsules. This architecture integrates a spatially invariant feature extraction structure, facilitating pixel-level encoding of regional spatial features, and leverages self-attention to effectively capture the relative importance of different spatial regions for classification. Together, these two mechanisms constitute an interpretable classification framework. Experimental validation on benchmark datasets and battlefield situational image datasets with pronounced spatial characteristics demonstrates that the proposed method not only achieves superior classification performance but also offers interpretability closely aligned with human cognitive processes. Furthermore, comparative analyses with existing visual interpretability methods underscore the enhanced interpretability of SISA-CapsNet.
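Routing by self-attention can be pictured as computing coupling coefficients with a softmax over similarity scores and forming class capsules as weighted sums of feature capsules. A schematic NumPy sketch under that reading (the learned class-query vectors are an assumption made for illustration, not the paper's exact formulation):

```python
import numpy as np

def self_attention_routing(feature_caps, class_queries):
    """One self-attention routing step: coupling coefficients are a
    softmax over scaled dot-product scores between class-capsule queries
    and feature capsules; each class capsule is the coefficient-weighted
    sum of the feature capsules.

    feature_caps:  (n_caps, d) capsules encoding regional spatial features
    class_queries: (n_classes, d) learned query vectors (assumed here)
    """
    d = feature_caps.shape[1]
    scores = class_queries @ feature_caps.T / np.sqrt(d)  # (n_classes, n_caps)
    scores -= scores.max(axis=1, keepdims=True)           # numerical stability
    coupling = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    class_caps = coupling @ feature_caps                  # (n_classes, d)
    return class_caps, coupling
```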
Affiliation(s)
- Peizhang Li
- School of Automation, Beijing Institute of Technology, Beijing, 100081, China
- Jiyuan Ru
- School of Automation, Beijing Institute of Technology, Beijing, 100081, China
- Qing Fei
- School of Automation, Beijing Institute of Technology, Beijing, 100081, China
- Zhen Chen
- School of Automation, Beijing Institute of Technology, Beijing, 100081, China
- Bo Wang
- China Shipbuilding Zhihai Innovation Research Institute, Beijing, China
8
Liu C, Shen Y, Mu F, Long H, Bilal A, Yu X, Dai Q. Detection of surface defects in soybean seeds based on improved Yolov9. Sci Rep 2025; 15:12631. [PMID: 40221419] [PMCID: PMC11993732] [DOI: 10.1038/s41598-025-92429-3]
Abstract
Appearance is one of the most important indicators of soybean seed quality, and traditional inspection relies mainly on the naked eye to check the seed surface for defects. Rapid advances in machine learning, particularly deep learning, now make it possible to detect soybean seed defects automatically, effectively replacing traditional inspection, reducing the human labor involved, and lowering the cost of agricultural operations. In this paper, we propose a Yolov9-c-ghost-Forward model that introduces GhostConv, a lightweight convolutional module from GhostNet. Soybean seed images are preprocessed through grayscale conversion, filtering, image segmentation, and morphological operations, which greatly reduce noise and separate the seeds from the original images. Seed features are then extracted with the Yolov9-based network, and surface defects are detected. In our experiments, the recall reached 98.6% and the mAP@0.5 reached 99.2%, showing that the model can provide a solid theoretical foundation and technical support for agricultural breeding screening and agricultural development.
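The preprocessing chain described (grayscale conversion, filtering, segmentation, morphological operations) maps directly onto standard OpenCV calls. A hedged sketch with illustrative kernel sizes and file name, not the authors' exact parameters:

```python
import cv2

def segment_seeds(path):
    """Separate seeds from the background: grayscale conversion, Gaussian
    noise filtering, Otsu thresholding, and morphological opening/closing
    to clean up the binary mask."""
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return cv2.bitwise_and(img, img, mask=mask)
```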
Affiliation(s)
- Chuanming Liu
- Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, 571158, China
- Yifan Shen
- Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, 571158, China
- School of Information Science and Technology, Hainan Normal University, Haikou, 571158, China
- Feng Mu
- Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, 571158, China
- School of Information Science and Technology, Hainan Normal University, Haikou, 571158, China
- Haixia Long
- Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, 571158, China
- School of Information Science and Technology, Hainan Normal University, Haikou, 571158, China
- Anas Bilal
- Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, 571158, China
- School of Information Science and Technology, Hainan Normal University, Haikou, 571158, China
- Xia Yu
- Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, 571158, China
- School of Information Science and Technology, Hainan Normal University, Haikou, 571158, China
- Qi Dai
- College of Life Science and Medicine, Zhejiang Sci-Tech University, Hangzhou, China
9
Mao M, Hong M. YOLO Object Detection for Real-Time Fabric Defect Inspection in the Textile Industry: A Review of YOLOv1 to YOLOv11. Sensors (Basel) 2025; 25:2270. [PMID: 40218782] [PMCID: PMC11990965] [DOI: 10.3390/s25072270]
Abstract
Automated fabric defect detection is crucial for improving quality control, reducing manual labor, and optimizing efficiency in the textile industry. Traditional inspection methods rely heavily on human oversight, which makes them prone to subjectivity, inefficiency, and inconsistency in high-speed manufacturing environments. This review systematically examines the evolution of the You Only Look Once (YOLO) object detection framework from YOLO-v1 to YOLO-v11, emphasizing architectural advancements such as attention-based feature refinement and Transformer integration and their impact on fabric defect detection. Unlike prior studies focusing on specific YOLO variants, this work comprehensively compares the entire YOLO family, highlighting key innovations and their practical implications. We also discuss the challenges, including dataset limitations, domain generalization, and computational constraints, proposing future solutions such as synthetic data generation, federated learning, and edge AI deployment. By bridging the gap between academic advancements and industrial applications, this review is a practical guide for selecting and optimizing YOLO models for fabric inspection, paving the way for intelligent quality control systems.
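For context, the YOLO variants discussed here are typically applied to inspection tasks through the Ultralytics Python API. A minimal usage sketch, assuming the ultralytics package is available; the dataset config and image file names are hypothetical:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained checkpoint as a starting point
# Fine-tune on a defect dataset described by a hypothetical YAML config.
model.train(data="fabric_defects.yaml", epochs=100, imgsz=640)
# Run inference on a fabric image and inspect the detected defect boxes.
results = model("fabric_sample.jpg", conf=0.25)
for box in results[0].boxes:
    print(int(box.cls), float(box.conf), box.xyxy)
```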
Affiliation(s)
- Makara Mao
- Department of Software Convergence, Soonchunhyang University, Asan-si 31538, Republic of Korea
- Min Hong
- Department of Computer Software Engineering, Soonchunhyang University, Asan-si 31538, Republic of Korea
10
Jiao L, Wang M, Liu X, Li L, Liu F, Feng Z, Yang S, Hou B. Multiscale Deep Learning for Detection and Recognition: A Comprehensive Survey. IEEE Trans Neural Netw Learn Syst 2025; 36:5900-5920. [PMID: 38652624] [DOI: 10.1109/tnnls.2024.3389454]
Abstract
Recently, the multiscale problem in computer vision has attracted increasing attention. This article focuses on multiscale representation for object detection and recognition, comprehensively introduces the development of multiscale deep learning, and constructs an easy-to-understand but powerful knowledge structure. First, we give the definition of scale, explain the multiscale mechanism of human vision, and then introduce the multiscale problem as discussed in computer vision. Second, advanced multiscale representation methods are introduced, including pyramid representation, scale-space representation, and multiscale geometric representation. Third, the theory of multiscale deep learning is presented, which mainly discusses multiscale modeling in convolutional neural networks (CNNs) and Vision Transformers (ViTs). Fourth, we compare the performance of multiple multiscale methods on different tasks, illustrating the effectiveness of different multiscale structural designs. Finally, based on an in-depth understanding of the existing methods, we point out several open issues and future directions for multiscale deep learning.
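Among the representations surveyed, the image pyramid is the simplest to make concrete: each level is a blurred, 2x-downsampled copy of the previous one. A minimal OpenCV sketch of that coarse-to-fine stack:

```python
import cv2

def gaussian_pyramid(image, levels=4):
    """Classic pyramid representation: repeatedly blur and downsample,
    producing a coarse-to-fine multiscale stack of the input image."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))
    return pyramid
```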
11
Luo Y, Zheng X, Qiu M, Gou Y, Yang Z, Qu X, Chen Z, Lin Y. Deep learning and its applications in nuclear magnetic resonance spectroscopy. Prog Nucl Magn Reson Spectrosc 2025; 146-147:101556. [PMID: 40306798] [DOI: 10.1016/j.pnmrs.2024.101556]
Abstract
Nuclear Magnetic Resonance (NMR), as an advanced technology, has widespread applications in various fields like chemistry, biology, and medicine. However, issues such as long acquisition times for multidimensional spectra and low sensitivity limit the broader application of NMR. Traditional algorithms aim to address these issues but have limitations in speed and accuracy. Deep Learning (DL), a branch of Artificial Intelligence (AI) technology, has shown remarkable success in many fields including NMR. This paper presents an overview of the basics of DL and current applications of DL in NMR, highlights existing challenges, and suggests potential directions for improvement.
Affiliation(s)
- Yao Luo
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Xiaoxu Zheng
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Mengjie Qiu
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Yaoping Gou
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Zhengxian Yang
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Xiaobo Qu
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Zhong Chen
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
- Yanqin Lin
- Fujian Provincial Key Laboratory of Plasma and Magnetic Resonance, Department of Electronic Science, State Key Laboratory of Physical Chemistry of Solid Surfaces, Xiamen University, Xiamen 361005, China
12
Wang G, Zhang X, Peng Z, Zhang T, Tang X, Zhou H, Jiao L. Negative Deterministic Information-Based Multiple Instance Learning for Weakly Supervised Object Detection and Segmentation. IEEE Trans Neural Netw Learn Syst 2025; 36:6188-6202. [PMID: 38748523] [DOI: 10.1109/tnnls.2024.3395751]
Abstract
Weakly supervised object detection (WSOD) and semantic segmentation with image-level annotations have attracted extensive attention due to their high label efficiency. Multiple instance learning (MIL) offers a feasible solution for the two tasks by treating each image as a bag with a series of instances (object regions or pixels) and identifying foreground instances that contribute to bag classification. However, conventional MIL paradigms often suffer from issues, e.g., discriminative instance domination and missing instances. In this article, we observe that negative instances usually contain valuable deterministic information, which is the key to solving the two issues. Motivated by this, we propose a novel MIL paradigm based on negative deterministic information (NDI), termed NDI-MIL, which is based on two core designs with a progressive relation: NDI collection and negative contrastive learning (NCL). In NDI collection, we identify and distill NDI from negative instances online by a dynamic feature bank. The collected NDI is then utilized in a NCL mechanism to locate and punish those discriminative regions, by which the discriminative instance domination and missing instances issues are effectively addressed, leading to improved object- and pixel-level localization accuracy and completeness. In addition, we design an NDI-guided instance selection (NGIS) strategy to further enhance the systematic performance. Experimental results on several public benchmarks, including PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO, show that our method achieves satisfactory performance. The code is available at: https://github.com/GC-WSL/NDI.
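As background for the MIL paradigm described above, the conventional baseline scores a bag by pooling its instance scores, e.g. with max pooling; the paper's NDI mechanisms refine this basic scheme. A toy sketch of the baseline only, not of NDI-MIL itself:

```python
import numpy as np

def bag_score_max_pool(instance_scores):
    """Conventional MIL aggregation: a bag (image) is scored per class by
    the maximum over its instances (regions or pixels). This is the
    baseline that discriminative-instance domination afflicts: one
    dominant instance decides the bag label while other true positives
    are missed.

    instance_scores: (n_instances, n_classes) array of per-instance scores
    """
    return instance_scores.max(axis=0)

scores = np.array([[0.9, 0.1], [0.3, 0.2], [0.2, 0.7]])
print(bag_score_max_pool(scores))  # -> [0.9, 0.7]
```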
13
Hao Y, Yan S, Yang G, Luo Y, Liu D, Han C, Ren X, Du D. Image segmentation and coverage estimation of deep-sea polymetallic nodules based on lightweight deep learning model. Sci Rep 2025; 15:10177. [PMID: 40128230] [PMCID: PMC11933300] [DOI: 10.1038/s41598-025-89952-8]
Abstract
Deep-sea polymetallic nodules, abundant in critical metal elements, are a vital strategic mineral resource. Accordingly, prompt, accurate, and high-speed acquisition of the parameters and distribution of these nodules is crucial for effective exploration, evaluation, and identification of valuable deposits. One of the primary parameters for assessing polymetallic nodules is the coverage rate. For real-time, accurate, and efficient computation of this parameter, this article proposes a streamlined segmentation model named YOLOv7-PMN, designed for analyzing seafloor video data. The model substitutes the YOLOv7 backbone with the lightweight feature extraction framework of MobileNetV3-Small and integrates multi-level Squeeze-and-Excitation attention mechanisms. These changes enhance detection accuracy, speed up inference, and reduce the model's overall size. The head network uses depth-wise separable convolution modules, significantly decreasing the number of model parameters. Compared to the original YOLOv7, YOLOv7-PMN shows improved detection and segmentation performance for nodules of varying sizes. On the same dataset, the recall rate for nodules increases by 3% over YOLOv7. Model parameters are cut by 61.78%, memory usage of the best weights is reduced by 61.15%, and inference speed for detection and segmentation rises to 65.79 FPS, surpassing the 25 FPS video capture rate. The model demonstrates strong generalization, lowering the requirements for video data quality and reducing dependency on extensive dataset annotations. In summary, YOLOv7-PMN is highly effective at processing seabed images of polymetallic nodules, which are characterized by varying target scales, complex environments, and diverse features, and it holds significant promise for practical application and broad adoption.
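Two pieces of this description are easy to make concrete: the parameter saving from depth-wise separable convolutions, and the coverage-rate parameter itself. A small sketch (the channel counts are illustrative, not the model's actual configuration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def dws_conv_params(k, c_in, c_out):
    """Depth-wise separable version: a k x k depthwise convolution
    followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 convolution mapping 256 -> 256 channels.
std, dws = conv_params(3, 256, 256), dws_conv_params(3, 256, 256)
print(std, dws, f"{1 - dws / std:.1%} fewer parameters")  # ~88.5% fewer

def coverage_rate(mask):
    """Nodule coverage rate: fraction of seafloor pixels that the
    segmentation mask labels as nodules."""
    return mask.astype(bool).mean()
```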
Affiliation(s)
- Yue Hao
- Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Shijuan Yan
- Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Laboratory for Marine Mineral Resources, Qingdao Marine Science and Technology Center, Qingdao, 266237, China
- Gang Yang
- Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Key Laboratory of Deep Sea Mineral Resource Development, Shandong (Preparatory), Qingdao, China
- Yiping Luo
- Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Dalong Liu
- Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Chunhua Han
- National Marine Data and Information Service, Tianjin, 300012, China
- Xiangwen Ren
- Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Laboratory for Marine Mineral Resources, Qingdao Marine Science and Technology Center, Qingdao, 266237, China
- Dewen Du
- Key Laboratory of Marine Geology and Metallogeny, First Institute of Oceanography, Ministry of Natural Resources, Qingdao, 266061, China
- Laboratory for Marine Mineral Resources, Qingdao Marine Science and Technology Center, Qingdao, 266237, China
14
Wei D, Chang Y, Kuang H. Extraction and spatiotemporal analysis of impervious surfaces in Chongqing based on enhanced DeepLabv3. Sci Rep 2025; 15:9807. [PMID: 40118998] [PMCID: PMC11928587] [DOI: 10.1038/s41598-025-94882-6]
Abstract
In this study, Sentinel-2 time series satellite remote sensing imagery and an improved CA-DeepLabV3+ semantic segmentation network were utilized to construct a model for extracting urban impervious surfaces. The model was used to extract the distribution information of impervious surfaces in the central urban area in Chongqing from 2017 to 2022. The spatiotemporal evolution characteristics of the impervious surfaces were analyzed using the area change and standard deviational ellipse methods. The results indicate that the improved CA-DeepLabV3+ model performs exceptionally well in identifying impervious surfaces, with precision, recall, F1 score, and MIoU values of 90.78%, 90.85%, 90.82%, and 83.25%, respectively, which are significantly better than those of other classic semantic segmentation models, demonstrating its high reliability and generalization performance. The analysis shows that the impervious surface area in Chongqing's central urban area has grown rapidly over the past five years, with a clear expansion trend, especially in the core urban area and its surrounding areas. The standard deviational ellipse analysis revealed that significant directional expansion of the impervious surfaces has occurred, primarily along the north-south axis. Overall, this model can achieve large-scale, time-series monitoring of the impervious surface distribution, providing critical technical support for studying urban impervious surface expansion and fine urban management, presenting promising application prospects.
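The standard deviational ellipse used here summarizes a point pattern's center, orientation, and directional spread. A NumPy sketch of one common convention (variants such as ArcGIS rescale the axes by a factor of √2; this is a generic illustration, not the study's code):

```python
import numpy as np

def standard_deviational_ellipse(x, y):
    """Mean centre, orientation theta, and rotated-axis standard
    deviations of a 2-D point pattern."""
    xc, yc = x.mean(), y.mean()
    xd, yd = x - xc, y - yc
    a = (xd ** 2).sum() - (yd ** 2).sum()
    c = 2.0 * (xd * yd).sum()
    # tan(theta) = (A + sqrt(A^2 + 4*(sum xy)^2)) / (2*sum xy)
    theta = np.arctan2(a + np.hypot(a, c), c)
    sx = np.sqrt(((xd * np.cos(theta) - yd * np.sin(theta)) ** 2).mean())
    sy = np.sqrt(((xd * np.sin(theta) + yd * np.cos(theta)) ** 2).mean())
    return (xc, yc), theta, sx, sy

rng = np.random.default_rng(0)
pts = rng.normal(size=(500, 2)) * [3.0, 1.0]  # elongated east-west cloud
centre, theta, sx, sy = standard_deviational_ellipse(pts[:, 0], pts[:, 1])
print(centre, np.degrees(theta), sx, sy)
```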
Affiliation(s)
- Dengfeng Wei
- School of Geographical Sciences, Southwest University, Chongqing, 400715, China
- Yue Chang
- School of Geographical Sciences, Southwest University, Chongqing, 400715, China
- Honghai Kuang
- School of Geographical Sciences, Southwest University, Chongqing, 400715, China
15
Luo P, Niu Y, Tang D, Huang W, Luo X, Mu J. A computer vision solution for behavioral recognition in red pandas. Sci Rep 2025; 15:9201. [PMID: 40097424] [PMCID: PMC11914461] [DOI: 10.1038/s41598-025-89075-0]
Abstract
The survival of the red panda, an endangered arboreal mammal, is challenged by two main factors: habitat loss and health risks that contribute to high morbidity and mortality. Abnormal behaviors, such as reduced social and locomotor behaviors and sleep deprivation, often signal potential health problems. Non-invasive behavioral monitoring using computer vision can provide valuable insights to advance health research and welfare practices. This study presents a dataset of 3142 images of red panda behavior, collected using a motion-activated camera and web crawler technology at Bifengxia Wildlife World, and proposes an improved lightweight and efficient YOLOv8 model for behavior recognition. The model incorporates adaptive histogram equalization and the GMBottleNeck module, which accentuate detail and reduce parameters. Training was further enhanced by integrating the SimAM attention mechanism and feature fusion learning. These enhancements let the YOLOv8s-Red Panda model attain 90.6% accuracy, a 1.4% improvement with a one-third reduction in model size compared with the data-enhanced baseline model (YOLOv8s-DE). The model performs well on red panda behavior recognition, with the potential to significantly advance healthcare and optimize animal welfare.
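Adaptive histogram equalization, the detail-accentuation step mentioned above, is commonly realized as CLAHE applied to the luminance channel. A sketch with typical defaults rather than the authors' settings:

```python
import cv2

def adaptive_hist_eq(bgr_image):
    """Contrast-limited adaptive histogram equalization (CLAHE) on the
    L channel of LAB color space, accentuating local detail without
    amplifying noise across the whole image."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
```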
Affiliation(s)
- Pu Luo
- College of Information Engineering, Sichuan Agricultural University, Ya'an, China
- Ya'an Digital Agriculture Engineering Technology Research Center, Sichuan Agricultural University, Ya'an, China
- Yupeng Niu
- College of Information Engineering, Sichuan Agricultural University, Ya'an, China
- Ya'an Digital Agriculture Engineering Technology Research Center, Sichuan Agricultural University, Ya'an, China
- Duoxun Tang
- College of Information Engineering, Sichuan Agricultural University, Ya'an, China
- Ya'an Digital Agriculture Engineering Technology Research Center, Sichuan Agricultural University, Ya'an, China
- Wenyuan Huang
- College of Information Engineering, Sichuan Agricultural University, Ya'an, China
- Ya'an Digital Agriculture Engineering Technology Research Center, Sichuan Agricultural University, Ya'an, China
- Xuefei Luo
- College of Information Engineering, Sichuan Agricultural University, Ya'an, China
- Jiong Mu
- College of Information Engineering, Sichuan Agricultural University, Ya'an, China
- Ya'an Digital Agriculture Engineering Technology Research Center, Sichuan Agricultural University, Ya'an, China
16
Cao Z, Shi Y, Zhang S, Chen H, Liu W, Yue G, Lin H. Decentralized learning for medical image classification with prototypical contrastive network. Med Phys 2025. [PMID: 40089972] [DOI: 10.1002/mp.17753]
Abstract
BACKGROUND: Recently, deep convolutional neural networks (CNNs) have shown great potential in medical image classification tasks. However, the practical usage of these methods is constrained by two challenges: 1) using nonindependent and identically distributed (non-IID) datasets from various medical institutions while ensuring privacy, and 2) data imbalance due to the differing frequencies of diseases.
PURPOSE: To present a decentralized learning method using a prototypical contrastive network that achieves precise medical image classification while mitigating the non-IID problem across different clients.
METHODS: We propose a prototypical contrastive network that minimizes disparities among heterogeneous clients. The network utilizes an approximate global prototype to alleviate the non-IID dataset problem for each local client by projecting data onto a balanced prototype space. To validate the effectiveness of our algorithm, we employed three distinct datasets of color fundus photographs for diabetic retinopathy: EyePACS, APTOS, and IDRiD. Training incorporated 35k images from EyePACS, 3662 from APTOS, and 516 from IDRiD; testing used 53k images from EyePACS. Additionally, we included the COVIDx chest X-ray dataset for comparative analysis, comprising 29,986 training images and 400 test samples.
RESULTS: We conducted comprehensive comparisons with existing works on four medical image datasets. On EyePACS under the balanced IID setting, our method outperformed the FedAvg baseline by 3.7% in accuracy; under the Dirichlet non-IID setting, which presents an extremely unbalanced distribution, it showed a notable 6.6% accuracy gain over FedAvg. Similarly, on APTOS, our method improved accuracy over FedAvg by 3.7% under the balanced IID setting and 5.0% under the Dirichlet non-IID setting. Notably, on the DCC non-IID and COVID-19 datasets, our method established a new state of the art across all evaluation metrics, including WAccuracy, WPrecision, WRecall, and WF-score.
CONCLUSIONS: The proposed prototypical contrastive loss guides each local client's data distribution to align with the global distribution, and the approximate global prototype addresses unbalanced distributions across local clients by projecting all data onto a new balanced prototype space. Our model achieves state-of-the-art performance on the EyePACS, APTOS, IDRiD, and COVIDx datasets.
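A generic sketch of a prototype-based contrastive objective consistent with this description — normalized features pulled toward their class prototype via a softmax over cosine similarities. The temperature is an illustrative choice, and this is not claimed to be the authors' exact loss:

```python
import torch
import torch.nn.functional as F

def prototypical_contrastive_loss(features, labels, prototypes, tau=0.1):
    """Pull each embedding toward its class prototype and push it from
    the others, by treating cosine similarities to all prototypes as
    logits in a cross-entropy over classes.

    features:   (B, d) embeddings produced by a local client's model
    labels:     (B,)  integer class labels
    prototypes: (C, d) approximate global class prototypes
    """
    f = F.normalize(features, dim=1)
    p = F.normalize(prototypes, dim=1)
    logits = f @ p.t() / tau  # (B, C) similarity of each sample to every prototype
    return F.cross_entropy(logits, labels)
```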
Affiliation(s)
- Zhantao Cao
- Institutions for Research, CETC Cyberspace Security Technology CO., LTD., Chengdu, China
- Chengdu Westone Information Security Technology Co., Ltd., Chengdu, China
- Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province, Chengdu, China
- Yuanbing Shi
- Institutions for Research, CETC Cyberspace Security Technology CO., LTD., Chengdu, China
- Chengdu Westone Information Security Technology Co., Ltd., Chengdu, China
- Shuli Zhang
- Institutions for Research, CETC Cyberspace Security Technology CO., LTD., Chengdu, China
- Huanan Chen
- Institutions for Research, CETC Cyberspace Security Technology CO., LTD., Chengdu, China
- Weide Liu
- Institute for Infocomm Research, A*STAR, Singapore, Singapore
- Guanghui Yue
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
- Huazhen Lin
- Center of Statistical Research and School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
17
Hong Y, Pan H, Jia Y, Sun W, Gao H. ResDNet: Efficient Dense Multi-Scale Representations With Residual Learning for High-Level Vision Tasks. IEEE Trans Neural Netw Learn Syst 2025; 36:3904-3915. [PMID: 35533173] [DOI: 10.1109/tnnls.2022.3169779]
Abstract
Deep feature fusion plays a significant role in the strong learning ability of convolutional neural networks (CNNs) for computer vision tasks. Recent works have continually demonstrated the advantages of efficient aggregation strategies, some of which involve multiscale representations. In this article, we describe a novel network architecture for high-level computer vision tasks in which densely connected feature fusion provides multiscale representations for the residual network. We term our method ResDNet: a simple and efficient backbone made up of sequential ResDNet modules containing variants of dense blocks named sliding dense blocks (SDBs). Compared with DenseNet, ResDNet enhances feature fusion and reduces redundancy through shallower densely connected architectures. Experimental results on three classification benchmarks, CIFAR-10, CIFAR-100, and ImageNet, demonstrate the effectiveness of ResDNet. ResDNet consistently outperforms DenseNet on CIFAR-100 while using much less computation. On ImageNet, ResDNet-B-129 achieves 1.94% and 0.89% top-1 accuracy improvements over ResNet-50 and DenseNet-201 at similar complexity. Besides, ResDNet with more than 1000 layers achieves remarkable accuracy on CIFAR compared with other state-of-the-art results. Based on the MMDetection implementation of RetinaNet, ResDNet-B-129 improves mAP from 36.3 to 39.5 compared with ResNet-50 on the COCO dataset.
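To make the fusion pattern concrete, here is a plain densely connected block in PyTorch, where every layer consumes the concatenation of all earlier feature maps. ResDNet's sliding dense blocks are described as shallower variants of this design; the sliding detail is omitted, so this is only the baseline pattern the paper builds on:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Densely connected block: layer i receives the channel-wise
    concatenation of the input and all previous layers' outputs, each
    layer contributing `growth` new feature channels."""
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1, bias=False),
            )
            for i in range(n_layers)
        )

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```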
18
Li H, Yuan M, Li J, Liu Y, Lu G, Xu Y, Yu Z, Zhang D. Focus Affinity Perception and Super-Resolution Embedding for Multifocus Image Fusion. IEEE Trans Neural Netw Learn Syst 2025; 36:4311-4325. [PMID: 38446648] [DOI: 10.1109/tnnls.2024.3367782]
Abstract
Despite remarkable achievements in multifocus image fusion, most existing methods generate only a low-resolution image when the given source images suffer from low resolution. An obvious naive strategy is to conduct image fusion and image super-resolution independently, but this two-step approach inevitably introduces and enlarges artifacts in the final result if the output of the first step contains artifacts. To address this problem, in this article we propose a novel method that achieves image fusion and super-resolution simultaneously in one framework, avoiding step-by-step processing. Since a small receptive field can discriminate the focusing characteristics of pixels in detailed regions, while a large receptive field is more robust for pixels in smooth regions, a subnetwork is first proposed to compute the affinity of features under different types of receptive fields, efficiently increasing the discriminability of focused pixels. Simultaneously, to prevent distortion, a gradient-embedding-based super-resolution subnetwork is also proposed, in which features from the shallow layer, the deep layer, and the gradient map are jointly taken into account, allowing us to obtain an upsampled image with high resolution. Compared with existing methods that implement fusion and super-resolution independently, our proposed method achieves these two tasks in parallel, avoiding artifacts caused by inferior intermediate fusion or super-resolution outputs. Experiments conducted on a real-world dataset substantiate the superiority of our proposed method over the state of the art.
19
Shao Z, Han J, Marnerides D, Debattista K. Region-Object Relation-Aware Dense Captioning via Transformer. IEEE Trans Neural Netw Learn Syst 2025; 36:4184-4195. [PMID: 35275824] [DOI: 10.1109/tnnls.2022.3152990]
Abstract
Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, two broad limitations remain: 1) most existing methods adopt an encoder-decoder framework in which contextual information is sequentially encoded using long short-term memory (LSTM), whose forget-gate mechanism makes it vulnerable when dealing with long sequences; and 2) the vast majority of prior art considers all regions of interest (RoIs) equally important, thus failing to focus on more informative regions. The consequence is that the generated captions cannot highlight the important contents of the image, which does not read naturally. To overcome these limitations, in this article we propose a novel end-to-end transformer-based dense image captioning architecture, termed the transformer-based dense captioner (TDC). TDC learns the mapping between images and their dense captions via a transformer, prioritizing more informative regions. To this end, we present a novel unit, named the region-object correlation score unit (ROCSU), to measure the importance of each region, taking into account the relationships between detected objects and the region, alongside the confidence scores of detected objects within the region. Extensive experimental results and ablation studies on the standard dense-captioning datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
Collapse
|
20
|
Zhang L, Liu Z, Zhu X, Song Z, Yang X, Lei Z, Qiao H. Weakly Aligned Feature Fusion for Multimodal Object Detection. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:4145-4159. [PMID: 34437075 DOI: 10.1109/tnnls.2021.3105143] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
To achieve accurate and robust object detection in real-world scenarios, various image modalities, such as color, thermal, and depth, are incorporated. However, multimodal data often suffer from the position-shift problem: the image pair is not strictly aligned, so the same object appears at different positions in different modalities. For deep learning methods, this problem makes it difficult to fuse multimodal features and confuses the training of the convolutional neural network (CNN). In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position-shift problem. First, a region feature (RF) alignment module with an adjacent-similarity constraint is designed to consistently predict the position shift between two modalities and adaptively align the cross-modal RFs. Second, we propose a novel region-of-interest (RoI) jitter strategy to improve robustness to unexpected shift patterns. Third, we present a new multimodal feature fusion method that selects the more reliable feature and suppresses the less useful one via feature reweighting. In addition, by locating bounding boxes in both modalities and building their relationships, we provide a novel multimodal labeling, named KAIST-Paired. Extensive experiments on 2-D and 3-D object detection, RGB-T, and RGB-D datasets demonstrate the effectiveness and robustness of our method.
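A minimal sketch of such reliability-based reweighting is shown below; the softmax gate and layer sizes are invented stand-ins for the paper's fusion module.

```python
import torch
import torch.nn as nn

class ReweightedFusion(nn.Module):
    """Predict a per-modality reliability weight and fuse RGB/thermal features,
    emphasizing the more reliable modality and suppressing the less useful one."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * ch, 2, kernel_size=1),  # one logit per modality
        )

    def forward(self, f_rgb, f_thermal):
        w = torch.softmax(self.gate(torch.cat([f_rgb, f_thermal], dim=1)), dim=1)
        return w[:, 0:1] * f_rgb + w[:, 1:2] * f_thermal

fused = ReweightedFusion(64)(torch.rand(2, 64, 32, 32), torch.rand(2, 64, 32, 32))
```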
Collapse
|
21
|
Zhao J, Zhao Q, Wu C, Li Z, Shuang F. A Scale-Invariant Looming Detector for UAV Return Missions in Power Line Scenarios. Biomimetics (Basel) 2025; 10:99. [PMID: 39997122 PMCID: PMC11852856 DOI: 10.3390/biomimetics10020099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2025] [Revised: 02/02/2025] [Accepted: 02/07/2025] [Indexed: 02/26/2025] Open
Abstract
Unmanned aerial vehicles (UAVs) offer an efficient solution for power grid maintenance, but collision avoidance during return flights is challenged by crossing power lines, especially for small drones with limited computational resources. Conventional visual systems struggle to detect thin, intricate power lines, which are often overlooked or misinterpreted. While deep learning methods have improved static power line detection in images, they still struggle with dynamic scenarios in which collision risks are not detected in real time. Inspired by the hypothesis that the Lobula Giant Movement Detector (LGMD) distinguishes sparse, incoherent background motion from the continuous, clustered motion contours of a looming object, we propose a Scale-Invariant Looming Detector (SILD). SILD detects motion by preprocessing video frames, enhances motion regions using attention masks, and simulates biological arousal to recognize looming threats while suppressing noise. It also predicts impending collisions during high-speed flight and overcomes the limitations of motion vision to ensure consistent sensitivity to looming objects at different scales. We compare SILD with existing static power line detection techniques, including the Hough transform and D-LinkNet with a dilated convolution-based encoder-decoder architecture. Our results show that SILD strikes an effective balance between detection accuracy and real-time processing efficiency, making it well suited to UAV-based power line detection, where high precision and low-latency performance are essential. Furthermore, we evaluate the model's performance under various conditions and successfully deploy it on an embedded UAV board for collision avoidance testing at power lines. This approach provides a novel perspective for UAV obstacle avoidance in power line scenarios.
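As a toy illustration of the looming principle (sustained growth of a clustered motion area as an obstacle approaches), consider the sketch below. The thresholded frame difference is a crude stand-in for SILD's preprocessing and attention masks.

```python
import numpy as np

def looming_score(frames, thresh=25):
    """Crude LGMD-inspired looming cue over a sequence of >= 3 grayscale frames:
    positive, sustained growth of the motion area suggests an approaching object."""
    areas = []
    for prev, curr in zip(frames, frames[1:]):
        motion = np.abs(curr.astype(int) - prev.astype(int)) > thresh
        areas.append(motion.sum())              # size of the moving region
    return float(np.mean(np.diff(areas)))       # > 0 indicates expansion
```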
Collapse
Affiliation(s)
- Jiannan Zhao
- Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning 530004, China; (J.Z.); (Q.Z.); (Z.L.)
| | - Qidong Zhao
- Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning 530004, China; (J.Z.); (Q.Z.); (Z.L.)
| | - Chenggen Wu
- State Grid Lishui Power Supply Company, Grid Zhejiang Electric Power Company, State Grid Corporation of China, Lishui 323000, China;
| | - Zhiteng Li
- Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning 530004, China; (J.Z.); (Q.Z.); (Z.L.)
| | - Feng Shuang
- Guangxi Key Laboratory of Intelligent Control and Maintenance of Power Equipment, School of Electrical Engineering, Guangxi University, Nanning 530004, China; (J.Z.); (Q.Z.); (Z.L.)
| |
Collapse
|
22
|
Li G. A survey of open-access datasets for computer vision in precision poultry farming. Poult Sci 2025; 104:104784. [PMID: 39793242 PMCID: PMC11762189 DOI: 10.1016/j.psj.2025.104784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2024] [Revised: 01/02/2025] [Accepted: 01/04/2025] [Indexed: 01/13/2025] Open
Abstract
Computer vision has progressively advanced precision poultry farming. Despite this substantial increase in research activity, computer vision in precision poultry farming still lacks large-scale, open-access datasets with consistent evaluation metrics and baselines, which makes it challenging to reproduce and validate comparisons of different approaches. Since 2019, several image/video datasets have been published with open access to alleviate dataset scarcity. However, no dedicated survey has summarized this progress. To fill the gap, the objective of this research was to provide the first survey and analysis of open-access image/video datasets for precision poultry farming. A total of 20 qualified image/video datasets were summarized, including 4 for behavior monitoring, 6 for health status identification, 3 for live performance prediction, 4 for product quality inspection, and 3 for animal trait recognition. Critical points in creating a new image/video dataset, consisting of data acquisition, augmentation, annotation, sharing, and benchmarking, were discussed. The survey provides options for selecting appropriate datasets for model development and optimization while delivering insights into building new datasets for precision poultry farming.
Collapse
Affiliation(s)
- Guoming Li
- Department of Poultry Science, The University of Georgia, Athens, GA 30602, USA; Institute for Artificial Intelligence, The University of Georgia, Athens, GA 30602, USA; Institute for Integrative Precision Agriculture, The University of Georgia, Athens, GA 30602, USA.
| |
Collapse
|
23
|
Wei W, Wei P, Liao Z, Qin J, Cheng X, Liu M, Zheng N. Semantic Consistency Reasoning for 3-D Object Detection in Point Clouds. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025; 36:3356-3369. [PMID: 38113156 DOI: 10.1109/tnnls.2023.3341097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2023]
Abstract
Point cloud-based 3-D object detection is a significant and critical issue in numerous applications. While most existing methods attempt to capitalize on the geometric characteristics of point clouds, they neglect the internal semantic properties of points and the consistency between semantic and geometric clues. In this article we introduce a semantic consistency (SC) mechanism for 3-D object detection, reasoning about the semantic relations between 3-D object boxes and their internal points. The mechanism rests on a natural principle: the semantic category of a 3-D bounding box should be consistent with the categories of all points within the box. Driven by the SC mechanism, we propose a novel SC network (SCNet) to detect 3-D objects from point clouds. Specifically, SCNet is composed of a feature extraction module, a detection decision module, and a semantic segmentation module. During inference, the feature extraction and detection decision modules are used to detect 3-D objects. During training, the semantic segmentation module is jointly trained with the other two modules to produce more robust and applicable model parameters. Performance is greatly boosted by reasoning about the relations between the output 3-D object boxes and the segmented points. The proposed SC mechanism is model-agnostic and can be integrated into other base 3-D object detection models. We test the proposed model on three challenging indoor and outdoor benchmark datasets: ScanNetV2, SUN RGB-D, and KITTI. Furthermore, to validate the universality of the SC mechanism, we implement it in three different 3-D object detectors. The experiments show impressive performance improvements, and extensive ablation studies further demonstrate the effectiveness of the proposed model.
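The SC principle is easy to state in code: every point inside a predicted box should agree with the box's class. Below is one plausible consistency loss, a sketch rather than SCNet's exact formulation.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(box_logits, point_logits):
    """Penalize points inside a predicted 3-D box whose semantic class differs
    from the box's class. box_logits: (C,) logits of one box; point_logits:
    (N, C) logits of the N points falling inside that box."""
    box_class = int(box_logits.argmax())
    targets = torch.full((point_logits.shape[0],), box_class, dtype=torch.long)
    return F.cross_entropy(point_logits, targets)
```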
Collapse
|
24
|
Liu Y, Sun Z, Xi L, Zhang L, Dong W, Chen C, Lu M, Fu H, Deng F. MMFW-UAV dataset: multi-sensor and multi-view fixed-wing UAV dataset for air-to-air vision tasks. Sci Data 2025; 12:185. [PMID: 39885165 PMCID: PMC11782645 DOI: 10.1038/s41597-025-04482-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2024] [Accepted: 01/16/2025] [Indexed: 02/01/2025] Open
Abstract
We present MMFW-UAV, an air-to-air multi-sensor and multi-view fixed-wing UAV dataset. MMFW-UAV contains 147,417 fixed-wing UAV images captured by multiple types of sensors (zoom, wide-angle, and thermal imaging), displaying the flight status of fixed-wing UAVs of different sizes, appearances, structures, and stabilized flight velocities from multiple aerial perspectives (top-down, horizontal, and bottom-up views), aiming to cover the full range of perspectives with multi-modal image data. Quality control processes of semi-automatic annotation, manual checking, and secondary refinement were performed on each image. To the best of our knowledge, MMFW-UAV is the first one-to-one multi-modal image dataset for fixed-wing UAVs with high-quality annotations. Several mainstream deep learning-based object detection architectures are evaluated on MMFW-UAV, and the experimental results demonstrate that it can be utilized for fixed-wing UAV identification, detection, and monitoring. We believe that MMFW-UAV will contribute to a variety of fixed-wing UAV research and applications.
Collapse
Affiliation(s)
- Yang Liu
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
| | - Zhihao Sun
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
| | - Lele Xi
- School of Electrical Engineering, Hebei University of Science and Technology, Shijiazhuang, 050018, China
| | - Lele Zhang
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
| | - Wei Dong
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
| | - Chen Chen
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
- Chongqing Innovation Center, Beijing Institute of Technology, Chongqing, 401120, China
| | - Maobin Lu
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
- Chongqing Innovation Center, Beijing Institute of Technology, Chongqing, 401120, China
| | - Hailing Fu
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China
| | - Fang Deng
- National Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology, Beijing, 100081, China.
- Chongqing Innovation Center, Beijing Institute of Technology, Chongqing, 401120, China.
| |
Collapse
|
25
|
Zhang N, Chen Y, Zhang E, Liu Z, Yue J. Maize quality detection based on MConv-SwinT high-precision model. PLoS One 2025; 20:e0312363. [PMID: 39854315 PMCID: PMC11761119 DOI: 10.1371/journal.pone.0312363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2024] [Accepted: 10/04/2024] [Indexed: 01/26/2025] Open
Abstract
The traditional method of corn quality detection relies heavily on the subjective judgment of inspectors and suffers from a high error rate. To address these issues, this study employs the Swin Transformer as an enhanced base model, integrating machine vision and deep learning techniques for corn quality assessment. Initially, images of high-quality, moldy, and broken corn were collected. After preprocessing, a total of 20,152 valid images were obtained for the experimental samples. The network then extracts both shallow and deep features from these maize images, which are subsequently fused. Concurrently, the extracted features undergo further processing through a specially designed convolutional block. The fused features, combined with those processed by the convolutional module, are fed into an attention layer. This attention layer assigns weights to the features, facilitating accurate final classification. Experimental results demonstrate that the MC-Swin Transformer model proposed in this paper significantly outperforms traditional convolutional neural network models in key metrics such as accuracy, precision, recall, and F1 score, achieving a recognition accuracy rate of 99.89%. Thus, the network effectively and efficiently classifies different corn qualities. This study not only offers a novel perspective and technical approach to corn quality detection but also holds significant implications for the advancement of smart agriculture.
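A loose sketch of the described fusion-plus-attention head follows; the sizes and module layout are invented and do not reproduce the paper's MConv-SwinT design.

```python
import torch
import torch.nn as nn

class FusedAttentionHead(nn.Module):
    """Fuse shallow and deep features, add a conv-processed copy, weight the
    result with channel attention, then classify (3 classes: sound/moldy/broken)."""
    def __init__(self, ch=64, classes=3):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv2d(2 * ch, 2 * ch, 3, padding=1), nn.ReLU())
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * ch, 2 * ch, 1), nn.Sigmoid())
        self.fc = nn.Linear(2 * ch, classes)

    def forward(self, shallow, deep):
        fused = torch.cat([shallow, deep], dim=1)
        mixed = fused + self.conv_block(fused)       # fused + conv-module features
        weighted = mixed * self.attn(mixed)          # attention assigns weights
        return self.fc(weighted.mean(dim=(2, 3)))
```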
Collapse
Affiliation(s)
- Ning Zhang
- Engineering Research Center of Hydrogen Energy Equipment & Safety Detection, Universities of Shaanxi Province, Xijing University, Xi’an, China
| | - Yuanqi Chen
- Engineering Research Center of Hydrogen Energy Equipment & Safety Detection, Universities of Shaanxi Province, Xijing University, Xi’an, China
| | - Enxu Zhang
- Engineering Research Center of Hydrogen Energy Equipment & Safety Detection, Universities of Shaanxi Province, Xijing University, Xi’an, China
| | - Ziyang Liu
- Engineering Research Center of Hydrogen Energy Equipment & Safety Detection, Universities of Shaanxi Province, Xijing University, Xi’an, China
| | - Jie Yue
- Engineering Research Center of Hydrogen Energy Equipment & Safety Detection, Universities of Shaanxi Province, Xijing University, Xi’an, China
| |
Collapse
|
26
|
Shetty S, Gali S, R V, Ms J, Mn T. Deep learning-based detection of incisal translucency patterns. J Prosthet Dent 2025:S0022-3913(24)00822-9. [PMID: 39837680 DOI: 10.1016/j.prosdent.2024.11.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2024] [Revised: 11/25/2024] [Accepted: 11/27/2024] [Indexed: 01/23/2025]
Abstract
STATEMENT OF PROBLEM The evaluation of incisal translucency in anterior teeth greatly influences esthetic treatment outcomes. This evaluation is mostly subjective and often overlooked among dental professionals. Artificial intelligence-based models for detecting the incisal translucency of anterior teeth may be of value to dentists in restorative practice, but studies are lacking. PURPOSE The purpose of this study was to assess the accuracy of deep learning models in predicting the translucency patterns of anterior teeth. MATERIAL AND METHODS Approximately 240 Joint Photographic Experts Group (JPEG) images of anterior teeth from participants over 18 years of age were collected using a smartphone. These images were resized to 224×224 pixels and classified by the presence or absence of translucency. Augmentation techniques enhanced the training dataset, and a 3-model deep learning approach was used: YOLOv5 detected central incisors, a Vision Transformer (ViT) identified translucency, and a U-Net segmented the translucent areas. The images were split 80/20 between training and testing, with performance evaluated using accuracy, precision, recall, F1 score, confusion matrices, and Dice scores. RESULTS YOLOv5 achieved a precision of 1.00 at a confidence threshold of 0.910. The ViT system showed an accuracy of 91.66%, with 58 of 64 images predicted correctly and an F1 score of 94.83%. U-Net segmentation, after training with annotated images, achieved an accuracy of 91% with a Dice score of 0.948. CONCLUSIONS The integration of YOLOv5 for detection, ViT for classification, and U-Net for segmentation demonstrates a comprehensive approach to classifying incisal translucencies. By leveraging the strengths of deep learning models, high accuracy and precision can be achieved in detecting the incisal translucency patterns of anterior teeth.
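The three-stage pipeline reduces to a chain of detection, classification, and segmentation. In this sketch the three callables stand in for the trained YOLOv5, ViT, and U-Net models; their interfaces are assumed for illustration.

```python
def translucency_pipeline(image, detector, classifier, segmenter):
    """Detect central incisors, classify translucency, segment translucent areas.
    detector(image) -> list of (x1, y1, x2, y2); classifier(crop) -> bool;
    segmenter(crop) -> binary mask (all three are hypothetical interfaces)."""
    results = []
    for (x1, y1, x2, y2) in detector(image):    # stage 1: tooth detection
        crop = image[y1:y2, x1:x2]
        if classifier(crop):                    # stage 2: translucency present?
            mask = segmenter(crop)              # stage 3: pixel-wise area
            results.append(((x1, y1, x2, y2), mask))
    return results
```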
Collapse
Affiliation(s)
- Sthithika Shetty
- Postgraduate student, Department of Prosthodontics and Crown & Bridge, Faculty of Dental Sciences, M.S. Ramaiah University of Applied Sciences, Bangalore, India
| | - Sivaranjani Gali
- Professor and Head, Department of Prosthodontics and Crown & Bridge, Faculty of Dental Sciences, M.S. Ramaiah University of Applied Sciences, Bangalore, India.
| | - Venkatesh R
- Graduate student, Department of Computer Science & Engineering (Artificial Intelligence & Machine Learning), Ramaiah Institute of Technology, Bangalore, India
| | - Jeswin Ms
- Graduate student, Department of Computer Science & Engineering (Artificial Intelligence & Machine Learning), Ramaiah Institute of Technology, Bangalore, India
| | - Thippeswamy Mn
- Principal, Dr. Ambedkar Institute of Technology, Bangalore, India
| |
Collapse
|
27
|
Liu S, Bi Y, Li Q, Ren Y, Ji H, Wang L. A deep learning based detection algorithm for anomalous behavior and anomalous item on buses. Sci Rep 2025; 15:2163. [PMID: 39820021 PMCID: PMC11739372 DOI: 10.1038/s41598-025-85962-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 01/07/2025] [Indexed: 01/19/2025] Open
Abstract
This paper proposes a new strategy for analysing and detecting abnormal passenger behavior and abnormal objects on buses. First, a library of abnormal passenger behaviors and objects on buses is established. Then, a new mask detection and abnormal object detection and analysis (MD-AODA) algorithm is proposed. The algorithm is based on the deep learning YOLOv5 (You Only Look Once) algorithm with improvements. For onboard face-mask detection, a strategy combining onboard face detection with target tracking is used. To detect abnormal objects in the vehicle, a geometric scale conversion-based approach for recognizing large-size abnormal objects is adopted. To apply the algorithm effectively to real bus data, an embedded video analysis system incorporating the proposed method is designed, yielding improved accuracy and timeliness in detecting anomalies compared to existing approaches. The algorithm's effectiveness and applicability are verified through comprehensive experiments on actual bus video data. The experimental results affirm the validity and practicality of the proposed algorithm.
Collapse
Affiliation(s)
- Shida Liu
- School of Electrical and Control Engineering, North China University of Technology, Beijing, China
| | - Yu Bi
- School of Electrical and Control Engineering, North China University of Technology, Beijing, China
| | - Qingyi Li
- School of Electrical and Control Engineering, North China University of Technology, Beijing, China.
| | - Ye Ren
- School of Electrical and Control Engineering, North China University of Technology, Beijing, China
| | - Honghai Ji
- School of Electrical and Control Engineering, North China University of Technology, Beijing, China
| | - Li Wang
- School of Electrical and Control Engineering, North China University of Technology, Beijing, China
| |
Collapse
|
28
|
Liu W, Tao Q, Wang N, Xiao W, Pan C. YOLO-STOD: an industrial conveyor belt tear detection model based on Yolov5 algorithm. Sci Rep 2025; 15:1659. [PMID: 39794390 PMCID: PMC11723914 DOI: 10.1038/s41598-024-83619-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2024] [Accepted: 12/16/2024] [Indexed: 01/13/2025] Open
Abstract
Real-time detection of conveyor belt tearing is of great significance for safe mining in the coal industry. Longitudinal tear damage of conveyor belts is characterized by multiple scales, abundant small targets, and complex interference sources. Therefore, to improve the performance of small-size tear-damage detection under complex interference, we propose YOLO-STOD, a visual detection method based on deep learning. First, a multi-case conveyor belt tear dataset is developed for complex interference and small-size detection. Second, the detection method YOLO-STOD is designed, which utilizes the BotNet attention mechanism to extract multi-dimensional tearing features, enhancing the model's feature extraction ability for small targets and enabling the model to converge quickly with few samples. Third, Shape_IOU is utilized to calculate the training loss, taking the shape regression loss of the bounding box itself into account to enhance the robustness of the model. The experimental results fully demonstrate the effectiveness of YOLO-STOD, which consistently surpasses competing methods, achieving 91.2% recall, 91.9% mAP, and 190.966 FPS, satisfying the needs of industrial real-time detection and showing promise for real-time detection of conveyor belt tearing in the field.
Collapse
Affiliation(s)
- Wei Liu
- School of Mechanical Engineering, Xinjiang University, Urumqi, 830000, China
| | - Qing Tao
- School of Mechanical Engineering, Xinjiang University, Urumqi, 830000, China.
| | - Nini Wang
- College of Electrical Engineering, Xinjiang University, Urumqi, 830000, China
| | - Wendong Xiao
- School of Mechanical Engineering, Xinjiang University, Urumqi, 830000, China
| | - Cen Pan
- School of Mechanical Engineering, Xinjiang University, Urumqi, 830000, China
| |
Collapse
|
29
|
Xu J, Cao L, Pan L, Li X, Zhang L, Gao H, Song W. IMC-YOLO: a detection model for assisted razor clam fishing in the mudflat environment. PeerJ Comput Sci 2025; 11:e2614. [PMID: 39896003 PMCID: PMC11784722 DOI: 10.7717/peerj-cs.2614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2024] [Accepted: 11/26/2024] [Indexed: 02/04/2025]
Abstract
In intertidal mudflat culture (IMC), fishing efficiency and the degree of damage to nature have always been an irreconcilable pair of contradictions. To improve the efficiency of razor clam fishing while reducing damage to the natural environment, this study establishes a razor clam burrow dataset and proposes an intelligent razor clam fishing method that introduces object detection technology into fishing activity, realizing accurate identification and counting of razor clam burrows. A detection model called intertidal mudflat culture-You Only Look Once (IMC-YOLO) is proposed by making improvements upon You Only Look Once version 8 (YOLOv8). First, the Iterative Attention-based Intrascale Feature Interaction (IAIFI) module is designed and adopted at the end of the backbone network to improve the model's focus on high-level features. Subsequently, to improve the detection of difficult targets such as small razor clam burrows, the head network is refactored. Then, the FasterNet Block is used to replace the Bottleneck, achieving more effective feature extraction while balancing detection accuracy and model size. Finally, the Three Branch Convolution Attention Mechanism (TBCAM) is proposed, enabling the model to focus more accurately on the specific region of interest. After testing, IMC-YOLO achieved mAP50, mAP50:95, and F1best of 0.963, 0.636, and 0.918, respectively, representing improvements of 2.2%, 3.5%, and 2.4% over the baseline model. Comparison with other mainstream object detection models confirmed that IMC-YOLO strikes a good balance between accuracy and number of parameters.
Collapse
Affiliation(s)
- Jianhao Xu
- College of Information Engineering, Dalian Ocean University, Dalian, China
| | - Lijie Cao
- College of Information Engineering, Dalian Ocean University, Dalian, China
| | - Lanlan Pan
- College of Mechanical and Power Engineering, Dalian Ocean University, Dalian, China
| | - Xiankun Li
- College of Mechanical and Power Engineering, Dalian Ocean University, Dalian, China
| | - Lei Zhang
- College of Information Engineering, Dalian Ocean University, Dalian, China
| | - Hongyong Gao
- College of Information Engineering, Dalian Ocean University, Dalian, China
| | - Weibo Song
- College of Information Engineering, Dalian Ocean University, Dalian, China
| |
Collapse
|
30
|
Araya-Martinez JM, Matthiesen VS, Bøgh S, Lambrecht J, Pimentel de Figueiredo R. A fast monocular 6D pose estimation method for textureless objects based on perceptual hashing and template matching. Front Robot AI 2025; 11:1424036. [PMID: 39845569 PMCID: PMC11750840 DOI: 10.3389/frobt.2024.1424036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2024] [Accepted: 12/02/2024] [Indexed: 01/24/2025] Open
Abstract
Object pose estimation is essential for computer vision applications such as quality inspection, robotic bin picking, and warehouse logistics. However, this task often requires expensive equipment such as 3D cameras or Lidar sensors, as well as significant computational resources. Many state-of-the-art methods for 6D pose estimation depend on deep neural networks, which are computationally demanding and require GPUs for real-time performance; moreover, they usually involve collecting and labeling large training datasets, which is costly and time-consuming. In this study, we propose a template-based matching algorithm that utilizes a novel perceptual hashing method for binary images, enabling fast and robust pose estimation. This approach allows the automatic preselection of a subset of templates, significantly reducing inference time while maintaining similar accuracy. Our solution runs efficiently on multiple devices without GPU support, offering reduced runtime and high accuracy on cost-effective hardware. We benchmarked our approach on a body-in-white automotive part and a widely used, publicly available dataset. Our experiments on a synthetically generated dataset reveal an accuracy-computation time trade-off superior to that of previous work on the same automotive-production use case. Additionally, our algorithm efficiently utilizes all CPU cores and includes adjustable parameters for balancing computation time and accuracy, making it suitable for a wide range of applications where hardware cost and power efficiency are critical. For instance, with a rotation step of 10° in the template database, we achieve an average rotation error of 10°, matching the template quantization level, and an average translation error of 14% of the object's size, with an average processing time of 0.3 s per image on a small form-factor NVIDIA AGX Orin device. We also evaluate robustness under partial occlusions (up to 10% occlusion) and noisy inputs (signal-to-noise ratios [SNRs] up to 10 dB), with only minor losses in accuracy. Finally, we compare our method to state-of-the-art deep learning models on a public dataset: although our algorithm does not outperform them in absolute accuracy, it provides a more favorable trade-off between accuracy and processing time, which is especially relevant to applications on resource-constrained devices.
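One simple perceptual hash for binary images is an average hash combined with Hamming-distance preselection, sketched below. The paper's hashing method may differ in detail, but the preselection logic is the same: cheap hashes prune the template set before expensive matching.

```python
import numpy as np

def binary_hash(img, size=8):
    """Average-hash a binary silhouette: block-average to size x size, then
    threshold at the mean, yielding a compact bit vector."""
    h, w = img.shape
    small = img[:h - h % size, :w - w % size].reshape(
        size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def preselect_templates(query, templates, k=10):
    """Keep the k templates closest in Hamming distance to the query."""
    q = binary_hash(query)
    dists = [np.count_nonzero(q != binary_hash(t)) for t in templates]
    return np.argsort(dists)[:k]
```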
Collapse
Affiliation(s)
- Jose Moises Araya-Martinez
- Industry Grade Networks and Clouds, Institute of Telecommunication Systems, Electrical Engineering and Computer Science, Technical University Berlin, Berlin, Germany
- Future Manufacturing Technologies, Mercedes-Benz AG, Sindelfingen, Germany
| | - Vinicius Soares Matthiesen
- Future Manufacturing Technologies, Mercedes-Benz AG, Sindelfingen, Germany
- Department of Materials and Production, Aalborg University, Aalborg, Denmark
| | - Simon Bøgh
- Department of Materials and Production, Aalborg University, Aalborg, Denmark
| | - Jens Lambrecht
- Industry Grade Networks and Clouds, Institute of Telecommunication Systems, Electrical Engineering and Computer Science, Technical University Berlin, Berlin, Germany
| | | |
Collapse
|
31
|
Wang H, Guo X, Zhang S, Li G, Zhao Q, Wang Z. Detection and recognition of foreign objects in Pu-erh Sun-dried green tea using an improved YOLOv8 based on deep learning. PLoS One 2025; 20:e0312112. [PMID: 39775324 PMCID: PMC11709275 DOI: 10.1371/journal.pone.0312112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2024] [Accepted: 09/27/2024] [Indexed: 01/11/2025] Open
Abstract
The quality and safety of tea production are of paramount importance. In traditional processing, small foreign objects risk being mixed into Pu-erh sun-dried green tea, directly affecting food quality and safety. To rapidly detect and accurately identify these small foreign objects, this study proposes an improved YOLOv8 network model for foreign object detection. The method employs an MPDIoU-optimized loss function to enhance target detection performance and localization precision. It incorporates the EfficientDet detection architecture module, which uses compound-scale-centered anchor boxes and an adaptive feature pyramid to detect targets of various sizes efficiently. The BiFormer bidirectional attention mechanism is introduced, allowing the model to consider both forward and backward dependencies in sequence data and significantly enhancing its understanding of the context of targets in images. The model further integrates sliced auxiliary super-inference with YOLOv8, subdividing the image and conducting in-depth analysis of local features, which significantly improves recognition accuracy and robustness for small targets and multi-scale objects. Experimental results demonstrate that, compared to the original YOLOv8 model, the improved model gains 4.50% in precision, 5.30% in recall, 3.63% in mAP, and 4.9% in F1 score. Compared with the YOLOv7, YOLOv5, Faster R-CNN, and SSD models, its accuracy improves by 3.92%, 7.26%, 14.03%, and 11.30%, respectively. This research provides new technological means for the intelligent transformation of automated color sorters, foreign object detection equipment, and intelligent sorting systems in the high-quality production of Yunnan Pu-erh sun-dried green tea, as well as strong technical support for the automation and intelligent development of the tea industry.
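Slice-based inference generally tiles the image, runs the detector per tile, and shifts the boxes back to global coordinates. A generic sketch follows; the `detect` callable, tile size, and overlap are assumptions, and duplicate suppression via NMS is omitted.

```python
def sliced_inference(image, detect, tile=640, overlap=0.2):
    """Run `detect` (any callable returning (x1, y1, x2, y2, score) boxes) on
    overlapping tiles and map detections back to full-image coordinates."""
    step = int(tile * (1 - overlap))
    h, w = image.shape[:2]
    boxes = []
    for y in range(0, max(h - tile, 0) + 1, step):
        for x in range(0, max(w - tile, 0) + 1, step):
            for x1, y1, x2, y2, s in detect(image[y:y + tile, x:x + tile]):
                boxes.append((x1 + x, y1 + y, x2 + x, y2 + y, s))
    return boxes  # apply NMS afterwards to merge duplicates across tiles
```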
Collapse
Affiliation(s)
- Houqiao Wang
- College of Tea Science, Yunnan Agricultural University, Kunming, China
| | - Xiaoxue Guo
- College of Mechanical and Electrical Engineering, Wuhan Donghu University, Wuhan, China
| | - Shihao Zhang
- College of Mechanical and Electrical Engineering, Wuhan Donghu University, Wuhan, China
| | - Gongming Li
- College of Mechanical and Electrical Engineering, Wuhan Donghu University, Wuhan, China
| | - Qiang Zhao
- College of Mechanical and Electrical Engineering, Wuhan Donghu University, Wuhan, China
| | - Zejun Wang
- College of Tea Science, Yunnan Agricultural University, Kunming, China
| |
Collapse
|
32
|
Liu K, Bouazizi M, Xing Z, Ohtsuki T. A Comparison Study of Person Identification Using IR Array Sensors and LiDAR. SENSORS (BASEL, SWITZERLAND) 2025; 25:271. [PMID: 39797062 PMCID: PMC11723478 DOI: 10.3390/s25010271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/12/2024] [Revised: 12/31/2024] [Accepted: 01/03/2025] [Indexed: 01/13/2025]
Abstract
Person identification is a critical task in applications such as security and surveillance, requiring reliable systems that perform robustly under diverse conditions. This study evaluates the Vision Transformer (ViT) and ResNet34 models across three modalities, RGB, thermal, and depth, using datasets collected with infrared array sensors and LiDAR sensors in controlled scenarios and at varying resolutions (16 × 12 to 640 × 480) to explore their effectiveness in person identification. Preprocessing techniques, including YOLO-based cropping, were employed to improve subject isolation. Results show similar identification performance across the three modalities, particularly at high resolution (i.e., 640 × 480), with RGB image classification reaching 100.0%, depth 99.54%, and thermal 97.93%. However, deeper investigation shows that thermal images are more robust and generalizable, maintaining focus on subject-specific features even at low resolutions. In contrast, RGB data performs well at high resolutions but relies increasingly on background features as resolution decreases, and depth data degrades significantly at lower resolutions, suffering from scattered attention and artifacts. These findings highlight the importance of modality selection, with thermal imaging emerging as the most reliable. Future work will explore multi-modal integration, advanced preprocessing, and hybrid architectures to enhance model adaptability and address current limitations. This study highlights the potential of thermal imaging and the need for modality-specific strategies in designing robust person identification systems.
Collapse
Affiliation(s)
- Kai Liu
- Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan; (K.L.); (Z.X.)
| | - Mondher Bouazizi
- Faculty of Science and Technology, Keio University, Yokohama 223-8522, Japan;
| | - Zelin Xing
- Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan; (K.L.); (Z.X.)
| | - Tomoaki Ohtsuki
- Faculty of Science and Technology, Keio University, Yokohama 223-8522, Japan;
| |
Collapse
|
33
|
Yang F, Chen P, Lin S, Zhan T, Hong X, Chen Y. A weak edge estimation based multi-task neural network for OCT segmentation. PLoS One 2025; 20:e0316089. [PMID: 39752440 PMCID: PMC11698417 DOI: 10.1371/journal.pone.0316089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2024] [Accepted: 12/06/2024] [Indexed: 01/06/2025] Open
Abstract
Optical Coherence Tomography (OCT) offers high-resolution images of the eye's fundus, enabling thorough analysis of retinal health and providing a solid basis for diagnosis and treatment. With the development of deep learning, deep learning-based methods are becoming increasingly popular for fundus OCT image segmentation. Yet these methods still face two primary challenges. First, deep learning methods are sensitive to weak edges. Second, the high cost of annotating medical image data results in a lack of labeled data, leading to overfitting during model training. To tackle these challenges, we introduce the Multi-Task Attention Mechanism Network with Pruning (MTAMNP), consisting of a segmentation branch and a boundary regression branch. The boundary regression branch utilizes an adaptive weighted loss function derived from the Truncated Signed Distance Function (TSDF), improving the model's capacity to preserve weak edge details. The Spatial Attention Based Dual-Branch Information Fusion Block links these branches, enabling them to benefit from each other. Furthermore, we present a structured pruning method grounded in channel attention to decrease the parameter count, mitigate overfitting, and uphold segmentation accuracy. Our method surpasses other cutting-edge segmentation networks on two widely accessible datasets, achieving Dice scores of 84.09% and 93.84% on the HCMS and Duke datasets.
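The TSDF-derived weighting can be sketched with distance transforms: weights peak at the boundary and decay away from it, so weak edges contribute more to the boundary-regression loss. A minimal version assuming a binary segmentation mask (the paper's adaptive weighting is more elaborate):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def tsdf(mask, trunc=10.0):
    """Truncated signed distance: positive outside the region, negative inside."""
    mask = mask.astype(bool)
    d_out = distance_transform_edt(~mask)
    d_in = distance_transform_edt(mask)
    return np.clip(d_out - d_in, -trunc, trunc) / trunc

def edge_weights(mask, trunc=10.0):
    """Loss weights that peak where |TSDF| is small, i.e., on the boundary."""
    return 1.0 - np.abs(tsdf(mask, trunc))
```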
Collapse
Affiliation(s)
- Fan Yang
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
| | - Pu Chen
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
| | - Shiqi Lin
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
| | - Tianming Zhan
- School of Computer Science, Nanjing Audit University, Nanjing, Jiangsu, China
- Center for Applied Mathematics of Jiangsu Province, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
| | - Xunning Hong
- The First Affiliated Hospital with Nanjing Medical University, Nanjing, Jiangsu, China
| | - Yunjie Chen
- School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
- Center for Applied Mathematics of Jiangsu Province, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
- Jiangsu International Joint Laboratory on System Modeling and Data Analysis, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
| |
Collapse
|
34
|
Tolu‐Akinnawo OZ, Ezekwueme F, Omolayo O, Batheja S, Awoyemi T. Advancements in Artificial Intelligence in Noninvasive Cardiac Imaging: A Comprehensive Review. Clin Cardiol 2025; 48:e70087. [PMID: 39871619 PMCID: PMC11772728 DOI: 10.1002/clc.70087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/13/2024] [Accepted: 01/06/2025] [Indexed: 01/29/2025] Open
Abstract
BACKGROUND Technological advancements in artificial intelligence (AI) are redefining cardiac imaging by providing advanced tools for analyzing complex health data. AI is increasingly applied across various imaging modalities, including echocardiography, magnetic resonance imaging (MRI), computed tomography (CT), and nuclear imaging, to enhance diagnostic workflows and improve patient outcomes. HYPOTHESIS Integrating AI into cardiac imaging enhances image quality, accelerates processing times, and improves diagnostic accuracy, enabling timely and personalized interventions that lead to better health outcomes. METHODS A comprehensive literature review was conducted to examine the impact of machine learning and deep learning algorithms on diagnostic accuracy, the detection of subtle patterns and anomalies, and key challenges such as data quality, patient safety, and regulatory barriers. RESULTS Findings indicate that AI integration in cardiac imaging enhances image quality, reduces processing times, and improves diagnostic precision, contributing to better clinical decision-making. Emerging machine learning techniques demonstrate the ability to identify subtle cardiac abnormalities that traditional methods may overlook. However, significant challenges persist, including data standardization, regulatory compliance, and patient safety concerns. CONCLUSIONS AI holds transformative potential in cardiac imaging, significantly advancing diagnosis and patient outcomes. Overcoming barriers to implementation will require ongoing collaboration among clinicians, researchers, and regulatory bodies. Further research is essential to ensure the safe, ethical, and effective integration of AI in cardiology, supporting its broader application to improve cardiovascular health.
Collapse
Affiliation(s)
| | - Francis Ezekwueme
- Department of Internal Medicine, University of Pittsburgh Medical Center, McKeesport, Pennsylvania, USA
| | - Olukunle Omolayo
- Department of Internal Medicine, Lugansk State Medical University, Lugansk, Ukraine
| | - Sasha Batheja
- Department of Internal Medicine, Government Medical College, Patiala, Punjab, India
| | - Toluwalase Awoyemi
- Department of Internal Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
| |
Collapse
|
35
|
Fu J, Liu R, Zeng T, Cong P, Liu X, Sun Y. A study on CT detection image generation based on decompound synthesize method. JOURNAL OF X-RAY SCIENCE AND TECHNOLOGY 2025; 33:72-85. [PMID: 39973778 DOI: 10.1177/08953996241296249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/21/2025]
Abstract
BACKGROUND Nuclear graphite and carbon components are vital structural elements in the cores of high-temperature gas-cooled reactors (HTGRs), serving crucial roles in neutron reflection, moderation, and insulation. The structural integrity and stable operation of these reactors depend heavily on the quality of these components. Helical computed tomography (CT) provides a method for detecting and intelligently identifying defects within these structures. However, the scarcity of defect datasets limits the performance of deep learning-based detection algorithms through small sample sizes and class imbalance. OBJECTIVE Given the limited number of actual CT reconstruction images of components and the sparse distribution of defects, this study aims to address the challenges of small sample sizes and class imbalance in defect detection model training by generating approximate CT reconstruction images to augment the defect detection training dataset. METHODS We propose a novel CT detection image generation algorithm, the Decompound Synthesize Method (DSM), which decomposes image generation into three steps: model conversion, background generation, and defect synthesis. First, STL files of various industrial components are converted into voxel data, which undergo forward projection and image reconstruction to obtain corresponding CT images. Next, the Contour-CycleGAN model is employed to generate synthetic images that closely resemble actual CT images. Finally, defects randomly sampled from an existing defect library are added to the images using the Copy-Adjust-Paste (CAP) method. These steps significantly expand the training dataset with images that closely mimic actual CT reconstructions. RESULTS Experimental results validate the effectiveness of the proposed image generation method in defect detection tasks. Datasets generated using DSM exhibit greater similarity to actual CT images, and when combined with the original data for training, they improve defect detection accuracy compared to using only the original images. CONCLUSION DSM shows promise in addressing the challenges of small sample sizes and class imbalance. Future research can focus on further optimizing the generation algorithm and refining the model structure to enhance the performance and accuracy of defect detection models.
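The defect-synthesis step (CAP) can be illustrated as pasting a library patch into a generated background with a simple intensity adjustment. This toy version assumes float grayscale arrays and a single gain factor; the paper's adjustment is more involved.

```python
import numpy as np

def copy_adjust_paste(background, defect_patch, y, x, gain=1.0):
    """Paste a defect patch into a synthetic CT background at (y, x), keeping
    the background where the patch is empty and scaling defect intensity."""
    out = background.astype(float).copy()
    h, w = defect_patch.shape
    region = out[y:y + h, x:x + w]
    out[y:y + h, x:x + w] = np.where(defect_patch > 0,
                                     gain * defect_patch, region)
    return out
```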
Collapse
Affiliation(s)
- Jintao Fu
- Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing, China
- Beijing Key Laboratory of Nuclear Detection Technology, Beijing, China
| | - Renjie Liu
- Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing, China
- Beijing Key Laboratory of Nuclear Detection Technology, Beijing, China
| | - Tianchen Zeng
- Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing, China
- Beijing Key Laboratory of Nuclear Detection Technology, Beijing, China
| | - Peng Cong
- Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing, China
- Beijing Key Laboratory of Nuclear Detection Technology, Beijing, China
| | - Ximing Liu
- Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing, China
- Beijing Key Laboratory of Nuclear Detection Technology, Beijing, China
| | - Yuewen Sun
- Institute of Nuclear and New Energy Technology, Tsinghua University, Beijing, China
- Beijing Key Laboratory of Nuclear Detection Technology, Beijing, China
| |
Collapse
|
36
|
Tsai PF, Yuan SM. Using Infrared Raman Spectroscopy with Machine Learning and Deep Learning as an Automatic Textile-Sorting Technology for Waste Textiles. SENSORS (BASEL, SWITZERLAND) 2024; 25:57. [PMID: 39796848 PMCID: PMC11722779 DOI: 10.3390/s25010057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/26/2024] [Revised: 12/18/2024] [Accepted: 12/20/2024] [Indexed: 01/13/2025]
Abstract
With the fast-fashion trend, an increasing number of discarded clothing items are eliminated at both the pre-consumer and post-consumer stages each year. This linear economy produces large volumes of waste that harm environmental sustainability. This study addresses the pressing need for efficient textile recycling in the circular economy (CE). We developed a highly accurate Raman-spectroscopy-based textile-sorting technology that overcomes the challenge of diverse fiber combinations in waste textiles. By categorizing textiles into six groups based on their fiber compositions, the sorter improves the quality of recycled fibers. Our study demonstrates the potential of Raman spectroscopy to provide detailed molecular compositional information, which is crucial for effective textile sorting. Furthermore, AI techniques, including PCA, KNN, SVM, RF, ANN, and CNN, are integrated into the sorting process, raising throughput to 1 piece per second with a precision of over 95% in grouping textiles by fiber compositional analysis. This interdisciplinary approach offers a promising solution for sustainable textile recycling, contributing to the objectives of the CE.
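One of the classical pipelines the study compares, PCA followed by KNN, can be assembled in a few lines with scikit-learn. The data below are random stand-ins and the hyperparameters are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X = np.random.rand(120, 1024)           # 120 spectra x 1024 wavenumber bins (toy)
y = np.random.randint(0, 6, size=120)   # hypothetical labels for six fiber groups

sorter = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=5))
sorter.fit(X, y)
print(sorter.predict(X[:3]))            # predicted fiber groups for new spectra
```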
Collapse
Affiliation(s)
| | - Shyan-Ming Yuan
- Department of Computer Science, National Yang Ming Chiao Tung University, ChiaoTung Campus, Hsinchu 300093, Taiwan;
| |
Collapse
|
37
|
Anagnostopoulos CN, Krinidis S. Sensors and Advanced Sensing Techniques for Computer Vision Applications. SENSORS (BASEL, SWITZERLAND) 2024; 25:35. [PMID: 39796825 PMCID: PMC11722863 DOI: 10.3390/s25010035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/13/2024] [Accepted: 12/20/2024] [Indexed: 01/13/2025]
Abstract
Computer vision is a multidisciplinary field that enables machines to interpret and understand visual information from the world, simulating human vision [...].
Collapse
Affiliation(s)
| | - Stelios Krinidis
- Management Science and Technology Department, Democritus University of Thrace (DUTh), 65404 Kavala, Greece
| |
Collapse
|
38
|
Fergus P, Chalmers C, Matthews N, Nixon S, Burger A, Hartley O, Sutherland C, Lambin X, Longmore S, Wich S. Towards Context-Rich Automated Biodiversity Assessments: Deriving AI-Powered Insights from Camera Trap Data. SENSORS (BASEL, SWITZERLAND) 2024; 24:8122. [PMID: 39771857 PMCID: PMC11679253 DOI: 10.3390/s24248122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/15/2024] [Revised: 12/12/2024] [Accepted: 12/18/2024] [Indexed: 01/11/2025]
Abstract
Camera traps offer enormous new opportunities in ecological studies, but current automated image analysis methods often lack the contextual richness needed to support impactful conservation outcomes. Integrating vision-language models into these workflows could address this gap by providing enhanced contextual understanding and enabling advanced queries across temporal and spatial dimensions. Here, we present an integrated approach that combines deep learning-based vision and language models to improve ecological reporting using data from camera traps. We introduce a two-stage system: YOLOv10-X to localise and classify species (mammals and birds) within images, and a Phi-3.5-vision-instruct model that reads the YOLOv10-X bounding-box labels to identify species, compensating for the vision-language model's difficulty with hard-to-classify objects in images. Additionally, Phi-3.5 detects broader variables, such as vegetation type and time of day, providing rich ecological and environmental context to YOLO's species detection output. The combined output is processed by the model's natural-language system to answer complex queries, and retrieval-augmented generation (RAG) is employed to enrich responses with external information, such as species weight and IUCN status (information that cannot be obtained through direct visual analysis). This information is then used to automatically generate structured reports, providing biodiversity stakeholders with deeper insights into, for example, species abundance, distribution, animal behaviour, and habitat selection. By providing contextually rich insights, our approach not only reduces manual effort but also supports timely decision making in conservation, potentially shifting efforts from reactive to proactive.
Collapse
Affiliation(s)
- Paul Fergus
- School of Computer Science and Mathematics, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, UK;
| | - Carl Chalmers
- School of Computer Science and Mathematics, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, UK;
| | - Naomi Matthews
- Chester Zoo, Upton-by-Chester, Chester CH2 IEU, UK; (N.M.); (S.N.)
| | - Stuart Nixon
- Chester Zoo, Upton-by-Chester, Chester CH2 IEU, UK; (N.M.); (S.N.)
| | - André Burger
- Welgevonden Game Reserve, P.O. Box 433, Vaalwater 0530, South Africa;
| | - Oliver Hartley
- School of Mathematics and Statistics, Mathematical Institute, University of St Andrews, North Haugh, St Andrews KY16 9SS, UK; (O.H.); (C.S.)
| | - Chris Sutherland
- School of Mathematics and Statistics, Mathematical Institute, University of St Andrews, North Haugh, St Andrews KY16 9SS, UK; (O.H.); (C.S.)
| | - Xavier Lambin
- School of Biological Sciences, University of Aberdeen, Tillydrone Avenue, Aberdeen AB24 2TZ, UK;
| | - Steven Longmore
- Astrophysics Research Institute, Liverpool John Moores University, IC2, Liverpool Science Park, 146 Brownlow Hill, Liverpool L3 5RF, UK;
| | - Serge Wich
- School of Biological and Environmental Sciences, Liverpool John Moores University, James Parsons Building, Byrom Street, Liverpool L3 3AF, UK;
| |
Collapse
|
39
|
Wang M, Zhou Y, Yao S, Wu J, Zhu M, Dong L, Wang D. Enhancing vector control: AI-based identification and counting of Aedes albopictus (Diptera: Culicidae) mosquito eggs. Parasit Vectors 2024; 17:511. [PMID: 39696631 DOI: 10.1186/s13071-024-06587-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2024] [Accepted: 11/17/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND Dengue fever poses a significant global public health concern, necessitating monitoring of Aedes mosquito population density, as these mosquitoes are the disease vectors and their surveillance is crucial for dengue prevention. The objective of this study was to address the difficulty of identifying and counting mosquito eggs of wild strains when monitoring Aedes albopictus (Diptera: Culicidae) density via ovitraps in field surveys. METHODS We constructed a dataset comprising 1729 images of Ae. albopictus mosquito eggs from wild strains and employed the Segment Anything Model to enhance the applicability of the detection model in complex environments. A two-stage Faster Region-based Convolutional Neural Network model was used to establish a detection model for Ae. albopictus mosquito eggs. Identification and counting applied a tile-overlapping method, while morphological filtering removed impurities. The model's performance was evaluated in terms of precision, recall, and F1 score, and counting accuracy was assessed using R-squared and root mean square error (RMSE). RESULTS The experimental results revealed the model's remarkable identification capability, achieving a precision of 0.977, a recall of 0.978, and an F1 score of 0.977. The R-squared value between the actual and identified egg counts was 0.997, with an RMSE of 1.742. The average detection time for a single tile was 0.48 s, more than ten times faster than the human-computer interactive method for counting an entire image. CONCLUSIONS The model demonstrated excellent performance in recognizing and counting Ae. albopictus mosquito eggs, indicating great application potential. This study offers novel technological support for enhancing vector control effectiveness and public health standards.
Collapse
Affiliation(s)
- Minghao Wang
- Key Laboratory of Geographic Information Science, Ministry of Education, Shanghai, China
- School of Geographic Sciences, East China Normal University, Shanghai, China
| | - Yibin Zhou
- Minhang Center for Disease Prevention and Control, Shanghai, China
| | - Shenjun Yao
- Key Laboratory of Geographic Information Science, Ministry of Education, Shanghai, China.
- School of Geographic Sciences, East China Normal University, Shanghai, China.
| | - Jianping Wu
- Key Laboratory of Spatial-Temporal Big Data Analysis and Application of Natural Resources in Megacities, Ministry of Natural Resources, Shanghai, China
- Institute of Cartography, East China Normal University, Shanghai, China
| | - Minhui Zhu
- Minhang Center for Disease Prevention and Control, Shanghai, China
| | - Linjuan Dong
- Minhang Center for Disease Prevention and Control, Shanghai, China
| | - Dunjia Wang
- Minhang Center for Disease Prevention and Control, Shanghai, China
| |
Collapse
|
40
|
Shi C, Zhu D, Zhou C, Cheng S, Zou C. Gpmb-yolo: a lightweight model for efficient blood cell detection in medical imaging. Health Inf Sci Syst 2024; 12:24. [PMID: 39668840 PMCID: PMC11632753 DOI: 10.1007/s13755-024-00285-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2024] [Accepted: 02/24/2024] [Indexed: 12/14/2024] Open
Abstract
In the field of biomedical science, blood cell detection in microscopic images is crucial for aiding physicians in diagnosing blood-related diseases and plays a pivotal role in advancing medicine toward more precise and efficient treatment. Addressing the time-consuming and error-prone nature of traditional manual detection, as well as the difficulty existing blood cell detection technologies have in meeting both high-accuracy and real-time requirements, this study proposes a lightweight blood cell detection model based on YOLOv8n, named GPMB-YOLO. The model utilizes advanced lightweight strategies and a PGhostC2f design, effectively reducing model complexity and enhancing detection speed. The integration of the simple parameter-free attention mechanism (SimAM) significantly enhances the model's feature extraction ability. Furthermore, we design a multidimensional attention-enhanced bidirectional feature pyramid network structure, MCA-BiFPN, optimizing multi-scale feature fusion, and use genetic algorithms for hyperparameter optimization, further improving detection accuracy. Experimental results validate the effectiveness of the GPMB-YOLO model, which achieves a 3.2% increase in mean average precision (mAP) over the baseline YOLOv8n model with a marked reduction in model complexity. Furthermore, we developed a blood cell detection system and deployed the model for application. This study serves as a valuable reference for the efficient detection of blood cells in medical images.
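SimAM is compact enough to show in full; the following mirrors the commonly published parameter-free formulation, which derives an energy term per activation and gates the feature map through a sigmoid.

```python
import torch

def simam(x, lam=1e-4):
    """Simple parameter-free attention (SimAM) over a (B, C, H, W) tensor:
    activations that deviate most from their channel mean get higher weights."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
    v = d.sum(dim=(2, 3), keepdim=True) / n          # per-channel variance
    e_inv = d / (4 * (v + lam)) + 0.5                # inverse energy
    return x * torch.sigmoid(e_inv)
```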
Affiliation(s)
- Chenyang Shi
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
| | - Donglin Zhu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
| | - Changjun Zhou
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua, 321004 China
| | - Shi Cheng
- School of Computer Science, Shaanxi Normal University, Xi’an, 710119 China
| | - Chengye Zou
- College of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004 China
|
41
|
Zheng Z, Zhang D, Liang X, Liu X, Fang G. RadarFormer: End-to-End Human Perception With Through-Wall Radar and Transformers. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:18285-18299. [PMID: 37738194 DOI: 10.1109/tnnls.2023.3314031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/24/2023]
Abstract
For fine-grained human perception tasks such as pose estimation and activity recognition, radar-based sensors show advantages over optical cameras in low-visibility, privacy-aware, and wall-occlusive environments. Radar transmits radio-frequency signals to irradiate the target of interest, and the target information is stored in the echo signals. One common approach is to transform the echoes into radar images and extract features with convolutional neural networks. This article presents RadarFormer, the first method to apply the self-attention (SA) mechanism to human perception tasks directly from radar echoes. It bypasses the imaging algorithm and realizes end-to-end signal processing. Specifically, we give constructive proof that processing radar echoes with the SA mechanism is at least as expressive as processing radar images with convolutional layers. On this foundation, we design RadarFormer, a Transformer-like model for radar signals that benefits from a fast-/slow-time SA mechanism reflecting the physical characteristics of radar signals. RadarFormer extracts human representations from radar echoes and handles various downstream human perception tasks. The experimental results demonstrate that our method outperforms state-of-the-art radar-based methods in both performance and computational cost and obtains accurate human perception results even in dark and occlusive environments.
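
The fast-/slow-time self-attention idea can be sketched by attending separately along the two axes of the echo matrix; this PyTorch block illustrates only that factorization, not RadarFormer's actual architecture (embedding dimension and head count are assumptions):

```python
import torch
import torch.nn as nn

class FastSlowTimeSA(nn.Module):
    """Attend along fast-time (range samples within a pulse) and then along
    slow-time (across pulses at each range bin) of an embedded radar echo."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.fast_sa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.slow_sa = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):            # x: (batch, slow, fast, dim)
        b, s, f, d = x.shape
        t = x.reshape(b * s, f, d)   # fast-time attention within each pulse
        t, _ = self.fast_sa(t, t, t)
        t = t.reshape(b, s, f, d).transpose(1, 2).reshape(b * f, s, d)
        t, _ = self.slow_sa(t, t, t) # slow-time attention at each range bin
        return t.reshape(b, f, s, d).transpose(1, 2)
```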
|
42
|
Magro M, Covallero N, Gambaro E, Ruffaldi E, De Momi E. A dual-instrument Kalman-based tracker to enhance robustness of microsurgical tools tracking. Int J Comput Assist Radiol Surg 2024; 19:2351-2362. [PMID: 39133431 DOI: 10.1007/s11548-024-03246-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2024] [Accepted: 07/26/2024] [Indexed: 08/13/2024]
Abstract
PURPOSE The integration of a surgical robotic instrument tracking module within optical microscopes holds the potential to advance microsurgery practice, as it enables automated camera movements and thereby augments the surgeon's capability in executing surgical procedures. METHODS In the present work, an innovative detection backbone based on a spatial attention module is implemented to enhance the detection accuracy of small objects within the image. Additionally, we introduce a robust data association technique, capable of re-tracking surgical instruments, based on knowledge of the dual-instrument robotic system, the Intersection over Union metric, and a Kalman filter. RESULTS The effectiveness of this pipeline was evaluated on a dataset comprising ten manually annotated videos of anastomosis procedures involving either animal or phantom vessels, using the Symani® Surgical System, a dedicated robotic platform designed for microsurgery. Multiple object tracking precision (MOTP) and multiple object tracking accuracy (MOTA) were used to evaluate the performance of the proposed approach, and a new metric was computed to demonstrate its efficacy in stabilizing the tracking result across video frames. An average MOTP of 74±0.06% and a MOTA of 99±0.03% over the test videos were found. CONCLUSION These results confirm the potential of the proposed approach in enhancing precision and reliability in microsurgical instrument tracking. The integration of attention mechanisms and a tailored data association module could thus provide a solid base for automating the motion of optical microscopes.
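
The Kalman-plus-IoU data association described here rests on a standard constant-velocity filter over box parameters; the sketch below assumes a (cx, cy, w, h, vx, vy) state and generic noise levels, not the paper's exact design:

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter on a bounding-box state."""
    def __init__(self, box, dt=1.0, q=1e-2, r=1e-1):
        cx, cy, w, h = box
        self.x = np.array([cx, cy, w, h, 0.0, 0.0])   # state with center velocity
        self.F = np.eye(6); self.F[0, 4] = self.F[1, 5] = dt
        self.H = np.eye(4, 6)                         # we observe (cx, cy, w, h)
        self.P = np.eye(6)
        self.Q, self.R = q * np.eye(6), r * np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]                             # predicted box

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x           # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
```

At each frame, a track's `predict()` output would be matched to the detection with the highest IoU; when the best IoU falls below a threshold, the prediction itself is kept, which is what allows re-tracking the instrument after a missed or ambiguous detection.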
Affiliation(s)
- Mattia Magro
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy.
- Medical Microinstruments, Inc., Wilmington, USA.
| | - Elena De Momi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
|
43
|
Rocchetta R, Mey A, Oliehoek FA. A Survey on Scenario Theory, Complexity, and Compression-Based Learning and Generalization. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:16985-16999. [PMID: 37703153 DOI: 10.1109/tnnls.2023.3308828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/15/2023]
Abstract
This work investigates formal generalization error bounds that apply to support vector machines (SVMs) in realizable and agnostic learning problems. We focus on recently observed parallels between probably approximately correct (PAC)-learning bounds, such as compression and complexity-based bounds, and novel error guarantees derived within scenario theory. Scenario theory provides nonasymptotic, distribution-free error bounds for models trained by solving data-driven decision-making problems. Relevant theorems and assumptions are reviewed and discussed. We propose a numerical comparison of the tightness and effectiveness of theoretical error bounds for support vector classifiers trained on several randomized experiments from 13 real-life problems. This analysis allows for a fair comparison of different approaches from both conceptual and experimental standpoints. Based on the numerical results, we argue that the error guarantees derived from scenario theory are often tighter for realizable problems and always yield informative results, i.e., probability bounds tighter than a vacuous [0, 1] interval. This work promotes scenario theory as an alternative tool for model selection, structural risk minimization, and generalization error analysis of SVMs. In this way, we hope to bring the communities of scenario and statistical learning theory closer, so that they can benefit from each other's insights.
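
For the convex case, the scenario bound discussed in this survey (taken here in its classical Campi-Garatti form, which is an assumption about which variant is meant) can be inverted numerically to obtain the violation level epsilon for N samples and d support constraints:

```python
from math import comb

def scenario_epsilon(n, d, beta=1e-6):
    """Smallest eps such that
    sum_{i=0}^{d-1} C(n, i) * eps^i * (1 - eps)^(n - i) <= beta,
    i.e. with confidence 1 - beta the scenario solution violates
    constraints with probability at most eps. Solved by bisection,
    since the tail is decreasing in eps."""
    def tail(eps):
        return sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(d))
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if tail(mid) > beta:
            lo = mid
        else:
            hi = mid
    return hi

# e.g. n=1000 samples, d=10 support constraints, confidence 1 - 1e-6:
# scenario_epsilon(1000, 10) gives the nonasymptotic violation bound.
```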
|
44
|
Dalva Y, Pehlivan H, Altindis SF, Dundar A. Benchmarking the Robustness of Instance Segmentation Models. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:17021-17035. [PMID: 37721888 DOI: 10.1109/tnnls.2023.3310985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/20/2023]
Abstract
This article presents a comprehensive evaluation of instance segmentation models with respect to real-world image corruptions as well as out-of-domain image collections, e.g., images captured by a different setup from that of the training dataset. The out-of-domain evaluation shows the generalization capability of models, an essential aspect of real-world applications and an extensively studied topic in domain adaptation. These robustness and generalization evaluations matter when designing instance segmentation models for real-world applications and when picking an off-the-shelf pretrained model to use directly for the task at hand. Specifically, this benchmark study covers state-of-the-art network architectures, network backbones, normalization layers, models trained from scratch versus pretrained networks, and the effect of multitask training on robustness and generalization. Through this study, we gain several insights. For example, we find that group normalization (GN) enhances the robustness of networks across corruptions where the image contents stay the same but corruptions are added on top, whereas batch normalization (BN) improves the generalization of models across different datasets where the statistics of image features change. We also find that single-stage detectors do not generalize well to image resolutions larger than their training size, while multistage detectors can easily be used on images of different sizes. We hope that our comprehensive study will motivate the development of more robust and reliable instance segmentation models.
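
A corruption-robustness benchmark of this kind reduces to re-running one evaluation under each corruption and severity; the sketch below assumes a user-supplied `evaluate` callback and an illustrative noise ladder rather than the exact benchmark severities:

```python
import numpy as np

def gaussian_noise(img, severity=3):
    """Additive Gaussian noise, one of the standard benchmark corruptions.
    img: float array in [0, 1]; the sigma ladder is illustrative."""
    sigma = (0.04, 0.06, 0.08, 0.10, 0.14)[severity - 1]
    return np.clip(img + np.random.normal(0.0, sigma, img.shape), 0.0, 1.0)

def mean_performance_under_corruption(evaluate, corruptions, severities=(1, 2, 3, 4, 5)):
    """evaluate(corrupt_fn) -> mAP of the model on the test set with
    corrupt_fn applied to every image (identity for the clean run).
    Returns mean corrupted performance relative to clean performance."""
    clean = evaluate(lambda img: img)
    scores = [evaluate(lambda img, c=c, s=s: c(img, s))
              for c in corruptions for s in severities]
    return float(np.mean(scores)) / clean
```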
|
45
|
Singh R, Singh N, Kaur L. Deep learning methods for 3D magnetic resonance image denoising, bias field and motion artifact correction: a comprehensive review. Phys Med Biol 2024; 69:23TR01. [PMID: 39569887 DOI: 10.1088/1361-6560/ad94c7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2024] [Accepted: 11/19/2024] [Indexed: 11/22/2024]
Abstract
Magnetic resonance imaging (MRI) provides detailed structural information about the internal organs and soft-tissue regions of a patient for disease detection, localization, and progress monitoring in clinical diagnosis. MRI scanner hardware manufacturers incorporate various post-acquisition image-processing techniques into the scanner's software tools, which deliver a final image of adequate quality with the features essential for accurate clinical reporting, predictive interpretation, and better treatment planning. Post-acquisition tasks for MRI quality enhancement include noise removal, motion artifact reduction, magnetic bias field correction, and eddy current effect removal. Recently, deep learning (DL) methods have shown great success in many research fields, including image and video applications. DL-based data-driven feature-learning approaches have great potential for MR image denoising and for correcting quality-degrading artifacts, and recent studies have demonstrated significant improvements in image-analysis tasks using DL-based convolutional neural network techniques. The promising capabilities and performance of DL in various problem domains have motivated researchers to adapt DL methods to medical image analysis and quality enhancement. This paper presents a comprehensive review of DL-based state-of-the-art MRI quality enhancement and artifact removal methods that regenerate high-quality images while preserving essential anatomical and physiological features. Existing research gaps and future directions are also provided, highlighting potential research areas for future development along with their importance and advantages in medical imaging.
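
Many of the denoising networks such reviews cover share the residual-learning pattern of predicting the noise and subtracting it from the input; a generic 3D sketch of that pattern (layer counts and widths are arbitrary, not any single paper's architecture):

```python
import torch
import torch.nn as nn

class Denoise3D(nn.Module):
    """DnCNN-style residual denoiser for 3D MR volumes: the network
    predicts the noise component, which is subtracted from the input."""
    def __init__(self, channels=1, features=32, depth=5):
        super().__init__()
        layers = [nn.Conv3d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv3d(features, features, 3, padding=1),
                       nn.BatchNorm3d(features), nn.ReLU(inplace=True)]
        layers += [nn.Conv3d(features, channels, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):        # x: (B, 1, D, H, W) noisy volume
        return x - self.net(x)   # residual learning: output = input - predicted noise
```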
Affiliation(s)
- Ram Singh
- Department of Computer Science & Engineering, Punjabi University, Chandigarh Road, Patiala 147002, Punjab, India
| | - Navdeep Singh
- Department of Computer Science & Engineering, Punjabi University, Chandigarh Road, Patiala 147002, Punjab, India
| | - Lakhwinder Kaur
- Department of Computer Science & Engineering, Punjabi University, Chandigarh Road, Patiala 147002, Punjab, India
|
46
|
Liu H, Zhao X, Liu Q, Chen W. An Optimization Method for PCB Surface Defect Detection Model Based on Measurement of Defect Characteristics and Backbone Network Feature Information. SENSORS (BASEL, SWITZERLAND) 2024; 24:7373. [PMID: 39599149 PMCID: PMC11598505 DOI: 10.3390/s24227373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/24/2024] [Revised: 11/03/2024] [Accepted: 11/17/2024] [Indexed: 11/29/2024]
Abstract
Printed Circuit Boards (PCBs) are essential components in electronic devices, making defect detection crucial. PCB surface defects are diverse, complex, low in feature resolution, and often resemble the background, which makes them difficult to detect. This paper proposes the YOLOv8_DSM algorithm for PCB surface defect detection, optimized around these three major characteristics of defect targets and guided by feature-map visualization. First, to address the complexity and variety of defect shapes, we introduce CSPLayer_2DCNv3, which incorporates deformable convolution into the backbone network; this enhances adaptive defect feature extraction and effectively captures diverse defect characteristics. Second, to handle low feature resolution and background resemblance, we design a Shallow-layer Low-semantic Feature Fusion Module (SLFFM). By visualizing the last four downsampling convolution layers of the YOLOv8 backbone, we incorporate feature information from the second downsampling layer into SLFFM, and we apply feature-map-separation-based SPDConv for downsampling, providing PAN-FPN with rich, fine-grained shallow-layer features. SLFFM also employs the bi-level routing attention (BRA) mechanism as a feature aggregation module, mitigating defect-background similarity issues. Lastly, MPDIoU is used as the bounding-box regression loss function, improving training efficiency through faster convergence and higher accuracy. Experimental results show that YOLOv8_DSM achieves a mAP (0.5:0.9) of 63.4%, a 5.14% improvement over the original model, and reaches 144.6 frames per second (FPS). To meet practical engineering requirements, the designed PCB defect detection model has been deployed in a PCB quality inspection system on a PC platform.
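
The MPDIoU regression loss used here has a closed form; the following sketch implements the published MPDIoU definition (the batching convention and epsilon are assumptions):

```python
import torch

def mpdiou_loss(pred, target, img_w, img_h):
    """MPDIoU bounding-box regression loss for (N, 4) tensors of
    (x1, y1, x2, y2) boxes: MPDIoU = IoU - d_tl^2/(w^2+h^2) - d_br^2/(w^2+h^2),
    where d_tl, d_br are top-left and bottom-right corner distances
    and (img_w, img_h) is the input size. Loss = 1 - MPDIoU, averaged."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-9)
    d2 = img_w ** 2 + img_h ** 2
    d_tl = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d_br = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    return (1 - (iou - d_tl / d2 - d_br / d2)).mean()
```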
Affiliation(s)
- Huixiang Liu
- School of Automation, Beijing Information Science and Technology University, Beijing 100192, China; (H.L.); (X.Z.); (Q.L.)
| | - Xin Zhao
- School of Automation, Beijing Information Science and Technology University, Beijing 100192, China; (H.L.); (X.Z.); (Q.L.)
| | - Qiong Liu
- School of Automation, Beijing Information Science and Technology University, Beijing 100192, China; (H.L.); (X.Z.); (Q.L.)
- Jiangxi Research Institute of Beihang University, Nanchang 330096, China
| | - Wenbai Chen
- School of Automation, Beijing Information Science and Technology University, Beijing 100192, China; (H.L.); (X.Z.); (Q.L.)
|
47
|
Yu W, Liu R, Chen D, Hu Q. Explainability Enhanced Object Detection Transformer With Feature Disentanglement. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2024; 33:6439-6454. [PMID: 39531564 DOI: 10.1109/tip.2024.3492733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2024]
Abstract
Explainability is a pivotal factor in determining whether a deep learning model can be authorized for critical applications. To enhance the explainability of models for end-to-end object DEtection with TRansformer (DETR), we introduce a disentanglement method that constrains the feature learning process, following a divide-and-conquer decoupling paradigm similar to how people understand complex real-world problems. We first demonstrate the entangled property of the features between the extractor and detector and find that the regression function is a key factor contributing to the deterioration of disentangled feature activation. These highly entangled features always activate local characteristics, making it difficult to cover the semantic information of an object, which also reduces the interpretability of single-backbone object detection models. We therefore propose an Explainability Enhanced object detection Transformer with feature Disentanglement (DETD) model, in which Tensor Singular Value Decomposition (T-SVD) is used to produce feature bases and a Batch averaged Feature Spectral Penalization (BFSP) loss is introduced to constrain feature disentanglement and balance semantic activation. The proposed method is applied across three prominent backbones, two DETR variants, and a CNN-based model. Combining the two optimization techniques, extensive experiments on two datasets consistently demonstrate that the DETD model outperforms its counterparts in both object detection performance and feature disentanglement. Grad-CAM visualizations demonstrate the enhanced explainability of feature learning from the disentanglement view.
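
The BFSP loss itself is not specified in this listing, so the block below is only a stand-in showing what a batch-averaged spectral penalty on feature maps can look like; the top-k cut-off and normalization are invented for illustration, not the paper's formulation:

```python
import torch

def spectral_penalty(features, k=8):
    """Illustrative spectral penalty: flatten each sample's (C, H, W)
    feature map to a C x (H*W) matrix, take its singular values, and
    penalize the fraction of spectral energy outside the top-k
    components, averaged over the batch."""
    b, c, h, w = features.shape
    mats = features.reshape(b, c, h * w)
    s = torch.linalg.svdvals(mats)       # (B, min(C, H*W)), batched SVD
    energy = s.pow(2)
    tail = energy[:, k:].sum(dim=1) / (energy.sum(dim=1) + 1e-9)
    return tail.mean()
```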
|
48
|
Gibbs JA, Burgess AJ. Application of deep learning for the analysis of stomata: a review of current methods and future directions. JOURNAL OF EXPERIMENTAL BOTANY 2024; 75:6704-6718. [PMID: 38716775 PMCID: PMC11565211 DOI: 10.1093/jxb/erae207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/11/2024] [Accepted: 05/07/2024] [Indexed: 11/16/2024]
Abstract
Plant physiology and metabolism rely on the function of stomata, structures on the surface of above-ground organs that facilitate the exchange of gases with the atmosphere. The morphology of the guard cells and corresponding pore that make up each stoma, as well as the density (number per unit area), is critical in determining overall gas exchange capacity. These characteristics can be quantified visually from images captured using microscopy, traditionally relying on time-consuming manual analysis. However, deep learning (DL) models provide a promising route to increasing the throughput and accuracy of plant phenotyping tasks, including stomatal analysis. Here we review the published literature on the application of DL to stomatal analysis. We discuss the variation in pipelines used, from data acquisition, pre-processing, DL architecture, and output evaluation to post-processing. We introduce the most common network structures, the plant species that have been studied, and the measurements that have been performed. Through this review, we hope to promote the use of DL methods for plant phenotyping tasks and to highlight future requirements for optimizing uptake, focusing predominantly on the sharing of datasets and the generalization of models, as well as the caveats associated with using image data to infer physiological function.
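
Whatever the detection architecture, the pipelines reviewed here converge on simple post-processing such as converting a detection count into a density; a trivial sketch, assuming a known spatial calibration for the micrograph:

```python
def stomatal_density(num_stomata, img_width_px, img_height_px, um_per_px):
    """Stomatal density (stomata per mm^2) from a detection count and the
    image's spatial calibration in micrometers per pixel."""
    area_mm2 = (img_width_px * um_per_px / 1000.0) * (img_height_px * um_per_px / 1000.0)
    return num_stomata / area_mm2

# e.g. 42 detected stomata in a 1920 x 1200 px image at 0.5 um/px:
# stomatal_density(42, 1920, 1200, 0.5) -> ~72.9 stomata/mm^2
```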
Affiliation(s)
- Jonathon A Gibbs
- Agriculture and Environmental Sciences, School of Biosciences, University of Nottingham Sutton Bonington Campus, Loughborough LE12 5RD, UK
| | - Alexandra J Burgess
- Agriculture and Environmental Sciences, School of Biosciences, University of Nottingham Sutton Bonington Campus, Loughborough LE12 5RD, UK
|
49
|
Chen P, Liu S, Lu W, Lu F, Ding B. WCAY object detection of fractures for X-ray images of multiple sites. Sci Rep 2024; 14:26702. [PMID: 39496710 PMCID: PMC11535499 DOI: 10.1038/s41598-024-77878-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2024] [Accepted: 10/25/2024] [Indexed: 11/06/2024] Open
Abstract
This paper presents the WCAY (weighted channel attention YOLO) model, crafted to identify fracture features across X-ray images of diverse anatomical sites. The model integrates novel core operators and an innovative attention mechanism to enhance its efficacy. First, leveraging the strength of dynamic snake convolution (DSConv) at capturing elongated tubular structural features, we introduce the DSC-C2f module, which replaces a portion of the C2f blocks to improve fracture detection performance. Second, we integrate the newly proposed weighted channel attention (WCA) mechanism into the architecture to strengthen feature fusion and improve fracture detection across sites. Comparative experiments were conducted to evaluate the performance of several attention mechanisms, and the enhancement strategies were validated on public X-ray image datasets (FracAtlas and GRAZPEDWRI-DX). Multiple experimental comparisons substantiated the model's efficacy, demonstrating superior accuracy and real-time detection capability. On the FracAtlas dataset, the WCAY model exhibits a notable 8.8% improvement in mean average precision (mAP) over the original model; on the GRAZPEDWRI-DX dataset, the mAP reaches 64.4%, with a detection accuracy of 93.9% for the "fracture" category alone. Compared with other state-of-the-art object detection models, the proposed model represents a substantial improvement over the original algorithm. The code is publicly available at https://github.com/cccp421/Fracture-Detection-WCAY.
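
The WCA mechanism is the paper's novelty and its details are not reproduced in this listing; the block below shows only the generic SE-style channel-attention pattern such mechanisms build on (the reduction ratio is arbitrary):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: squeeze spatial dimensions, learn
    per-channel weights, and rescale the feature map accordingly."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                 # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze -> (B, C) channel weights
        return x * w[:, :, None, None]    # excite: rescale each channel
```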
Affiliation(s)
- Peng Chen
- Heilongjiang University, Harbin, 150080, China
| | - Songyan Liu
- Heilongjiang University, Harbin, 150080, China.
| | - Wenbin Lu
- Heilongjiang University, Harbin, 150080, China
| | - Fangpeng Lu
- Heilongjiang University, Harbin, 150080, China
| | - Boyang Ding
- Heilongjiang University, Harbin, 150080, China
|
50
|
Tian Q, Zhao G, Yan C, Yao L, Qu J, Yin L, Feng H, Yao N, Yu Q. Enhancing practicality of deep learning for crop disease identification under field conditions: insights from model evaluation and crop-specific approaches. PEST MANAGEMENT SCIENCE 2024; 80:5864-5875. [PMID: 39030887 DOI: 10.1002/ps.8317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/07/2024] [Revised: 06/16/2024] [Accepted: 07/03/2024] [Indexed: 07/22/2024]
Abstract
BACKGROUND Crop diseases can lead to significant yield losses and food shortages if not promptly identified and managed by farmers. With advances in convolutional neural networks (CNN) and the widespread availability of smartphones, automated and accurate identification of crop diseases has become feasible. However, although previous studies have achieved high accuracy (>95%) under laboratory conditions (Lab) using mixed datasets of multiple crops, these models often falter when deployed under field conditions (Field). In this study, we aimed to evaluate disease identification accuracy under Lab, Field, and Mixed (Lab and Field) conditions using an assembled dataset encompassing 14 diseases of apple (Malus × domestica Borkh.), potato (Solanum tuberosum L.), and tomato (Solanum lycopersicum L.). In addition, we investigated the impact of model architectures, parameter sizes, and crop-specific models (CSMs) on accuracy, using DenseNets, ResNets, MobileNetV3, EfficientNet, and VGG Nets. RESULTS Our results revealed a decrease in accuracy across all models from Lab (98.22%) to Mixed (91.76%) to Field (71.55%) conditions. Interestingly, disease classification accuracy showed minimal variation across model architectures and parameter sizes: Lab (97.61-98.76%), Mixed (90.76-92.31%), and Field (68.56-73.81%). Although CSMs reduced inter-crop disease misclassifications, they also led to a slight increase in intra-crop misclassifications. CONCLUSION Our findings underscore the importance of enriching data representation and volume over employing new model architectures, and highlight the need for more field-specific images. Ultimately, these insights contribute to the advancement of crop disease identification applications, facilitating their practical implementation in farmers' fields. © 2024 Society of Chemical Industry.
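
The Lab/Field/Mixed comparison amounts to evaluating one classifier on condition-specific test sets; a minimal PyTorch sketch, where the dataset dictionary, batch size, and device are assumptions:

```python
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def accuracy_by_condition(model, datasets, device="cpu"):
    """Evaluate one classifier on Lab / Field / Mixed test sets separately.
    `datasets` maps a condition name to a Dataset of (image, label) pairs."""
    model.eval().to(device)
    results = {}
    for name, ds in datasets.items():
        loader = DataLoader(ds, batch_size=64)
        correct = total = 0
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1).cpu()
            correct += (pred == y).sum().item()
            total += y.numel()
        results[name] = correct / total
    return results  # e.g. {"Lab": 0.98, "Mixed": 0.92, "Field": 0.72}
```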
Affiliation(s)
- Qi Tian
- College of Natural Resources and Environment, Northwest A&F University, Xianyang, China
| | - Gang Zhao
- State Key Laboratory of Soil Erosion and Dryland Farming on the Loess Plateau, Northwest A&F University, Xianyang, China
- BASF Digital Farming GmbH, Köln, Germany
| | - Changqing Yan
- College of Intelligent Equipment, Shandong University of Science and Technology, Taian, China
| | - Linjia Yao
- College of Natural Resources and Environment, Northwest A&F University, Xianyang, China
| | - Junjie Qu
- Guangxi Crop Genetic Improvement and Biotechnology Key Lab, Guangxi Academy of Agricultural Sciences, Nanning, China
| | - Ling Yin
- Guangxi Crop Genetic Improvement and Biotechnology Key Lab, Guangxi Academy of Agricultural Sciences, Nanning, China
| | - Hao Feng
- State Key Laboratory of Soil Erosion and Dryland Farming on the Loess Plateau, Northwest A&F University, Xianyang, China
- Institute of Water-Saving Agriculture in Arid Areas of China, Northwest A&F University, Xianyang, China
| | - Ning Yao
- College of Water Resources and Architectural Engineering/Key Lab of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Xianyang, Yangling, China
| | - Qiang Yu
- State Key Laboratory of Soil Erosion and Dryland Farming on the Loess Plateau, Northwest A&F University, Xianyang, China
|