201
Xu X, Chen Y, Yin H, Wang X, Zhang X. Nondestructive detection of SSC in multiple pear (Pyrus pyrifolia Nakai) cultivars using Vis-NIR spectroscopy coupled with the Grad-CAM method. Food Chem 2024; 450:139283. PMID: 38615528. DOI: 10.1016/j.foodchem.2024.139283.
Abstract
Vis-NIR spectroscopy coupled with chemometric models is frequently used to predict pear soluble solids content (SSC). However, model robustness is challenged by variation across pear cultivars. This study explored the feasibility of developing universal models that predict SSC across multiple pear varieties, improving generalizability. Mature fruits of six pear cultivars with green skin (Pyrus pyrifolia Nakai cv. 'Cuiyu', 'Sucui No.1', and 'Cuiguan') and brown skin (Pyrus pyrifolia Nakai cv. 'Hosui', 'Syusui', and 'Wakahikari') were used to establish single-cultivar models and multi-cultivar universal models using convolutional neural network (CNN), partial least squares (PLS), and support vector regression (SVR) approaches. Multi-cultivar universal models were built using full spectra and using important variables extracted by gradient-weighted class activation mapping (Grad-CAM), respectively. The universal models based on important variables performed satisfactorily, with RMSEPs of 0.76, 0.59, 0.80, 1.64, 0.98, and 1.03 °Brix on the six cultivars, respectively.
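The per-cultivar figures quoted above are root-mean-square errors of prediction (RMSEP). A minimal sketch of the metric, using made-up °Brix values rather than the study's data:

```python
import math

def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction, here in degrees Brix."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))

# hypothetical reference vs. predicted SSC values for one cultivar
y_true = [11.2, 12.0, 10.5, 13.1]
y_pred = [11.0, 12.3, 10.9, 12.8]
print(round(rmsep(y_true, y_pred), 3))  # -> 0.308
```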
Affiliation(s)
- Xin Xu: College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
- Yanyu Chen: College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
- Hao Yin: College of Horticulture, Nanjing Agricultural University, Nanjing 210031, China
- Xiaochan Wang: College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
- Xiaolei Zhang: College of Engineering, Nanjing Agricultural University, Nanjing 210031, China
202
Tao W, Wang X, Yan T, Liu Z, Wan S. ESF-YOLO: an accurate and universal object detector based on neural networks. Front Neurosci 2024; 18:1371418. PMID: 38650621. PMCID: PMC11033406. DOI: 10.3389/fnins.2024.1371418.
Abstract
As an excellent single-stage object detector based on neural networks, YOLOv5 has found extensive application in the industrial domain; however, it still exhibits certain design limitations. To address these issues, this paper proposes Efficient Scale Fusion YOLO (ESF-YOLO). First, the Multi-Sampling Conv Module (MSCM) is designed, which enhances the backbone network's ability to learn low-level features through multi-scale receptive fields and cross-scale feature fusion. Second, to tackle occlusion, a new Block-wise Channel Attention Module (BCAM) is designed, assigning greater weights to channels carrying critical information. Next, a lightweight Decoupled Head (LD-Head) is devised. Additionally, the loss function is redesigned to address the asynchrony between labels and confidences, alleviating the imbalance between positive and negative samples during training. Finally, an adaptive scale factor for Intersection over Union (IoU) calculation is proposed, adjusting bounding-box sizes adaptively to accommodate targets of different sizes in the dataset. Experimental results on the SODA10M and CBIA8K datasets demonstrate that ESF-YOLO increases Average Precision at 0.50 IoU (AP50) by 3.93% and 2.24%, Average Precision at 0.75 IoU (AP75) by 4.77% and 4.85%, and mean Average Precision (mAP) by 4% and 5.39%, respectively, validating the model's broad applicability.
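The AP metrics above are thresholded on Intersection over Union. A minimal sketch of plain axis-aligned IoU; the paper's adaptive scale factor is its own contribution and is not reproduced here:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # -> 0.1429 (overlap 1, union 7)
```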
Affiliation(s)
- Wenguang Tao: Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Xiaotian Wang: Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Tian Yan: Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Zhengzhuo Liu: Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
- Shizheng Wan: Shanghai Electro-Mechanical Engineering Institute, Shanghai, China
203
Zhang X, Gao H, Wang H, Chen Z, Zhang Z, Chen X, Li Y, Qi Y, Wang R. PLANET: A Multi-objective Graph Neural Network Model for Protein-Ligand Binding Affinity Prediction. J Chem Inf Model 2024; 64:2205-2220. PMID: 37319418. DOI: 10.1021/acs.jcim.3c00253.
Abstract
Predicting protein-ligand binding affinity is a central issue in drug design. Various deep learning models have been published in recent years, many of which rely on 3D protein-ligand complex structures as input and tend to focus on the single task of reproducing binding affinity. In this study, we developed a graph neural network model called PLANET (Protein-Ligand Affinity prediction NETwork). This model takes the graph-represented 3D structure of the binding pocket on the target protein and the 2D chemical structure of the ligand molecule as input. It was trained through a multi-objective process with three related tasks: deriving the protein-ligand binding affinity, the protein-ligand contact map, and the ligand distance matrix. Besides protein-ligand complexes with known binding affinity data retrieved from the PDBbind database, a large number of non-binder decoys were added to the training data to derive the final model. When tested on the CASF-2016 benchmark, PLANET exhibited scoring power comparable to the best result yielded by other deep learning models, as well as reasonable ranking and docking power. In virtual screening trials on the DUD-E benchmark, PLANET's performance was notably better than several deep learning and machine learning models. On the LIT-PCBA benchmark, PLANET achieved accuracy comparable to the conventional docking program Glide, yet finished the same job in less than 1% of Glide's computation time because it does not need exhaustive conformational sampling. Considering its decent accuracy and efficiency in binding affinity prediction, PLANET may become a useful tool for large-scale virtual screening.
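One of PLANET's three training targets is the protein-ligand contact map. A toy sketch of how such a map can be derived from 3D coordinates, with a hypothetical 4.5 Å cutoff and made-up atom positions (the paper predicts the map from graphs; this only shows what the target object is):

```python
def contact_map(pocket, ligand, cutoff=4.5):
    """Binary contact map: 1 if a pocket/ligand atom pair lies within cutoff (Å)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return [[1 if dist(p, q) < cutoff else 0 for q in ligand] for p in pocket]

pocket = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]   # invented pocket-atom coordinates
ligand = [(3.0, 0.0, 0.0), (8.0, 0.0, 0.0)]    # invented ligand-atom coordinates
print(contact_map(pocket, ligand))  # -> [[1, 0], [0, 1]]
```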
Affiliation(s)
- Xiangying Zhang, Haotian Gao, Haojie Wang, Zhihang Chen, Zhe Zhang, Xinchong Chen, Yan Li, Yifei Qi, Renxiao Wang: Department of Medicinal Chemistry, School of Pharmacy, Fudan University, 826 Zhangheng Road, Shanghai 201203, People's Republic of China
204
Tian T, Li S, Fang M, Zhao D, Zeng J. MolSHAP: Interpreting Quantitative Structure-Activity Relationships Using Shapley Values of R-Groups. J Chem Inf Model 2024; 64:2236-2249. PMID: 37584270. DOI: 10.1021/acs.jcim.3c00465.
Abstract
Optimizing the activities and properties of lead compounds is an essential step in the drug discovery process. Despite recent advances in machine learning-aided drug discovery, most of the existing methods focus on making predictions for the desired objectives directly while ignoring the explanations for predictions. Although several techniques can provide interpretations for machine learning-based methods such as feature attribution, there are still gaps between these interpretations and the principles commonly adopted by medicinal chemists when designing and optimizing molecules. Here, we propose an interpretation framework, named MolSHAP, for quantitative structure-activity relationship analysis by estimating the contributions of R-groups. Instead of attributing the activities to individual input features, MolSHAP regards the R-group fragments as the basic units of interpretation, which is in accordance with the fragment-based modifications in molecule optimization. MolSHAP is a model-agnostic method that can interpret activity regression models with arbitrary input formats and model architectures. Based on the evaluations of numerous representative activity regression models on a specially designed R-group ranking task, MolSHAP achieved significantly better interpretation power compared with other methods. In addition, we developed a compound optimization algorithm based on MolSHAP and illustrated the reliability of the optimized compounds using an independent case study. These results demonstrated that MolSHAP can provide a useful tool for accurately interpreting the quantitative structure-activity relationships and rationally optimizing the compound activities in drug discovery.
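MolSHAP attributes activity to R-group fragments via Shapley values. A minimal sketch of exact Shapley attribution over a toy additive activity function; the fragment names and contributions are invented, and MolSHAP itself interprets learned regression models rather than a hand-written one:

```python
import math
from itertools import permutations

def shapley(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over all orderings of the players."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition.add(p)
            phi[p] += value(coalition) - before
    n_fact = math.factorial(len(players))
    return {p: v / n_fact for p, v in phi.items()}

# hypothetical additive contributions of three R-groups to activity
contrib = {"R1": 0.5, "R2": 1.2, "R3": -0.3}
act = lambda s: sum(contrib[p] for p in s)
print({p: round(v, 3) for p, v in shapley(list(contrib), act).items()})
# -> {'R1': 0.5, 'R2': 1.2, 'R3': -0.3}
```

For an additive model the Shapley values recover each fragment's own contribution, which is the sanity check shown here; MolSHAP's point is that the same attribution applies to arbitrary, non-additive models.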
Affiliation(s)
- Tingzhong Tian, Shuya Li, Meng Fang, Dan Zhao, Jianyang Zeng: Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
205
Cui S, Hui B. Dual-Dependency Attention Transformer for Fine-Grained Visual Classification. Sensors (Basel) 2024; 24:2337. PMID: 38610547. PMCID: PMC11014298. DOI: 10.3390/s24072337.
Abstract
Vision transformers (ViTs) are widely used in various visual tasks, such as fine-grained visual classification (FGVC). However, the self-attention mechanism at the core of vision transformers incurs quadratic computational and memory complexity. The sparse-attention and local-attention approaches currently used by most researchers are not suitable for FGVC tasks, which require dense feature extraction and global dependency modeling. To address this challenge, we propose a dual-dependency attention transformer model that decouples global token interactions into two paths: a position-dependency attention pathway based on the intersection of two types of grouped attention, and a semantic-dependency attention pathway based on dynamic central aggregation. This approach enhances high-quality semantic modeling of discriminative cues while reducing the computational cost to linear complexity. In addition, we develop discriminative enhancement strategies that increase the sensitivity of high-confidence discriminative cue tracking with a knowledge-based representation approach. Experiments on three datasets, NABirds, CUB, and DOGS, show that the method is well suited to fine-grained image classification, striking a balance between computational cost and performance.
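The quadratic cost discussed above comes from the scaled dot-product attention at the core of ViTs, which scores every token against every other token. A minimal pure-Python sketch of that baseline operation on a toy 2-token input (the paper's two decoupled pathways are not reproduced here):

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Scaled dot-product attention: the O(n^2) token-pair scoring that
    the paper's two linear pathways are designed to avoid."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

Q = K = V = [[1.0, 0.0], [0.0, 1.0]]  # two toy tokens
out = attention(Q, K, V)
print(out[0][0] > out[0][1])  # -> True: each token attends most to itself
```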
Affiliation(s)
- Shiyan Cui
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China;
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Bin Hui
- Key Laboratory of Opto-Electronic Information Processing, Chinese Academy of Sciences, Shenyang 110016, China;
- Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
- Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
| |
206
Freitas M, Pinho F, Pinho L, Silva S, Figueira V, Vilas-Boas JP, Silva A. Biomechanical Assessment Methods Used in Chronic Stroke: A Scoping Review of Non-Linear Approaches. Sensors (Basel) 2024; 24:2338. PMID: 38610549. PMCID: PMC11014015. DOI: 10.3390/s24072338.
Abstract
Non-linear and dynamic systems analysis of human movement has recently become increasingly widespread with the intention of better reflecting how complexity affects the adaptability of motor systems, especially after a stroke. The main objective of this scoping review was to summarize the non-linear measures used in the analysis of kinetic, kinematic, and EMG data of human movement after stroke. PRISMA-ScR guidelines were followed, establishing the eligibility criteria, the population, the concept, and the contextual framework. The examined studies were published between 1 January 2013 and 12 April 2023, in English or Portuguese, and were indexed in the databases selected for this research: PubMed®, Web of Science®, Institute of Electrical and Electronics Engineers®, Science Direct® and Google Scholar®. In total, 14 of the 763 articles met the inclusion criteria. The non-linear measures identified included entropy (n = 11), fractal analysis (n = 1), the short-term local divergence exponent (n = 1), the maximum Floquet multiplier (n = 1), and the Lyapunov exponent (n = 1). These studies focused on different motor tasks: reaching to grasp (n = 2), reaching to point (n = 1), arm tracking (n = 2), elbow flexion (n = 5), elbow extension (n = 1), wrist and finger extension upward (lifting) (n = 1), knee extension (n = 1), and walking (n = 4). When studying the complexity of human movement in chronic post-stroke adults, entropy measures, particularly sample entropy, were preferred. Kinematic assessment was mainly performed using motion capture systems, with a focus on joint angles of the upper limbs.
Affiliation(s)
- Marta Freitas: Escola Superior de Saúde do Vale do Ave, Cooperativa de Ensino Superior Politécnico e Universitário, Rua José António Vidal, 81, 4760-409 Vila Nova de Famalicão, Portugal; HM—Health and Human Movement Unit, Polytechnic University of Health, Cooperativa de Ensino Superior Politécnico e Universitário, CRL, 4760-409 Vila Nova de Famalicão, Portugal; Center for Rehabilitation Research (CIR), R. Dr. António Bernardino de Almeida 400, 4200-072 Porto, Portugal; Porto Biomechanics Laboratory (LABIOMEP), 4200-450 Porto, Portugal
- Francisco Pinho: Escola Superior de Saúde do Vale do Ave, Cooperativa de Ensino Superior Politécnico e Universitário, Rua José António Vidal, 81, 4760-409 Vila Nova de Famalicão, Portugal; HM—Health and Human Movement Unit, Polytechnic University of Health, Cooperativa de Ensino Superior Politécnico e Universitário, CRL, 4760-409 Vila Nova de Famalicão, Portugal
- Liliana Pinho: Escola Superior de Saúde do Vale do Ave, Cooperativa de Ensino Superior Politécnico e Universitário, Rua José António Vidal, 81, 4760-409 Vila Nova de Famalicão, Portugal; HM—Health and Human Movement Unit, Polytechnic University of Health, Cooperativa de Ensino Superior Politécnico e Universitário, CRL, 4760-409 Vila Nova de Famalicão, Portugal; Center for Rehabilitation Research (CIR), R. Dr. António Bernardino de Almeida 400, 4200-072 Porto, Portugal; Porto Biomechanics Laboratory (LABIOMEP), 4200-450 Porto, Portugal
- Sandra Silva: Escola Superior de Saúde do Vale do Ave, Cooperativa de Ensino Superior Politécnico e Universitário, Rua José António Vidal, 81, 4760-409 Vila Nova de Famalicão, Portugal; HM—Health and Human Movement Unit, Polytechnic University of Health, Cooperativa de Ensino Superior Politécnico e Universitário, CRL, 4760-409 Vila Nova de Famalicão, Portugal; Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal; School of Health Sciences, University of Aveiro, 3810-193 Aveiro, Portugal
- Vânia Figueira: Escola Superior de Saúde do Vale do Ave, Cooperativa de Ensino Superior Politécnico e Universitário, Rua José António Vidal, 81, 4760-409 Vila Nova de Famalicão, Portugal; HM—Health and Human Movement Unit, Polytechnic University of Health, Cooperativa de Ensino Superior Politécnico e Universitário, CRL, 4760-409 Vila Nova de Famalicão, Portugal; Porto Biomechanics Laboratory (LABIOMEP), 4200-450 Porto, Portugal
- João Paulo Vilas-Boas: School of Health Sciences, University of Aveiro, 3810-193 Aveiro, Portugal; Centre for Research, Training, Innovation and Intervention in Sport (CIFI2D), Faculty of Sport, University of Porto, 4200-450 Porto, Portugal
- Augusta Silva: Center for Rehabilitation Research (CIR), R. Dr. António Bernardino de Almeida 400, 4200-072 Porto, Portugal; Department of Physiotherapy, School of Health, Polytechnic of Porto, 4200-072 Porto, Portugal
207
Ibragimov E, Kim Y, Lee JH, Cho J, Lee JJ. Automated Pavement Condition Index Assessment with Deep Learning and Image Analysis: An End-to-End Approach. Sensors (Basel) 2024; 24:2333. PMID: 38610545. PMCID: PMC11014408. DOI: 10.3390/s24072333.
Abstract
The degradation of road pavements due to environmental factors is a pressing issue in infrastructure maintenance, necessitating precise identification of pavement distresses. The pavement condition index (PCI) serves as a critical metric for evaluating pavement conditions, essential for effective budget allocation and performance tracking. Traditional manual PCI assessment methods are limited by labor intensity, subjectivity, and susceptibility to human error. Addressing these challenges, this paper presents a novel, end-to-end automated method for PCI calculation, integrating deep learning and image processing technologies. The first stage employs a deep learning algorithm for accurate detection of pavement cracks, followed by the application of a segmentation-based skeleton algorithm in image processing to estimate crack width precisely. This integrated approach enhances the assessment process, providing a more comprehensive evaluation of pavement integrity. The validation results demonstrate a 95% accuracy in crack detection and 90% accuracy in crack width estimation. Leveraging these results, the automated PCI rating is achieved, aligned with standards, showcasing significant improvements in the efficiency and reliability of PCI evaluations. This method offers advancements in pavement maintenance strategies and potential applications in broader road infrastructure management.
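The crack-width step above follows a segmentation-plus-skeleton pipeline. A common estimator for such pipelines, sketched on a toy binary mask, divides crack area by skeleton length; the paper's exact formulation may differ:

```python
def mean_crack_width(mask, skeleton):
    """Estimate mean crack width (pixels) as crack area / skeleton length."""
    area = sum(sum(row) for row in mask)
    length = sum(sum(row) for row in skeleton)
    return area / length if length else 0.0

# toy binary images: a 3-pixel-wide, 5-pixel-long horizontal crack
mask = [[1] * 5 for _ in range(3)]
skeleton = [[0] * 5, [1] * 5, [0] * 5]  # center-line of the crack
print(mean_crack_width(mask, skeleton))  # -> 3.0
```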
Affiliation(s)
- Eldor Ibragimov: SISTech Co., Ltd., Seoul 05006, Republic of Korea
- Yongsoo Kim: SISTech Co., Ltd., Seoul 05006, Republic of Korea
- Jung Hee Lee: Department of Artificial Intelligence, Ajou University, Suwon-si 16499, Republic of Korea
- Junsang Cho: Korea Expressway Corporation Research Institute, Hwaseong-si 13550, Republic of Korea
- Jong-Jae Lee: Department of Civil & Environmental Engineering, Sejong University, Seoul 05006, Republic of Korea
208
Reigle J, Lopez-Nunez O, Drysdale E, Abuquteish D, Liu X, Putra J, Erdman L, Griffiths AM, Prasath S, Siddiqui I, Dhaliwal J. Using Deep Learning to Automate Eosinophil Counting in Pediatric Ulcerative Colitis Histopathological Images. medRxiv 2024:2024.04.03.24305251. PMID: 38633803. PMCID: PMC11023647. DOI: 10.1101/2024.04.03.24305251.
Abstract
Background: Accurate identification of inflammatory cells in mucosal histopathology images is important in diagnosing ulcerative colitis. The identification of eosinophils in the colonic mucosa has been associated with disease course. Cell counting is not only time-consuming but also subject to human bias. In this study we developed an automatic eosinophil-counting tool for mucosal histopathology images using deep learning.
Method: Four pediatric IBD pathologists from two North American pediatric hospitals annotated 530 crops from 143 standard-of-care hematoxylin and eosin (H&E) rectal mucosal biopsies. A 305/75 training/validation split was used to develop and optimize a U-Net-based deep learning model, and 150 crops were used as a test set. The U-Net model was then compared to SAU-Net, a state-of-the-art U-Net variant. We applied post-processing steps, namely (1) the pixel-level probability threshold, (2) the minimum number of clustered pixels to designate a cell, and (3) the connectivity. Experiments were run to optimize model parameters using AUROC and cross-entropy loss as performance metrics.
Results: The F1-score was 0.86 (95% CI: 0.79-0.91) (precision: 0.77 (95% CI: 0.70-0.83); recall: 0.96 (95% CI: 0.93-0.99)) for identifying eosinophils, compared with an F1-score of 0.2 (95% CI: 0.13-0.26) for SAU-Net (precision: 0.38 (95% CI: 0.31-0.46); recall: 0.13 (95% CI: 0.08-0.19)). The inter-rater reliability was 0.96 (95% CI: 0.93-0.97). The correlations between the algorithm and two pathologists were 0.89 (95% CI: 0.82-0.94) and 0.88 (95% CI: 0.80-0.94), respectively.
Conclusion: Our results indicate that deep learning-based automated eosinophil counting can achieve robust accuracy with a high degree of concordance with manual expert annotations.
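The reported F1-score is the harmonic mean of precision and recall. A quick check against the quoted point estimates (0.77 precision, 0.96 recall) lands on the headline figure to within rounding of the inputs:

```python
def f1(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.77, 0.96), 2))  # -> 0.85 (the paper's 0.86 comes from unrounded inputs)
```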
209
Wang X, Mi Y, Zhang X. 3D human pose data augmentation using Generative Adversarial Networks for robotic-assisted movement quality assessment. Front Neurorobot 2024; 18:1371385. PMID: 38644903. PMCID: PMC11032046. DOI: 10.3389/fnbot.2024.1371385.
Abstract
In the realm of human motion recognition systems, the augmentation of 3D human pose data plays a pivotal role in enriching and enhancing the quality of original datasets through the generation of synthetic data. This augmentation is vital for addressing the current research gaps in diversity and complexity, particularly when dealing with rare or complex human movements. Our study introduces a groundbreaking approach employing Generative Adversarial Networks (GANs), coupled with Support Vector Machine (SVM) and DenseNet, further enhanced by robot-assisted technology to improve the precision and efficiency of data collection. The GANs in our model are responsible for generating highly realistic and diverse 3D human motion data, while SVM aids in the effective classification of this data. DenseNet is utilized for the extraction of key features, facilitating a comprehensive and integrated approach that significantly elevates both the data augmentation process and the model's ability to process and analyze complex human movements. The experimental outcomes underscore our model's exceptional performance in motion quality assessment, showcasing a substantial improvement over traditional methods in terms of classification accuracy and data processing efficiency. These results validate the effectiveness of our integrated network model, setting a solid foundation for future advancements in the field. Our research not only introduces innovative methodologies for 3D human pose data enhancement but also provides substantial technical support for practical applications across various domains, including sports science, rehabilitation medicine, and virtual reality. By combining advanced algorithmic strategies with robotic technologies, our work addresses key challenges in data augmentation and motion quality assessment, paving the way for new research and development opportunities in these critical areas.
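The augmentation idea above can be illustrated with a much simpler stand-in: Gaussian jitter of joint coordinates. This is not a GAN (the paper's generator is learned), just a minimal sketch of producing synthetic variants of a 3D pose, with invented joint positions:

```python
import random

def jitter_pose(pose, sigma=0.01, seed=0):
    """Return a synthetic variant of a 3D pose (list of (x, y, z) joints) by
    adding Gaussian noise to each coordinate. A hand-written stand-in for
    GAN-generated variation, for illustration only."""
    rng = random.Random(seed)
    return [tuple(c + rng.gauss(0.0, sigma) for c in joint) for joint in pose]

# toy 3-joint pose (head, hip, knee heights in meters, made up)
pose = [(0.0, 0.0, 1.7), (0.2, 0.0, 1.0), (0.2, 0.1, 0.5)]
augmented = jitter_pose(pose)
print(len(augmented) == len(pose))  # -> True: same skeleton, perturbed joints
```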
Affiliation(s)
- Xuefeng Wang: College of Sports, Woosuk University, Jeonju, Republic of Korea
- Yang Mi: College of Sports and Health, Linyi University, Linyi, China
- Xiang Zhang: Department of Information Engineering, Linyi Technician Institute, Linyi, China
210
Arif M, Fang G, Ghulam A, Musleh S, Alam T. DPI_CDF: druggable protein identifier using cascade deep forest. BMC Bioinformatics 2024; 25:145. PMID: 38580921. DOI: 10.1186/s12859-024-05744-3.
Abstract
BACKGROUND: Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets is accurate but generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory.
METHODS: In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs based on protein sequence alone. DPI_CDF utilizes evolutionary (histograms of oriented gradients for position-specific scoring matrices), physicochemical (component protein sequence representation), and compositional (normalized qualitative characteristic) properties of the protein sequence to generate features. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model.
RESULTS: Empirical outcomes on 10-fold cross-validation show that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, achieving a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerate the drug discovery process.
AVAILABILITY: The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF
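The MCC figures quoted above summarize the full confusion matrix in a single chance-corrected score. A minimal sketch of the formula with made-up counts:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

print(round(mcc(95, 90, 10, 5), 3))  # hypothetical counts -> 0.851
```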
Affiliation(s)
- Muhammad Arif: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Ge Fang: State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing 210023, China; Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Ali Ghulam: Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan
- Saleh Musleh: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Tanvir Alam: College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
211
Ebrahim M, Alsmirat M, Al-Ayyoub M. Advanced disk herniation computer aided diagnosis system. Sci Rep 2024; 14:8071. PMID: 38580700. PMCID: PMC10997754. DOI: 10.1038/s41598-024-58283-5.
Abstract
Over recent years, researchers and practitioners have seen massive, continuous improvements in the computational resources available to them. This has made resource-hungry machine learning (ML) algorithms feasible and practical. Moreover, several advanced techniques are used to boost the performance of such algorithms even further, including various transfer learning techniques, data augmentation, and feature concatenation. The choice among these advanced techniques usually depends on the size and nature of the dataset. For fine-grained medical image sets, which have subcategories within the main categories, there is a need to find the combination of techniques that works best on these types of images. In this work, we utilize these advanced techniques to find the best combinations to build a state-of-the-art lumbar disc herniation computer-aided diagnosis system. We evaluated the system extensively, and the results show that it achieves an accuracy of 98% when compared with human diagnosis.
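Feature concatenation, one of the techniques combined above, simply joins the descriptors produced by different pretrained backbones into one vector before classification. A trivial sketch with hypothetical pooled features:

```python
def concat_features(*feature_vectors):
    """Concatenate per-backbone feature vectors into one fused descriptor."""
    fused = []
    for v in feature_vectors:
        fused.extend(v)
    return fused

# hypothetical pooled outputs from two backbones
print(concat_features([0.1, 0.2], [0.3, 0.4, 0.5]))  # -> [0.1, 0.2, 0.3, 0.4, 0.5]
```

The fused vector is then fed to a downstream classifier; the paper's contribution is searching over which combinations of such techniques work best, not the concatenation itself.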
Affiliation(s)
- Maad Ebrahim: Department of Computer Science and Operations Research (DIRO), University of Montreal, Montreal, QC H3T 1J4, Canada; Department of Computer Science, Jordan University of Science and Technology, Ar-Ramtha, Jordan
- Mohammad Alsmirat: Department of Computer Science, University of Sharjah, Sharjah, United Arab Emirates; Department of Computer Science, Jordan University of Science and Technology, Ar-Ramtha, Jordan
- Mahmoud Al-Ayyoub: Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates; Department of Computer Science, Jordan University of Science and Technology, Ar-Ramtha, Jordan
212
Tang X, Zhao S. The application prospects of robot pose estimation technology: exploring new directions based on YOLOv8-ApexNet. Front Neurorobot 2024; 18:1374385. [PMID: 38644904 PMCID: PMC11026676 DOI: 10.3389/fnbot.2024.1374385] [Received: 01/22/2024] [Accepted: 03/22/2024] [Indexed: 04/23/2024]
Abstract
Introduction: Service robot technology is increasingly gaining prominence in the field of artificial intelligence, but persistent limitations continue to impede its widespread implementation. In this regard, human motion pose estimation is a crucial challenge for enhancing the perceptual and decision-making capacities of service robots.
Method: This paper introduces YOLOv8-ApexNet, a model that integrates Bidirectional Routing Attention (BRA) and a Generalized Feature Pyramid Network (GFPN). BRA captures inter-keypoint correlations within dynamic environments through a bidirectional information propagation mechanism, while GFPN extracts and integrates feature information across different scales, enabling the model to make more precise predictions for targets of various sizes and shapes.
Results: Empirical results show significant performance gains for YOLOv8-ApexNet on the COCO and MPII datasets. Compared to existing methods, the model demonstrates pronounced advantages in keypoint localization accuracy and robustness.
Discussion: This research provides an efficient and accurate solution for service robotics that mitigates the deficiencies of current approaches. By improving the accuracy of perception and decision-making, it supports the wider integration of service robots into practical applications.
Affiliation(s)
- XianFeng Tang
- Physical Education Department, Zhejiang Wanli University, Ningbo, China
- Shuwei Zhao
- Physical Education Department, Hebei University of Technology, Tianjin, China
213
Zhou L, Wu G, Zuo Y, Chen X, Hu H. A Comprehensive Review of Vision-Based 3D Reconstruction Methods. Sensors (Basel) 2024; 24:2314. [PMID: 38610525 PMCID: PMC11014007 DOI: 10.3390/s24072314] [Received: 03/10/2024] [Revised: 03/28/2024] [Accepted: 04/03/2024] [Indexed: 04/14/2024]
Abstract
With the emergence of algorithms such as NeRF and 3DGS, 3D reconstruction has become a popular research topic in recent years. 3D reconstruction technology provides crucial support for training extensive computer vision models and for advancing general artificial intelligence. With the development of deep learning and GPU technology, the demand for high-precision, high-efficiency 3D reconstruction is increasing, especially in the fields of unmanned systems, human-computer interaction, virtual reality, and medicine. This survey categorizes the methods and technologies used in 3D reconstruction along three lines: traditional static methods, dynamic methods, and machine-learning-based methods, and then compares and discusses them. The survey concludes with a detailed analysis of the trends and challenges in 3D reconstruction, aiming to give readers who are conducting or planning research on 3D reconstruction a comprehensive introduction to the relevant knowledge.
Affiliation(s)
- Guoxin Wu
- Key Laboratory of Modern Measurement and Control Technology Ministry of Education, Beijing Information Science and Technology University, Beijing 100080, China; (L.Z.); (Y.Z.); (X.C.); (H.H.)
214
Li B, Chen H, Duan H. Artificial intelligence-driven prognostic system for conception prediction and management in intrauterine adhesions following hysteroscopic adhesiolysis: a diagnostic study using hysteroscopic images. Front Bioeng Biotechnol 2024; 12:1327207. [PMID: 38638324 PMCID: PMC11024240 DOI: 10.3389/fbioe.2024.1327207] [Received: 10/24/2023] [Accepted: 03/04/2024] [Indexed: 04/20/2024]
Abstract
Introduction: Intrauterine adhesions (IUAs), caused by endometrial injury and commonly occurring in developing countries, can lead to subfertility. This study aimed to develop and evaluate a DeepSurv-based artificial intelligence (AI) system for predicting fertility outcomes after hysteroscopic adhesiolysis.
Methods: This diagnostic study included 555 patients with IUAs treated with hysteroscopic adhesiolysis, with 4,922 second-look hysteroscopic images from a prospective clinical database (IUADB, NCT05381376) and a minimum of 2 years of follow-up. Patients were randomly divided into training, validation, and test groups for model development, tuning, and external validation. Four transfer learning models were built using the DeepSurv architecture, and a code-free AI application for pregnancy prediction was also developed. The primary outcome was the model's ability to predict pregnancy within a year after adhesiolysis. Secondary outcomes were model performance, evaluated using time-dependent areas under the curve (AUCs) and the C-index, and the benefit of assisted reproductive technology (ART), evaluated by hazard ratio (HR) among different risk groups.
Results: On external validation, InceptionV3+DeepSurv, InceptionResNetV2+DeepSurv, and ResNet50+DeepSurv achieved AUCs of 0.94, 0.95, and 0.93, respectively, for one-year pregnancy prediction, outperforming other models and clinical scoring systems. A code-free AI application was developed to identify candidates for ART: patients with lower natural conception probability, as indicated by the application, had a higher ART benefit (HR 3.13, 95% CI: 1.22-8.02, p = 0.017).
Conclusion: InceptionV3+DeepSurv, InceptionResNetV2+DeepSurv, and ResNet50+DeepSurv show potential for predicting fertility outcomes after hysteroscopic adhesiolysis for IUAs. The code-free AI application based on the DeepSurv architecture facilitates personalized therapy following hysteroscopic adhesiolysis.
Affiliation(s)
- Bohan Li
- Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Healthcare Hospital, Beijing, China
- Hui Chen
- School of Biomedical Engineering, Capital Medical University, Beijing, China
- Beijing Advanced Innovation Center for Big Data-based Precision Medicine, Capital Medical University, Beijing, China
- Hua Duan
- Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Healthcare Hospital, Beijing, China
215
Schulte L, Faul C, Oswald P, Preißler K, Steinfartz S, Veith M, Caspers BA. Performance of different automatic photographic identification software for larvae and adults of the European fire salamander. PLoS One 2024; 19:e0298285. [PMID: 38573887 PMCID: PMC10994360 DOI: 10.1371/journal.pone.0298285] [Received: 10/01/2023] [Accepted: 01/22/2024] [Indexed: 04/06/2024]
Abstract
For many species, population sizes are unknown despite their importance for conservation. Population sizes are often estimated with capture-mark-recapture (CMR) studies, which require identifying each individual, mostly through individual markings or genetic characters. Invasive marking techniques, however, can negatively affect individual fitness. Low-impact alternatives include photographic identification, which works for species with stable, distinctive phenotypic traits. For photo-based individual identification, a variety of software with different requirements is available. The European fire salamander (Salamandra salamandra) is a species in which individuals, both as larvae and as adults, have individual-specific patterns that allow identification. In this study, we compared the performance of five software packages for photographic identification of the European fire salamander: Amphibian & Reptile Wildbook (ARW), AmphIdent, I3S pattern+, ManderMatcher, and Wild-ID. Adults can be identified with all five packages, whereas larvae can currently only be identified with two of them (ARW and Wild-ID). We used one dataset of larval pictures taken in the laboratory, tested with ARW and Wild-ID, and another dataset of adult pictures taken in the field, tested with all five packages. We compared each package's requirements on the pictures used and calculated the False Rejection Rate (FRR) and the Recognition Rate (RR). For the larval dataset (421 pictures), ARW and Wild-ID performed equally well (99.6% and 100% Recognition Rate, respectively). For the adult dataset (377 pictures), ManderMatcher had the best False Rejection Rate and ARW the highest Recognition Rate. Additionally, ARW is the only program that requires no image pre-processing. In times of amphibian declines, non-invasive photo identification software enabling capture-mark-recapture studies helps to gain knowledge of the population sizes, distribution, movement, and demography of a population, and can thus support species conservation.
Affiliation(s)
- Laura Schulte
- Department of Behavioural Ecology, Bielefeld University, Konsequenz, Bielefeld, Germany
- Charlotte Faul
- Biogeography, Trier University, Universitätsring, Trier, Germany
- Pia Oswald
- Department of Behavioural Ecology, Bielefeld University, Konsequenz, Bielefeld, Germany
- Kathleen Preißler
- Molecular Evolution and Systematics of Animals, Leipzig University, Talstraße, Leipzig, Germany
- Sebastian Steinfartz
- Molecular Evolution and Systematics of Animals, Leipzig University, Talstraße, Leipzig, Germany
- Michael Veith
- Biogeography, Trier University, Universitätsring, Trier, Germany
- Barbara A. Caspers
- Department of Behavioural Ecology, Bielefeld University, Konsequenz, Bielefeld, Germany
- JICE, Joint Institute for Individualisation in a Changing Environment, University of Münster and Bielefeld University, Bielefeld, Germany
216
Nakatani T, Utsumi Y, Fujimoto K, Iwamura M, Kise K. Image recognition-based petal arrangement estimation. Front Plant Sci 2024; 15:1334362. [PMID: 38638358 PMCID: PMC11024381 DOI: 10.3389/fpls.2024.1334362] [Received: 11/07/2023] [Accepted: 02/21/2024] [Indexed: 04/20/2024]
Abstract
Flowers exhibit morphological diversity in the number and positional arrangement of their floral organs, such as petals. The petal arrangement of a blooming flower is represented by the overlap relations between neighboring petals, an indicator of the floral developmental process; however, only specialists are capable of identifying petal arrangements. We therefore propose a method to support estimating the arrangement of the perianth organs, including petals and tepals, using image recognition techniques. The obstacle to realizing the method is that a large image dataset cannot be prepared, so the latest machine-learning-based image processing methods, which require large amounts of training data, cannot be applied. Instead, we describe the tepal arrangement as a sequence of interior-exterior patterns of tepal overlap in the image, and estimate the arrangement by matching this sequence against the known patterns. We implement the method with techniques that require little or no training data: a fine-tuned YOLOv5 model for flower detection, GrabCut for flower segmentation, the Harris corner detector for tepal overlap detection, MAML-based interior-exterior estimation, and circular permutation matching for tepal arrangement estimation. Experimental results showed good accuracy when flower detection, segmentation, overlap location estimation, interior-exterior estimation, and circular-permutation-matching-based tepal arrangement estimation were evaluated independently; however, the accuracy decreased when they were integrated. We therefore developed a user interface for manually correcting the overlap position and interior-exterior pattern estimates, which ensures the quality of the final tepal arrangement estimation.
Affiliation(s)
- Tomoya Nakatani
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan
- Yuzuko Utsumi
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan
- Koichi Fujimoto
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima, Japan
- Masakazu Iwamura
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan
- Koichi Kise
- Graduate School of Informatics, Osaka Metropolitan University, Sakai, Japan
217
Ahmadkhani S, Moghaddam ME. A social image recommendation system based on deep reinforcement learning. PLoS One 2024; 19:e0300059. [PMID: 38574062 PMCID: PMC10994284 DOI: 10.1371/journal.pone.0300059] [Received: 08/17/2023] [Accepted: 02/21/2024] [Indexed: 04/06/2024]
Abstract
Today, due to the expansion of the Internet and social networks, people face a vast amount of dynamic information. To mitigate information overload, recommender systems have become pivotal, analyzing users' activity histories to discern their interests and preferences. However, most available social image recommender systems use a static strategy, meaning they do not adapt to changes in user preferences. To overcome this challenge, our paper introduces a dynamic image recommender system that leverages a deep reinforcement learning (DRL) framework enriched with a novel set of features: emotion, style, and personality. These features, uncommon in existing systems, are instrumental in crafting a user's characteristic vector, offering a personalized recommendation experience. We also address the challenge of defining the state representation in reinforcement learning by introducing a new state representation. Experimental results show that, compared to related work, our method significantly improves Recall@k and Precision@k, by approximately 7%-10% for the top 100 recommended images, for personalized image recommendation.
Affiliation(s)
- Somaye Ahmadkhani
- Faculty of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran
218
Nagle MF, Yuan J, Kaur D, Ma C, Peremyslova E, Jiang Y, Niño de Rivera A, Jawdy S, Chen JG, Feng K, Yates TB, Tuskan GA, Muchero W, Fuxin L, Strauss SH. GWAS supported by computer vision identifies large numbers of candidate regulators of in planta regeneration in Populus trichocarpa. G3 (Bethesda) 2024; 14:jkae026. [PMID: 38325329 PMCID: PMC10989874 DOI: 10.1093/g3journal/jkae026] [Received: 11/14/2023] [Revised: 01/18/2024] [Accepted: 01/20/2024] [Indexed: 02/09/2024]
Abstract
Plant regeneration is an important dimension of plant propagation and a key step in the production of transgenic plants. However, regeneration capacity varies widely among genotypes and species, and its molecular basis is largely unknown. Association mapping methods such as genome-wide association studies (GWAS) have long demonstrated the ability to help uncover the genetic basis of trait variation in plants; however, their performance depends on the accuracy and scale of phenotyping. To enable a large-scale GWAS of in planta callus and shoot regeneration in the model tree Populus, we developed a phenomics workflow involving semantic segmentation to quantify regenerating plant tissues over time. Because the resulting statistics had highly non-normal distributions, we employed transformations or permutations to avoid violating the assumptions of the linear models used in GWAS. We report over 200 statistically supported quantitative trait loci (QTLs); genes encompassing or near the top QTLs include regulators of cell adhesion, stress signaling, and hormone signaling pathways, as well as other diverse functions. Our results encourage models of hormonal signaling during plant regeneration to consider keystone roles of stress-related signaling (e.g. involving jasmonates and salicylic acid), in addition to the commonly considered auxin and cytokinin pathways. The putative regulatory genes and biological processes we identified provide new insights into the biological complexity of plant regeneration, and may serve as new reagents for improving regeneration and transformation of recalcitrant genotypes and species.
Affiliation(s)
- Michael F Nagle
- Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97311, USA
- Jialin Yuan
- Department of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA
- Damanpreet Kaur
- Department of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA
- Cathleen Ma
- Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97311, USA
- Ekaterina Peremyslova
- Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97311, USA
- Yuan Jiang
- Statistics Department, Oregon State University, 239 Weniger Hall, Corvallis, OR 97331, USA
- Alexa Niño de Rivera
- Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97311, USA
- Sara Jawdy
- Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Jin-Gui Chen
- Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Bredesen Center for Interdisciplinary Research, University of Tennessee-Knoxville, 310 Ferris Hall, 1508 Middle Dr, Knoxville, TN 37996, USA
- Kai Feng
- Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Timothy B Yates
- Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Bredesen Center for Interdisciplinary Research, University of Tennessee-Knoxville, 310 Ferris Hall, 1508 Middle Dr, Knoxville, TN 37996, USA
- Gerald A Tuskan
- Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Wellington Muchero
- Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Center for Bioenergy Innovation, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN 37831, USA
- Bredesen Center for Interdisciplinary Research, University of Tennessee-Knoxville, 310 Ferris Hall, 1508 Middle Dr, Knoxville, TN 37996, USA
- Li Fuxin
- Department of Electrical Engineering and Computer Science, Oregon State University, 1148 Kelley Engineering Center, Corvallis, OR 97331, USA
- Steven H Strauss
- Department of Forest Ecosystems and Society, Oregon State University, 321 Richardson Hall, Corvallis, OR 97311, USA
219
Xu K, Zhang F, Huang Y, Huang X. 2.5D UNet with context-aware feature sequence fusion for accurate esophageal tumor semantic segmentation. Phys Med Biol 2024; 69:085002. [PMID: 38484399 DOI: 10.1088/1361-6560/ad3419] [Received: 11/17/2023] [Accepted: 03/14/2024] [Indexed: 04/04/2024]
Abstract
Segmenting esophageal tumors from computed tomography (CT) sequence images can assist doctors in diagnosing and treating patients with this malignancy. However, accurately extracting esophageal tumor features from CT images is challenging because of the tumors' small area, variable position and shape, and low contrast with surrounding tissues; as a result, current methods do not achieve the accuracy required for practical applications. To address this problem, we propose a 2.5D context-aware feature sequence fusion UNet (2.5D CFSF-UNet) model for esophageal tumor segmentation in CT sequence images. Specifically, we embed intra-slice multiscale attention feature fusion (Intra-slice MAFF) in each skip connection of UNet to improve feature learning and better express the differences between anatomical structures within CT sequence images. Additionally, an inter-slice context fusion block (Inter-slice CFB) in the center bridge of UNet enhances the depiction of context features between CT slices, preventing the loss of structural information between slices. Experiments on a dataset of 430 esophageal tumor patients yield a dice similarity coefficient of 87.13%, an intersection over union of 79.71%, and a Hausdorff distance of 2.4758 mm, demonstrating that our approach improves contouring consistency and can be applied in clinical settings.
Affiliation(s)
- Kai Xu
- School of the Internet, Anhui University, Anhui, 230039, People's Republic of China
- Feixiang Zhang
- School of the Internet, Anhui University, Anhui, 230039, People's Republic of China
- Yong Huang
- Department of Medical Oncology, The Second People's Hospital of Hefei, Hefei, 230011, People's Republic of China
- Xiaoyu Huang
- Department of Chinese Integrative Medicine Oncology, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, People's Republic of China
220
Sittinger M, Uhler J, Pink M, Herz A. Insect detect: An open-source DIY camera trap for automated insect monitoring. PLoS One 2024; 19:e0295474. [PMID: 38568922 PMCID: PMC10990185 DOI: 10.1371/journal.pone.0295474] [Received: 11/21/2023] [Accepted: 02/28/2024] [Indexed: 04/05/2024]
Abstract
Insect monitoring is essential for designing effective conservation strategies, which are indispensable to mitigate worldwide declines and biodiversity loss. Traditional monitoring methods are widely established for this purpose and can provide data with high taxonomic resolution. However, processing captured insect samples is often time-consuming and expensive, which limits the number of potential replicates. Automated monitoring methods can facilitate data collection at a higher spatiotemporal resolution with comparatively lower effort and cost. Here, we present the Insect Detect DIY (do-it-yourself) camera trap for non-invasive automated monitoring of flower-visiting insects, based on low-cost off-the-shelf hardware combined with open-source software. Custom-trained deep learning models detect and track insects landing on an artificial flower platform in real time on-device, and the cropped detections are subsequently classified on a local computer. Field deployment of the solar-powered camera trap confirmed its resistance to high temperatures and humidity, which enables autonomous deployment over a whole season. On-device detection and tracking can estimate insect activity/abundance after metadata post-processing. Our insect classification model achieved a high top-1 accuracy on the test dataset and generalized well on a real-world dataset of captured insect images. The camera trap design and open-source software are highly customizable and can be adapted to different use cases. With custom-trained detection and classification models, as well as accessible software programming, many applications beyond our proposed deployment method can be realized.
Affiliation(s)
- Maximilian Sittinger
- Julius Kühn Institute (JKI)—Federal Research Centre for Cultivated Plants, Institute for Biological Control, Dossenheim, Germany
- Johannes Uhler
- Julius Kühn Institute (JKI)—Federal Research Centre for Cultivated Plants, Institute for Biological Control, Dossenheim, Germany
- Maximilian Pink
- Julius Kühn Institute (JKI)—Federal Research Centre for Cultivated Plants, Institute for Biological Control, Dossenheim, Germany
- Annette Herz
- Julius Kühn Institute (JKI)—Federal Research Centre for Cultivated Plants, Institute for Biological Control, Dossenheim, Germany
221
Ali IE, Sumita Y, Wakabayashi N. Advancing maxillofacial prosthodontics by using pre-trained convolutional neural networks: Image-based classification of the maxilla. J Prosthodont 2024. [PMID: 38566564 DOI: 10.1111/jopr.13853] [Received: 11/21/2023] [Accepted: 03/15/2024] [Indexed: 04/04/2024]
Abstract
Purpose: The study aimed to compare the performance of four pre-trained convolutional neural networks in recognizing seven distinct prosthodontic scenarios involving the maxilla, as a preliminary step in developing an artificial intelligence (AI)-powered prosthesis design system.
Materials and methods: Seven distinct classes were considered for recognition: cleft palate, dentulous maxillectomy, edentulous maxillectomy, reconstructed maxillectomy, completely dentulous, partially edentulous, and completely edentulous. Using transfer learning and fine-tuned hyperparameters, four AI models (VGG16, Inception-ResNet-V2, DenseNet-201, and Xception) were employed. The dataset, consisting of 3541 preprocessed intraoral occlusal images, was divided into training, validation, and test sets. Model performance metrics encompassed accuracy, precision, recall, F1 score, area under the receiver operating characteristic curve (AUC), and confusion matrix.
Results: VGG16, Inception-ResNet-V2, DenseNet-201, and Xception demonstrated comparable performance, with maximum test accuracies of 0.92, 0.90, 0.94, and 0.95, respectively. Xception and DenseNet-201 slightly outperformed the other models, particularly Inception-ResNet-V2. Precision, recall, and F1 scores exceeded 90% for most classes in Xception and DenseNet-201, and the average AUC values for all models ranged between 0.98 and 1.00.
Conclusions: While DenseNet-201 and Xception demonstrated superior performance, all models consistently achieved diagnostic accuracy exceeding 90%, highlighting their potential in dental image analysis. This AI application could help assign work based on difficulty level and enable the development of an automated diagnosis system at patient admission. It also facilitates prosthesis design by integrating the necessary prosthesis morphology, oral function, and treatment difficulty. Furthermore, it addresses dataset size challenges in model optimization, providing valuable insights for future research.
Affiliation(s)
- Islam E Ali
- Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Department of Prosthodontics, Faculty of Dentistry, Mansoura University, Mansoura, Egypt
- Yuka Sumita
- Division of General Dentistry 4, The Nippon Dental University Hospital, Tokyo, Japan
- Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Noriyuki Wakabayashi
- Department of Advanced Prosthodontics, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
222
Alegria AD, Joshi AS, Mendana JB, Khosla K, Smith KT, Auch B, Donovan M, Bischof J, Gohl DM, Kodandaramaiah SB. High-throughput genetic manipulation of multicellular organisms using a machine-vision guided embryonic microinjection robot. Genetics 2024; 226:iyae025. [PMID: 38373262 PMCID: PMC10990426 DOI: 10.1093/genetics/iyae025] [Received: 10/09/2023] [Revised: 01/02/2024] [Accepted: 01/08/2024] [Indexed: 02/21/2024]
Abstract
Microinjection is a technique used for transgenesis, mutagenesis, cell labeling, cryopreservation, and in vitro fertilization in multiple unicellular and multicellular organisms. Microinjection requires specialized skills and involves rate-limiting, labor-intensive preparatory steps. Here, we constructed a machine-vision-guided generalized robot that fully automates the process of microinjection in fruit fly (Drosophila melanogaster) and zebrafish (Danio rerio) embryos. The robot uses machine learning models trained to detect embryos in images of agar plates and to identify specific anatomical locations within each embryo in 3D space using dual-view microscopes, and then serially performs a microinjection in each detected embryo. We constructed and used three such robots to automatically microinject tens of thousands of Drosophila and zebrafish embryos. We systematically optimized robotic microinjection for each species and performed routine transgenesis with proficiency comparable to highly skilled human practitioners, while achieving up to 4× increases in microinjection throughput in Drosophila. The robot was used to microinject pools of over 20,000 uniquely barcoded plasmids into 1,713 embryos in 2 days, rapidly generating more than 400 unique transgenic Drosophila lines. This experiment enabled a novel measurement of the number of independent germline integration events per successfully injected embryo. Finally, we showed that robotic microinjection of cryoprotective agents in zebrafish embryos significantly improves vitrification rates and post-thaw survival of cryopreserved embryos compared to manual microinjection. We anticipate that the robot can be used to carry out microinjection for genome-wide manipulation and cryopreservation at scale in a wide range of organisms.
Collapse
Affiliation(s)
- Andrew D Alegria
- Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | - Amey S Joshi
- Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | - Jorge Blanco Mendana
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, MN 55455, USA
| | - Kanav Khosla
- Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | - Kieran T Smith
- Department of Fisheries, Wildlife and Conservation Biology, University of Minnesota, St. Paul, MN 55108, USA
| | - Benjamin Auch
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, MN 55455, USA
| | - Margaret Donovan
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, MN 55455, USA
| | - John Bischof
- Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Biomedical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
| | - Daryl M Gohl
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA
| | - Suhasa B Kodandaramaiah
- Department of Mechanical Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Neuroscience, University of Minnesota, Minneapolis, MN 55455, USA
| |
Collapse
|
223
|
Lv Y, Zhang J, Barnes N, Dai Y. Weakly-Supervised Contrastive Learning for Unsupervised Object Discovery. IEEE Trans Image Process 2024; 33:2689-2702. [PMID: 38536682 DOI: 10.1109/tip.2024.3380243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
Unsupervised object discovery (UOD) refers to the task of discriminating the whole region of objects from the background within a scene without relying on labeled datasets, which benefits the tasks of bounding-box-level localization and pixel-level segmentation. This task is promising due to its ability to discover objects in a generic manner. We roughly categorize existing techniques into two main directions, namely generative solutions based on image resynthesis and clustering methods based on self-supervised models. We have observed that the former heavily relies on the quality of image reconstruction, while the latter shows limitations in effectively modeling semantic correlations. To directly target object discovery, we focus on the latter approach and propose a novel solution by incorporating weakly-supervised contrastive learning (WCL) to enhance semantic information exploration. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images, which is achieved by fine-tuning the feature encoder of a self-supervised model, namely DINO, via WCL. Subsequently, we introduce Principal Component Analysis (PCA) to localize object regions. The principal projection direction, corresponding to the maximal eigenvalue, serves as an indicator of the object region(s). Extensive experiments on benchmark unsupervised object discovery datasets demonstrate the effectiveness of our proposed solution. The source code and experimental results are publicly available via our project page at https://github.com/npucvr/WSCUOD.git.
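The PCA localization step described in the abstract can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: `patch_feats` is a hypothetical (N, D) matrix of per-patch embeddings (e.g., from a DINO encoder), and the sign of the projection onto the principal eigenvector is used to split foreground from background.

```python
import numpy as np

def pca_object_mask(patch_feats: np.ndarray, grid_hw: tuple) -> np.ndarray:
    """Localize the object region from patch features via PCA.

    patch_feats: (N, D) array of per-patch embeddings (N = H*W patches).
    grid_hw:     (H, W) patch-grid shape.
    Returns a boolean (H, W) mask whose True entries mark the foreground
    side of the principal projection direction (maximal eigenvalue).
    """
    h, w = grid_hw
    feats = patch_feats - patch_feats.mean(axis=0)   # center the features
    # Eigenvector of the covariance matrix with the largest eigenvalue
    cov = feats.T @ feats / (feats.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues ascending
    principal = eigvecs[:, -1]
    proj = feats @ principal                         # 1-D projection
    mask = proj > 0                                  # split patches by sign
    # Heuristic assumption: the object occupies the smaller of the two regions
    if mask.sum() > mask.size / 2:
        mask = ~mask
    return mask.reshape(h, w)
```

The smaller-region heuristic and the zero threshold are assumptions for the sketch; the paper may resolve the sign ambiguity differently.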
Collapse
|
224
|
Buatik A, Thansirichaisree P, Kalpiyapun P, Khademi N, Pasityothin I, Poovarodom N. Mosaic crack mapping of footings by convolutional neural networks. Sci Rep 2024; 14:7851. [PMID: 38570570 PMCID: PMC10991403 DOI: 10.1038/s41598-024-58432-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2023] [Accepted: 03/29/2024] [Indexed: 04/05/2024] Open
Abstract
Cracks are the primary indicator of the structural health of concrete structures. Frequent inspection is essential for maintenance, and automatic crack inspection offers a significant advantage, given its efficiency and accuracy. Previously, image-based crack detection systems have been applied to individual images, yet these systems are not effective for large inspection areas. This paper therefore proposes an image-based crack detection system using a Deep Convolutional Neural Network (DCNN) to identify cracks in mosaic images composed from UAV photos of concrete footings. UAV images are transformed into 3D footing models, from which the composite images are created. The CNN model is trained on 224 × 224 pixel patches, and training samples are augmented by various image transformation techniques. The proposed method is applied to localize cracks on composite images through the sliding window technique. The proposed VGG16 CNN detection system, with 95% detection accuracy, demonstrates superior performance to feature-based detection systems.
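The sliding-window localization step can be sketched as below. This is a minimal illustration, not the paper's code: `classify_patch` is a hypothetical stand-in for the trained VGG16 patch classifier, mapping a patch to a crack probability, and overlapping window scores are averaged into a heat map.

```python
import numpy as np

def sliding_window_crack_map(image: np.ndarray, classify_patch,
                             patch: int = 224, stride: int = 112) -> np.ndarray:
    """Localize cracks on a large composite image via the sliding-window
    technique: slide a patch-sized window over the mosaic, score each
    window with the patch classifier, and average overlapping scores."""
    h, w = image.shape[:2]
    heat = np.zeros((h, w), dtype=float)
    count = np.zeros((h, w), dtype=float)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            p = classify_patch(image[y:y + patch, x:x + patch])
            heat[y:y + patch, x:x + patch] += p     # accumulate window score
            count[y:y + patch, x:x + patch] += 1.0  # track window coverage
    return heat / np.maximum(count, 1.0)            # average where windows overlap
```

The patch size matches the 224 × 224 training patches from the abstract; the 50% stride is an assumption made for the sketch.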
Collapse
Affiliation(s)
- Apichat Buatik
- Research Unit of Infrastructure Inspection, Monitoring, Repair and Strengthening, Faculty of Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani, Thailand
| | - Phromphat Thansirichaisree
- Research Unit of Infrastructure Inspection, Monitoring, Repair and Strengthening, Faculty of Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani, Thailand.
| | - Phisutwat Kalpiyapun
- Research Unit of Infrastructure Inspection, Monitoring, Repair and Strengthening, Faculty of Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani, Thailand
| | - Navid Khademi
- Research Unit of Infrastructure Inspection, Monitoring, Repair and Strengthening, Faculty of Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani, Thailand
- School of Civil Engineering, College of Engineering, University of Tehran, Tehran, Iran
| | - Ittipon Pasityothin
- Research Unit of Infrastructure Inspection, Monitoring, Repair and Strengthening, Faculty of Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani, Thailand
| | - Nakhorn Poovarodom
- Research Unit of Infrastructure Inspection, Monitoring, Repair and Strengthening, Faculty of Engineering, Thammasat School of Engineering, Thammasat University, Pathumthani, Thailand
| |
Collapse
|
225
|
Bertamini M, Oletto CM, Contemori G. The Role of Uniform Textures in Making Texture Elements Visible in the Visual Periphery. Open Mind (Camb) 2024; 8:462-482. [PMID: 38665546 PMCID: PMC11045036 DOI: 10.1162/opmi_a_00136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Accepted: 02/25/2024] [Indexed: 04/28/2024] Open
Abstract
There are important differences between central and peripheral vision. With respect to shape, contours retain phenomenal sharpness, although some contours disappear if they are near other contours. This leads some uniform textures to appear non-uniform (Honeycomb illusion, Bertamini et al., 2016). Unlike other phenomena of shape perception in the periphery, this illusion shows that continuity of the texture does not contribute to phenomenal continuity. We systematically varied the relationship between central and peripheral regions, and we collected subjective reports (how far can one see the lines) as well as judgments of line orientation. We used extended textures created with a square grid and some additional lines that are invisible when they are located at the corners of the grid, or visible when they are separated from the grid (control condition). With respect to subjective reports, we compared the region of visibility for cases in which the texture was uniform (Exp 1a) or in which the lines in a central region were different (Exp 1b). There were no differences, showing no role of objective uniformity in visibility. Next, in addition to the region of visibility, we measured sensitivity using a forced-choice task (line tilted left or right) (Exp 2). The drop in sensitivity with eccentricity matched the size of the region in which lines were perceived in the illusion condition, but not in the control condition. When participants were offered a choice to report whether the lines were present or absent (Exp 3), they confirmed that they did not see them in the illusion condition but saw them in the control condition. We conclude that mechanisms that control perception of contours operate differently in the periphery and override prior expectations, including that of uniformity.
Conversely, when elements are detected in the periphery, we assign to them properties based on information from central vision, but these shapes cannot be identified correctly when the task requires such discrimination.
Collapse
|
226
|
Jo S, Jang O, Bhattacharyya C, Kim M, Lee T, Jang Y, Song H, Kwon H, Do S, Kim S. S-LIGHT: Synthetic Dataset for the Separation of Diffuse and Specular Reflection Images. Sensors (Basel) 2024; 24:2286. [PMID: 38610497 PMCID: PMC11014017 DOI: 10.3390/s24072286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2024] [Revised: 03/19/2024] [Accepted: 04/01/2024] [Indexed: 04/14/2024]
Abstract
Several studies in computer vision have examined specular removal, which is crucial for object detection and recognition. This research has traditionally been divided into two tasks: specular highlight removal, which focuses on removing specular highlights on object surfaces, and reflection removal, which deals with specular reflections occurring on glass surfaces. In reality, however, both types of specular effects often coexist, making it a fundamental challenge that has not been adequately addressed. Recognizing the necessity of integrating specular components handled in both tasks, we constructed a specular-light (S-Light) DB for training single-image-based deep learning models. Moreover, considering the absence of benchmark datasets for quantitative evaluation, the multi-scale normalized cross correlation (MS-NCC) metric, which considers the correlation between specular and diffuse components, was introduced to assess the learning outcomes.
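The MS-NCC metric named in the abstract can be sketched as follows. This is a plausible reading, not the paper's exact definition: zero-mean normalized cross correlation between the separated specular and diffuse components, averaged over several downsampled scales (block averaging is an assumption made here).

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross correlation of two equally sized images."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def ms_ncc(specular: np.ndarray, diffuse: np.ndarray,
           scales=(1, 2, 4)) -> float:
    """Multi-scale NCC: average the correlation between the separated
    specular and diffuse components over progressively downsampled copies
    (s x s block averaging at each scale)."""
    vals = []
    for s in scales:
        h = specular.shape[0] // s * s
        w = specular.shape[1] // s * s
        sp = specular[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        df = diffuse[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        vals.append(ncc(sp, df))
    return float(np.mean(vals))
```

A low MS-NCC would indicate the two separated components carry largely independent information, which is the intuition the metric encodes.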
Collapse
Affiliation(s)
| | | | | | | | | | | | | | | | | | - Sungho Kim
- Advanced Visual Intelligence Lab (AVILAB), Yeungnam University, Gyeongsan-si 38541, Republic of Korea; (S.J.); (O.J.); (C.B.); (M.K.); (T.L.); (Y.J.); (H.S.); (H.K.); (S.D.)
| |
Collapse
|
227
|
Shi Q, Ye M, Huang W, Ruan W, Du B. Label-Aware Calibration and Relation-Preserving in Visual Intention Understanding. IEEE Trans Image Process 2024; 33:2627-2638. [PMID: 38536683 DOI: 10.1109/tip.2024.3380250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
Visual intention understanding is a challenging task that explores the hidden intention behind the images of publishers in social media. Visual intention represents implicit semantics, whose ambiguous definition inevitably leads to label shifting and label blemish. The former indicates that the same image delivers intention discrepancies under different data augmentations, while the latter represents that the label of intention data is susceptible to errors or omissions during the annotation process. This paper proposes a novel method, called Label-aware Calibration and Relation-preserving (LabCR) to alleviate the above two problems from both intra-sample and inter-sample views. First, we disentangle the multiple intentions into a single intention for explicit distribution calibration in terms of the overall and the individual. Calibrating the class probability distributions in augmented instance pairs provides consistent inferred intention to address label shifting. Second, we utilize the intention similarity to establish correlations among samples, which offers additional supervision signals to form correlation alignments in instance pairs. This strategy alleviates the effect of label blemish. Extensive experiments have validated the superiority of the proposed method LabCR in visual intention understanding and pedestrian attribute recognition. Code is available at https://github.com/ShiQingHongYa/LabCR.
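The distribution calibration between augmented instance pairs described above can be illustrated with a symmetric KL consistency term. This is an illustrative stand-in for the idea, not the loss actually used in LabCR.

```python
import numpy as np

def consistency_loss(p: np.ndarray, q: np.ndarray, eps: float = 1e-8) -> float:
    """Symmetric KL divergence between the class-probability distributions
    of two augmented views of the same image. Driving this toward zero
    enforces consistent inferred intention across augmentations, which is
    the calibration idea described in the abstract."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    kl_pq = float((p * np.log(p / q)).sum())
    kl_qp = float((q * np.log(q / p)).sum())
    return 0.5 * (kl_pq + kl_qp)
```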
Collapse
|
228
|
Miao B, Bennamoun M, Gao Y, Mian A. Region Aware Video Object Segmentation With Deep Motion Modeling. IEEE Trans Image Process 2024; 33:2639-2651. [PMID: 38551827 DOI: 10.1109/tip.2024.3381445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/04/2024]
Abstract
Current semi-supervised video object segmentation (VOS) methods often employ the entire features of one frame to predict object masks and update memory. This introduces significant redundant computations. To reduce redundancy, we introduce a Region Aware Video Object Segmentation (RAVOS) approach, which predicts regions of interest (ROIs) for efficient object segmentation and memory storage. RAVOS includes a fast object motion tracker to predict object ROIs in the next frame. For efficient segmentation, object features are extracted based on the ROIs, and an object decoder is designed for object-level segmentation. For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects. In addition to RAVOS, we also propose a large-scale occluded VOS dataset, dubbed OVOS, to benchmark the performance of VOS models under occlusions. Evaluation on DAVIS and YouTube-VOS benchmarks and our new OVOS dataset show that our method achieves state-of-the-art performance with significantly faster inference time, e.g., 86.1 J & F at 42 FPS on DAVIS and 84.4 J & F at 23 FPS on YouTube-VOS. Project page: ravos.netlify.app.
Collapse
|
229
|
Ke TW, Yu SX, Koneff MD, Fronczak DL, Fara LJ, Harrison TJ, Landolt KL, Hlavacek EJ, Lubinski BR, White TP. Deep learning workflow to support in-flight processing of digital aerial imagery for wildlife population surveys. PLoS One 2024; 19:e0288121. [PMID: 38568890 PMCID: PMC10990224 DOI: 10.1371/journal.pone.0288121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2021] [Accepted: 05/21/2023] [Indexed: 04/05/2024] Open
Abstract
Deep learning shows promise for automating detection and classification of wildlife from digital aerial imagery to support cost-efficient remote sensing solutions for wildlife population monitoring. To support in-flight orthorectification and machine learning processing to detect and classify wildlife from imagery in near real-time, we evaluated deep learning methods that address hardware limitations and the need for processing efficiencies to support the envisioned in-flight workflow. We developed an annotated dataset for a suite of marine birds from high-resolution digital aerial imagery collected over open water environments to train the models. The proposed 3-stage workflow for automated, in-flight data processing includes: 1) image filtering based on the probability of any bird occurrence, 2) bird instance detection, and 3) bird instance classification. For image filtering, we compared the performance of a binary classifier with Mask Region-based Convolutional Neural Network (Mask R-CNN) as a means of sub-setting large volumes of imagery based on the probability of at least one bird occurrence in an image. On both the validation and test datasets, the binary classifier achieved higher performance than Mask R-CNN for predicting bird occurrence at the image-level. We recommend the binary classifier over Mask R-CNN for workflow first-stage filtering. For bird instance detection, we leveraged Mask R-CNN as our detection framework and proposed an iterative refinement method to bootstrap our predicted detections from loose ground-truth annotations. We also discuss future work to address the taxonomic classification phase of the envisioned workflow.
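The 3-stage in-flight workflow can be sketched as a simple pipeline. All three model functions here are hypothetical stand-ins (the paper uses a binary CNN classifier for stage 1 and Mask R-CNN for stage 2); the sketch only shows how filtering gates the more expensive downstream stages.

```python
import numpy as np

def inflight_pipeline(images, filter_fn, detect_fn, classify_fn,
                      p_occ: float = 0.5):
    """Three-stage workflow: 1) filter_fn(image) -> probability that the
    image contains any bird; images below `p_occ` are discarded to save
    downstream compute, 2) detect_fn(image) -> list of bird instance
    crops, 3) classify_fn(crop) -> taxonomic label per instance.
    Returns a list of (image_index, label) pairs."""
    results = []
    for i, img in enumerate(images):
        if filter_fn(img) < p_occ:          # stage 1: image-level filtering
            continue
        for crop in detect_fn(img):         # stage 2: instance detection
            results.append((i, classify_fn(crop)))  # stage 3: classification
    return results
```

The `p_occ` threshold is an assumed tunable; in practice it trades missed birds against wasted detection compute.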
Collapse
Affiliation(s)
- Tsung-Wei Ke
- University of California Berkeley, Berkeley, California, United States of America
| | - Stella X. Yu
- University of California Berkeley, Berkeley, California, United States of America
- University of Michigan, Ann Arbor, Michigan, United States of America
| | - Mark D. Koneff
- Division of Migratory Bird Management, United States Fish and Wildlife Service, Orono, Maine, United States of America
| | - David L. Fronczak
- Division of Migratory Bird Management, United States Fish and Wildlife Service, Bloomington, Minnesota, United States of America
| | - Luke J. Fara
- Upper Midwest Environmental Sciences Center, United States Geological Survey, La Crosse, Wisconsin, United States of America
| | - Travis J. Harrison
- Upper Midwest Environmental Sciences Center, United States Geological Survey, La Crosse, Wisconsin, United States of America
| | - Kyle L. Landolt
- Upper Midwest Environmental Sciences Center, United States Geological Survey, La Crosse, Wisconsin, United States of America
| | - Enrika J. Hlavacek
- Upper Midwest Environmental Sciences Center, United States Geological Survey, La Crosse, Wisconsin, United States of America
| | - Brian R. Lubinski
- Division of Migratory Bird Management, United States Fish and Wildlife Service, Bloomington, Minnesota, United States of America
| | - Timothy P. White
- Environmental Studies Program, Bureau of Ocean Energy Management, Sterling, Virginia, United States of America
| |
Collapse
|
230
|
Pérez-García JL, Gómez-López JM, Mozas-Calvache AT, Delgado-García J. Analysis of the Photogrammetric Use of 360-Degree Cameras in Complex Heritage-Related Scenes: Case of the Necropolis of Qubbet el-Hawa (Aswan, Egypt). Sensors (Basel) 2024; 24:2268. [PMID: 38610481 PMCID: PMC11013985 DOI: 10.3390/s24072268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024]
Abstract
This study analyzes the photogrammetric use of 360-degree cameras in complex heritage-related scenes. The goal is to take advantage of the large field of view provided by these sensors and to reduce the number of images needed to cover the entire scene compared with conventional cameras. We also try to minimize problems derived from camera geometry and lens characteristics. In this regard, we used a multi-sensor camera composed of six fisheye lenses, applying photogrammetric procedures to several funerary structures. The methodology includes the analysis of several types of spherical images obtained using different stitching techniques and the comparison of the results of image orientation processes using these images and the original fisheye images. Subsequently, we analyze the possible use of the fisheye images to model complex scenes while reducing the use of ground control points, thus minimizing the need to apply surveying techniques to determine their coordinates. In this regard, we applied distance constraints based on a previous extrinsic calibration of the camera, obtaining results similar to those obtained using a traditional schema based on points. The results have allowed us to determine the advantages and disadvantages of each type of image and configuration, providing several recommendations regarding their use in complex scenes.
Collapse
Affiliation(s)
| | | | - Antonio Tomás Mozas-Calvache
- Departamento de Ingeniería Cartográfica, Geodésica y Fotogrametría, Universidad de Jaén, 23071 Jaen, Spain; (J.L.P.-G.); (J.M.G.-L.); (J.D.-G.)
| | | |
Collapse
|
231
|
Lubbad MAH, Kurtulus IL, Karaboga D, Kilic K, Basturk A, Akay B, Nalbantoglu OU, Yilmaz OMD, Ayata M, Yilmaz S, Pacal I. A Comparative Analysis of Deep Learning-Based Approaches for Classifying Dental Implants Decision Support System. J Imaging Inform Med 2024:10.1007/s10278-024-01086-x. [PMID: 38565730 DOI: 10.1007/s10278-024-01086-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/20/2024] [Revised: 02/28/2024] [Accepted: 03/12/2024] [Indexed: 04/04/2024]
Abstract
This study aims to provide an effective solution for the autonomous identification of dental implant brands through a deep learning-based computer diagnostic system. It also seeks to ascertain the system's potential in clinical practice and to offer a strategic framework for improving diagnosis and treatment processes in implantology. This study employed a total of 28 different deep learning models, including 18 convolutional neural network (CNN) models (VGG, ResNet, DenseNet, EfficientNet, RegNet, ConvNeXt) and 10 vision transformer models (Swin and Vision Transformer). The dataset comprises 1258 panoramic radiographs from patients who received implant treatments at Erciyes University Faculty of Dentistry between 2012 and 2023. It is utilized for the training and evaluation of the deep learning models and consists of prototypes from six different implant systems provided by six manufacturers. The deep learning-based system achieved high classification accuracy across the different dental implant brands. Furthermore, among all the architectures evaluated, the small model of the ConvNeXt architecture achieved an impressive accuracy rate of 94.2%, demonstrating a high level of classification success. This study emphasizes the effectiveness of deep learning-based systems in achieving high classification accuracy for dental implant types. These findings pave the way for integrating advanced deep learning tools into clinical practice, promising significant improvements in patient care and treatment outcomes.
Collapse
Affiliation(s)
- Mohammed A H Lubbad
- Department of Computer Engineering, Engineering Faculty, Erciyes University, 38039, Kayseri, Turkey.
- Artificial Intelligence and Big Data Application and Research Center, Erciyes University, Kayseri, Turkey.
| | | | - Dervis Karaboga
- Department of Computer Engineering, Engineering Faculty, Erciyes University, 38039, Kayseri, Turkey
- Artificial Intelligence and Big Data Application and Research Center, Erciyes University, Kayseri, Turkey
| | - Kerem Kilic
- Department of Prosthodontics, Dentistry Faculty, Erciyes University, Kayseri, Turkey
| | - Alper Basturk
- Department of Computer Engineering, Engineering Faculty, Erciyes University, 38039, Kayseri, Turkey
- Artificial Intelligence and Big Data Application and Research Center, Erciyes University, Kayseri, Turkey
| | - Bahriye Akay
- Department of Computer Engineering, Engineering Faculty, Erciyes University, 38039, Kayseri, Turkey
- Artificial Intelligence and Big Data Application and Research Center, Erciyes University, Kayseri, Turkey
| | - Ozkan Ufuk Nalbantoglu
- Department of Computer Engineering, Engineering Faculty, Erciyes University, 38039, Kayseri, Turkey
- Artificial Intelligence and Big Data Application and Research Center, Erciyes University, Kayseri, Turkey
| | | | - Mustafa Ayata
- Department of Prosthodontics, Dentistry Faculty, Erciyes University, Kayseri, Turkey
| | - Serkan Yilmaz
- Department of Dentomaxillofacial Radiology, Dentistry Faculty, Erciyes University, Kayseri, Turkey
| | - Ishak Pacal
- Department of Computer Engineering, Engineering Faculty, Igdir University, Igdir, Turkey
- Artificial Intelligence and Big Data Application and Research Center, Erciyes University, Kayseri, Turkey
| |
Collapse
|
232
|
Huang C, Jiang Y, Yang X, Wei C, Chen H, Xiong W, Lin H, Wang X, Tian T, Tan H. Enhancing Retinal Fundus Image Quality Assessment With Swin-Transformer-Based Learning Across Multiple Color-Spaces. Transl Vis Sci Technol 2024; 13:8. [PMID: 38568606 PMCID: PMC10996994 DOI: 10.1167/tvst.13.4.8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Accepted: 02/18/2024] [Indexed: 04/05/2024] Open
Abstract
Purpose The assessment of retinal image (RI) quality holds significant importance in both clinical trials and large datasets, because suboptimal images can potentially conceal early signs of disease, resulting in inaccurate medical diagnoses. This study aims to develop an automatic method for Retinal Image Quality Assessment (RIQA) that incorporates visual explanations to comprehensively evaluate the quality of retinal fundus images (RIs). Methods We developed an automatic RIQA system, named Swin-MCSFNet, utilizing 28,792 RIs from the EyeQ dataset, as well as 2,000 images from the EyePACS dataset and an additional 1,000 images from the OIA-ODIR dataset. After preprocessing, including cropping of black regions, data augmentation, and normalization, a Swin-MCSFNet classifier based on the Swin-Transformer with multiple color-space fusion was proposed to grade the quality of RIs. The generalizability of Swin-MCSFNet was validated across multiple data centers. Additionally, for enhanced interpretability, a Score-CAM-generated heatmap was applied to provide visual explanations. Results Experimental results reveal that the proposed Swin-MCSFNet achieves promising performance, yielding a micro-average ROC of 0.93 and ROC scores of 0.96, 0.81, and 0.96 for the "Good," "Usable," and "Reject" categories, respectively. These scores underscore the accuracy of Swin-MCSFNet-based RIQA in distinguishing among the three categories. Furthermore, heatmaps generated across different RIQA classification scores and various color spaces suggest that regions of the retinal images from multiple color spaces contribute significantly to the decision-making process of the Swin-MCSFNet classifier. Conclusions Our study demonstrates that the proposed Swin-MCSFNet outperforms other methods in experiments conducted on multiple datasets, as evidenced by the superior performance metrics and insightful Score-CAM heatmaps.
Translational Relevance This study constructs a new retinal image quality evaluation system, which will contribute to the subsequent research of retinal images.
Collapse
Affiliation(s)
- Chengcheng Huang
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
| | - Yukang Jiang
- School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Xiaochun Yang
- The First People's Hospital of Yun Nan Province, Kunming, China
| | - Chiyu Wei
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
| | - Hongyu Chen
- Department of Optoelectronic Information Science and Engineering, Physical and Materials Science College, Guangzhou University, Guangzhou, China
| | - Weixue Xiong
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
| | - Henghui Lin
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
| | - Xueqin Wang
- School of Management, University of Science and Technology of China, Hefei, Anhui, China
| | - Ting Tian
- School of Mathematics, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Haizhu Tan
- Department of Preventive Medicine, Shantou University Medical College, Shantou, China
| |
Collapse
|
233
|
Thomas R, Westphal E, Schnell G, Seitz H. Machine Learning Classification of Self-Organized Surface Structures in Ultrashort-Pulse Laser Processing Based on Light Microscopic Images. Micromachines (Basel) 2024; 15:491. [PMID: 38675302 DOI: 10.3390/mi15040491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Revised: 03/26/2024] [Accepted: 03/28/2024] [Indexed: 04/28/2024]
Abstract
In ultrashort-pulsed laser processing, surface modification is subject to complex laser and scanning parameter studies. In addition, quality assurance systems for monitoring surface modification are still lacking. Automated laser processing routines featuring machine learning (ML) can help overcome these limitations, but they are largely absent in the literature and still lack practical applications. This paper presents a new methodology for machine learning classification of self-organized surface structures based on light microscopic images. For this purpose, three application-relevant types of self-organized surface structures are fabricated using a 300 fs laser system on hot working tool steel and stainless-steel substrates. Optical images of the hot working tool steel substrates were used to train a classification algorithm based on the open-source tool Teachable Machine from Google. The trained classification algorithm achieved very high accuracy in distinguishing the surface types for the hot working tool steel substrate it was trained on, as well as for surface structures on the stainless-steel substrate. In addition, the algorithm also achieved very high accuracy in classifying images of a specific structure class captured at different optical magnifications. Thus, the proposed methodology represents a simple and robust automated classification of surface structures that can serve as a basis for further development of quality assurance systems, automated process parameter recommendation, and inline laser parameter control.
Collapse
Affiliation(s)
- Robert Thomas
- Chair of Microfluidics, Faculty of Mechanical Engineering and Marine Technology, University of Rostock, Justus-von-Liebig Weg 6, 18059 Rostock, Germany
| | - Erik Westphal
- Chair of Microfluidics, Faculty of Mechanical Engineering and Marine Technology, University of Rostock, Justus-von-Liebig Weg 6, 18059 Rostock, Germany
| | - Georg Schnell
- Chair of Microfluidics, Faculty of Mechanical Engineering and Marine Technology, University of Rostock, Justus-von-Liebig Weg 6, 18059 Rostock, Germany
| | - Hermann Seitz
- Chair of Microfluidics, Faculty of Mechanical Engineering and Marine Technology, University of Rostock, Justus-von-Liebig Weg 6, 18059 Rostock, Germany
- Department Life, Light & Matter, University of Rostock, Albert-Einstein-Str. 25, 18059 Rostock, Germany
| |
Collapse
|
234
|
Barrett JS, Strauss JA, Chow LS, Shepherd SO, Wagenmakers AJM, Wang Y. GLUT4 localisation with the plasma membrane is unaffected by an increase in plasma free fatty acid availability. Lipids Health Dis 2024; 23:94. [PMID: 38566151 PMCID: PMC10986142 DOI: 10.1186/s12944-024-02079-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2024] [Accepted: 03/13/2024] [Indexed: 04/04/2024] Open
Abstract
BACKGROUND Insulin-stimulated glucose uptake into skeletal muscle occurs via translocation of GLUT4 from intracellular storage vesicles to the plasma membrane. Elevated free fatty acid (FFA) availability via a lipid infusion reduces glucose disposal, but this occurs in the absence of impaired proximal insulin signalling. Whether GLUT4 localisation to the plasma membrane is subsequently affected by elevated FFA availability is not known. METHODS Trained (n = 11) and sedentary (n = 10) individuals, matched for age, sex and body mass index, received either a 6 h lipid or glycerol infusion in the setting of a concurrent hyperinsulinaemic-euglycaemic clamp. Sequential muscle biopsies (0, 2 and 6 h) were analysed for GLUT4 membrane localisation and microvesicle size and distribution using immunofluorescence microscopy. RESULTS At baseline, trained individuals had more small GLUT4 spots at the plasma membrane, whereas sedentary individuals had larger GLUT4 spots. GLUT4 localisation with the plasma membrane increased at 2 h (P = 0.04) of the hyperinsulinaemic-euglycaemic clamp, and remained elevated until 6 h, with no differences between groups or infusion type. The number of GLUT4 spots was unchanged at 2 h of infusion. However, from 2 to 6 h there was a decrease in the number of small GLUT4 spots at the plasma membrane (P = 0.047), with no differences between groups or infusion type. CONCLUSION GLUT4 localisation with the plasma membrane increases during a hyperinsulinaemic-euglycaemic clamp, but this is not altered by elevated FFA availability. GLUT4 appears to disperse from small GLUT4 clusters located at the plasma membrane to support glucose uptake during a hyperinsulinaemic-euglycaemic clamp.
Affiliation(s)
- J S Barrett, Research Institute for Sport & Exercise Sciences, Liverpool John Moores University, Tom Reilly Building, Byrom Street, Liverpool, L3 3AF, UK
- J A Strauss, Research Institute for Sport & Exercise Sciences, Liverpool John Moores University, Tom Reilly Building, Byrom Street, Liverpool, L3 3AF, UK
- L S Chow, Department of Medicine, University of Minnesota, Minneapolis, MN, USA
- S O Shepherd, Research Institute for Sport & Exercise Sciences, Liverpool John Moores University, Tom Reilly Building, Byrom Street, Liverpool, L3 3AF, UK
- A J M Wagenmakers, Research Institute for Sport & Exercise Sciences, Liverpool John Moores University, Tom Reilly Building, Byrom Street, Liverpool, L3 3AF, UK
- Y Wang, Discovery Sciences, AstraZeneca R&D, Cambridge Science Park, Milton Road, Cambridge, CB4 0WG, UK

235
Yang X, Li R, Yang X, Zhou Y, Liu Y, Han JDJ. Coordinate-wise monotonic transformations enable privacy-preserving age estimation with 3D face point cloud. Sci China Life Sci 2024:10.1007/s11427-023-2518-8. [PMID: 38573362 DOI: 10.1007/s11427-023-2518-8] [Received: 09/26/2023] [Accepted: 12/25/2023] [Indexed: 04/05/2024]
Abstract
The human face is a valuable biomarker of aging, but the collection and use of facial images raise significant privacy concerns. Here we present an approach for facial data masking that preserves age-related features using coordinate-wise monotonic transformations. We first develop a deep learning model that estimates age directly from non-registered face point clouds with high accuracy and generalizability. We show that the model learns a highly indistinguishable mapping using faces treated with coordinate-wise monotonic transformations, indicating that the relative positioning of facial information is a low-level biomarker of facial aging. Through visual perception tests and computational 3D face verification experiments, we demonstrate that transformed faces are significantly more difficult for humans to perceive, but not for machines, except when only the face shape information is accessible. Our study leads to a facial data protection guideline that has the potential to broaden public access to face datasets with minimized privacy risks.
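The masking idea rests on a simple property: a strictly increasing function applied independently to each coordinate axis changes absolute positions but preserves the relative ordering of points along that axis. A minimal sketch (the specific transform functions below are arbitrary illustrative choices, not the paper's actual masking functions):

```python
import numpy as np

def coordinate_wise_monotonic(points: np.ndarray) -> np.ndarray:
    """Apply an independent, strictly increasing function to each axis
    of an (N, 3) point cloud. The three transforms are arbitrary picks."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    tx = np.tanh(x) * 3.0      # bounded, strictly increasing
    ty = y ** 3 + 0.5 * y      # odd polynomial, strictly increasing
    tz = np.exp(0.8 * z)       # exponential, strictly increasing
    return np.stack([tx, ty, tz], axis=1)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))
masked = coordinate_wise_monotonic(cloud)

# Absolute geometry is distorted, but the relative ordering of points
# along each axis survives, which is the information an age model can use.
for axis in range(3):
    assert np.array_equal(np.argsort(cloud[:, axis]),
                          np.argsort(masked[:, axis]))
```

Because only per-axis orderings are preserved, the masked cloud no longer resembles a recognizable face, which is consistent with the abstract's claim that humans struggle to perceive transformed faces.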
Affiliation(s)
- Xinyu Yang, School of Life Sciences, Peking University, Beijing, 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Runhan Li, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Xindi Yang, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
- Yong Zhou, Clinical Research Institute, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Yi Liu, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
- Jing-Dong J Han, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China

236
|
Jeon ES, Choi H, Shukla A, Wang Y, Lee H, Buman MP, Turaga P. Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data. Eng Appl Artif Intell 2024; 130:107719. [PMID: 38282698 PMCID: PMC10810240 DOI: 10.1016/j.engappai.2023.107719] [Indexed: 01/30/2024]
Abstract
Deep learning methods have achieved considerable success in converting wearable sensor data into actionable health insights. A common application area is activity recognition, where deep-learning methods still suffer from limitations such as sensitivity to signal quality, sensor characteristic variations, and variability between subjects. To mitigate these issues, robust features obtained by topological data analysis (TDA) have been suggested as a potential solution. However, there are two significant obstacles to using topological features in deep learning: (1) the large computational load of extracting topological features using TDA, and (2) the different signal representations obtained from deep learning and TDA, which make fusion difficult. In this paper, to integrate the strengths of topological methods into deep learning for time-series data, we propose to use two teacher networks: one trained on the raw time-series data, and another trained on persistence images generated by TDA methods. These two teachers are jointly used to distill a single student model via knowledge distillation (KD); the student uses only the raw time-series data at test time. This approach addresses both issues: KD with multiple teachers exploits complementary information and yields a compact model with strong supervisory features and an integrated, richer representation. To assimilate desirable information from the different modalities, we design new constraints, including orthogonality imposed on feature correlation maps, to improve feature expressiveness and allow the student to learn easily from the teachers. We also apply an annealing strategy in KD for fast saturation and better accommodation of the different features as the knowledge gap between the teachers and student is reduced. The distilled student model is robust, uses only the time-series data as input at test time, and implicitly preserves topological features. The experimental results demonstrate the effectiveness of the proposed method on wearable sensor data: it reaches 71.74% classification accuracy on GENEActiv with a WRN16-1 (1D CNN) student, outperforming the baselines while requiring much less processing time than the teachers (under 17 s on 6k test samples).
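The two-teacher distillation objective can be caricatured as a weighted sum of a hard-label cross-entropy term and one temperature-softened KL term per teacher. The sketch below is a generic illustration under assumed weights (`alpha`, `beta`) and temperature `T`; it is not the paper's exact loss, which adds the orthogonality constraints and annealing described above:

```python
import numpy as np

def softmax(logits, T=1.0):
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """Mean KL(p || q) over a batch of categorical distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1).mean()

def two_teacher_kd_loss(student_logits, raw_teacher_logits, tda_teacher_logits,
                        labels, T=4.0, alpha=0.5, beta=0.5):
    """alpha balances hard-label CE vs. distillation; beta balances the
    raw-signal teacher against the persistence-image teacher (both are
    illustrative hyperparameters)."""
    n = student_logits.shape[0]
    p_student_T = softmax(student_logits, T)
    ce = -np.log(softmax(student_logits)[np.arange(n), labels] + 1e-12).mean()
    kd_raw = kl_div(softmax(raw_teacher_logits, T), p_student_T)
    kd_tda = kl_div(softmax(tda_teacher_logits, T), p_student_T)
    # T*T rescaling keeps soft-target gradients comparable to the CE term.
    return alpha * ce + (1 - alpha) * (beta * kd_raw + (1 - beta) * kd_tda) * T * T

rng = np.random.default_rng(1)
s = rng.normal(size=(8, 5))
t_raw = rng.normal(size=(8, 5))
t_tda = rng.normal(size=(8, 5))
y = rng.integers(0, 5, size=8)
loss = two_teacher_kd_loss(s, t_raw, t_tda, y)
```

At test time only the student and the raw time series are needed, which is why the distilled model avoids the TDA computation cost entirely.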
Affiliation(s)
- Eun Som Jeon, Geometric Media Lab, School of Arts, Media and Engineering and School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, 85281, AZ, USA
- Hongjun Choi, Geometric Media Lab, School of Arts, Media and Engineering and School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, 85281, AZ, USA
- Ankita Shukla, Geometric Media Lab, School of Arts, Media and Engineering and School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, 85281, AZ, USA
- Yuan Wang, Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, 29208, SC, USA
- Hyunglae Lee, School for Engineering of Matter, Transport and Energy, Tempe, 85281, AZ, USA
- Matthew P Buman, College of Health Solutions, Arizona State University, Phoenix, 85004, AZ, USA
- Pavan Turaga, Geometric Media Lab, School of Arts, Media and Engineering and School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, 85281, AZ, USA

237
|
Mahapatra D, Bozorgtabar B, Ge Z, Reyes M. GANDALF: Graph-based transformer and Data Augmentation Active Learning Framework with interpretable features for multi-label chest Xray classification. Med Image Anal 2024; 93:103075. [PMID: 38199069 DOI: 10.1016/j.media.2023.103075] [Received: 02/16/2023] [Revised: 11/26/2023] [Accepted: 12/29/2023] [Indexed: 01/12/2024]
Abstract
Informative sample selection in an active learning (AL) setting helps a machine learning system attain optimum performance with minimum labeled samples, thus reducing annotation costs and boosting the performance of computer-aided diagnosis systems when labeled data are limited. Another effective technique for enlarging datasets in a small labeled-data regime is data augmentation. An intuitive active learning approach thus combines informative sample selection and data augmentation to leverage their respective advantages and improve the performance of AL systems. In this paper, we propose a novel approach called GANDALF (Graph-based TrANsformer and Data Augmentation Active Learning Framework) to combine sample selection and data augmentation in a multi-label setting. Conventional sample selection approaches in AL have mostly focused on the single-label setting, where a sample has only one disease label; these approaches do not perform optimally when a sample can have multiple disease labels (e.g., in chest X-ray images). We improve upon state-of-the-art multi-label active learning techniques by representing disease labels as graph nodes and using graph attention transformers (GAT) to learn more effective inter-label relationships. We identify the most informative samples by aggregating GAT representations, and then generate transformations of these samples by sampling from a learned latent space. From these generated samples, we identify informative samples via a novel multi-label informativeness score which, going beyond the state of the art, ensures that generated samples (i) are not redundant with respect to the training data and (ii) make important contributions to the training stage. We apply our method to two public chest X-ray datasets, as well as breast, dermatology, retina and kidney tissue microscopy MedMNIST datasets, and report improved results over state-of-the-art multi-label AL techniques in terms of model performance, learning rates, and robustness.
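The abstract's two requirements for an informativeness score, high uncertainty and low redundancy with existing data, can be caricatured with a toy scoring function. This is a hypothetical simplification for intuition only: the scoring below (mean Bernoulli entropy minus a nearest-neighbor redundancy penalty) is not GANDALF's actual score, and all names and the `lam` trade-off are invented for illustration:

```python
import numpy as np

def multilabel_entropy(probs):
    """Mean Bernoulli entropy over label heads; probs has shape (n, L)."""
    p = np.clip(probs, 1e-8, 1 - 1e-8)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p)).mean(axis=1)

def informativeness(probs, feats, selected_feats, lam=0.5):
    """Uncertainty minus redundancy: samples whose features sit close to
    anything already selected are penalized (lam is illustrative)."""
    uncertainty = multilabel_entropy(probs)
    d = np.linalg.norm(feats[:, None, :] - selected_feats[None, :, :], axis=2)
    redundancy = np.exp(-d.min(axis=1))   # high when a near-duplicate exists
    return uncertainty - lam * redundancy

rng = np.random.default_rng(2)
probs = rng.uniform(size=(10, 4))        # predicted per-label probabilities
feats = rng.normal(size=(10, 16))        # candidate sample embeddings
sel = rng.normal(size=(3, 16))           # embeddings of already-labeled data
scores = informativeness(probs, feats, sel)
pick = int(np.argmax(scores))            # next candidate to label/augment
```

The real method derives both the uncertainty signal and the embeddings from aggregated GAT representations, so inter-label structure influences the ranking rather than treating labels independently as this toy version does.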
Affiliation(s)
- Dwarikanath Mahapatra, Inception Institute of AI, Abu Dhabi, United Arab Emirates; Faculty of IT, Monash University, Melbourne, Australia
- Behzad Bozorgtabar, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Lausanne University Hospital (CHUV), Lausanne, Switzerland
- Zongyuan Ge, Faculty of IT, Monash University, Melbourne, Australia
- Mauricio Reyes, ARTORG Center for Biomedical Engineering Research, University of Bern, Bern, Switzerland

238
|
Zagorchev L, Hyde DE, Li C, Wenzel F, Fläschner N, Ewald A, O'Donoghue S, Hancock K, Lim RX, Choi DC, Kelly E, Gupta S, Wilden J. Shape-constrained deformable brain segmentation: Methods and quantitative validation. Neuroimage 2024; 289:120542. [PMID: 38369167 DOI: 10.1016/j.neuroimage.2024.120542] [Received: 11/09/2023] [Revised: 02/09/2024] [Accepted: 02/13/2024] [Indexed: 02/20/2024] Open
Abstract
MRI-guided neuro interventions require rapid, accurate, and reproducible segmentation of anatomical brain structures for identification of targets during surgical procedures and post-surgical evaluation of intervention efficiency. Segmentation algorithms must be validated and cleared for clinical use. This work introduces a methodology for shape-constrained deformable brain segmentation, describes the quantitative validation used for its clinical clearance, and presents a comparison with manual expert segmentation and FreeSurfer, an open-source software package for neuroimaging data analysis. ClearPoint Maestro is software for fully automatic brain segmentation from T1-weighted MRI that combines a shape-constrained deformable brain model with voxel-wise tissue segmentation within the cerebral hemispheres and the cerebellum. The performance of the segmentation was validated in terms of accuracy and reproducibility. Segmentation accuracy was evaluated with respect to training data and independently traced ground truth. Segmentation reproducibility was quantified and compared with manual expert segmentation and FreeSurfer; this quantitative analysis indicates superior reproducibility compared to both. The shape-constrained methodology results in accurate and highly reproducible segmentation, and its inherent point-based correspondence provides consistent target identification, ideal for MRI-guided neuro interventions.
Affiliation(s)
- Lyubomir Zagorchev, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Damon E Hyde, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Chen Li, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Fabian Wenzel, Philips Research Hamburg, Medical Image Processing and Analytics, Röntgenstraße 24-26, Hamburg, 22335, Germany
- Nick Fläschner, Philips Research Hamburg, Medical Image Processing and Analytics, Röntgenstraße 24-26, Hamburg, 22335, Germany
- Arne Ewald, Philips Research Hamburg, Medical Image Processing and Analytics, Röntgenstraße 24-26, Hamburg, 22335, Germany
- Stefani O'Donoghue, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Kelli Hancock, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Ruo Xuan Lim, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Dennis C Choi, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Eddie Kelly, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Shruti Gupta, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA
- Jessica Wilden, ClearPoint Neuro, Clinical Science and Applications, 120 S. Sierra Ave., Suite 100, Solana Beach, 92075, CA, USA

239
|
Nagarajan B, Marques R, Aguilar E, Radeva P. Bayesian DivideMix++ for Enhanced Learning with Noisy Labels. Neural Netw 2024; 172:106122. [PMID: 38244356 DOI: 10.1016/j.neunet.2024.106122] [Received: 08/02/2023] [Revised: 12/04/2023] [Accepted: 01/09/2024] [Indexed: 01/22/2024]
Abstract
Leveraging inexpensive, human-intervention-based annotation methodologies such as crowdsourcing and web crawling often leads to datasets with noisy labels. Noisy labels can have a detrimental impact on the performance and generalization of deep neural networks, so robust models that can handle and mitigate their effect are essential. In this work, we explore the open challenges of neural network memorization and uncertainty in creating robust learning algorithms with noisy labels. To overcome them, we propose a novel framework called "Bayesian DivideMix++" with two critical components: (i) DivideMix++, to enhance robustness against memorization, and (ii) Monte-Carlo MixMatch, which improves the handling of label uncertainty. DivideMix++ augments the warm-up and augmentation pipeline with self-supervised pre-training and dedicates different data augmentations to loss analysis and backpropagation. Monte-Carlo MixMatch leverages uncertainty measurements to mitigate the influence of uncertain samples by reducing their weight in the MixMatch data augmentation step. We validate the proposed pipeline on four datasets covering various synthetic and real-world noise settings and demonstrate its effectiveness through extensive experiments. Bayesian DivideMix++ outperforms state-of-the-art models by considerable margins in all experiments. Our findings underscore the potential of these modifications to enhance the performance and generalization of deep neural networks in practical scenarios.
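The Monte-Carlo step, down-weighting samples whose stochastic forward passes disagree, can be sketched as follows. The inverse-variance weighting rule here is an illustrative assumption, not the paper's exact formulation; in practice the probabilities would come from T dropout-enabled forward passes of the network:

```python
import numpy as np

def mc_uncertainty_weights(mc_probs):
    """mc_probs: (T, n, C) class probabilities from T stochastic forward
    passes. Returns per-sample weights in (0, 1] that shrink as the
    predictive variance across passes grows."""
    var = mc_probs.var(axis=0).mean(axis=1)          # (n,) mean per-class variance
    return 1.0 / (1.0 + var / (var.mean() + 1e-12))  # illustrative weighting rule

rng = np.random.default_rng(3)
logits = rng.normal(size=(20, 16, 3))                # 20 passes, 16 samples, 3 classes
probs = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)
w = mc_uncertainty_weights(probs)
# w would then scale each sample's contribution in the MixMatch step,
# so label-uncertain samples influence the mixed targets less.
```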
Affiliation(s)
- Bhalaji Nagarajan, Dept. de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585, 08007, Barcelona, Spain
- Ricardo Marques, Dept. de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585, 08007, Barcelona, Spain; Computer Vision Center, Cerdanyola (Barcelona), Spain
- Eduardo Aguilar, Dept. de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585, 08007, Barcelona, Spain; Dept. de Ingeniería de Sistemas y Computación, Universidad Católica del Norte, Avenida Angamos 0610, 1270709, Antofagasta, Chile; Computer Vision Center, Cerdanyola (Barcelona), Spain
- Petia Radeva, Dept. de Matemàtiques i Informàtica, Universitat de Barcelona, Gran Via de les Corts Catalanes 585, 08007, Barcelona, Spain; Computer Vision Center, Cerdanyola (Barcelona), Spain

240
|
Yuan M, Zhang C, Wang Z, Liu H, Pan G, Tang H. Trainable Spiking-YOLO for low-latency and high-performance object detection. Neural Netw 2024; 172:106092. [PMID: 38211460 DOI: 10.1016/j.neunet.2023.106092] [Received: 04/28/2023] [Revised: 12/06/2023] [Accepted: 12/26/2023] [Indexed: 01/13/2024]
Abstract
Spiking neural networks (SNNs) are an attractive option for edge-side applications due to their sparse, asynchronous, and event-driven characteristics. However, applying SNNs to object detection faces challenges in achieving both good detection accuracy and high detection speed. To overcome these challenges, we propose an end-to-end Trainable Spiking-YOLO (Tr-Spiking-YOLO) for low-latency and high-performance object detection. We evaluate our model not only on the frame-based PASCAL VOC dataset but also on the event-based GEN1 Automotive Detection dataset, and investigate the impact of different decoding methods on detection performance. The experimental results show that our model achieves competitive or better performance in terms of accuracy, latency and energy consumption compared to similar artificial neural network (ANN) and conversion-based SNN object detection models. Furthermore, when deployed on an edge device, our model achieves a processing speed of approximately 14 to 39 FPS while maintaining a desirable mean Average Precision (mAP), enabling real-time detection on resource-constrained platforms.
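One of the simplest decoding methods such models compare is rate decoding: a binary spike train is turned into a continuous activation by averaging over timesteps. A generic SNN illustration (not this paper's specific decoder; the Bernoulli spiking model below is an assumption for the demo):

```python
import numpy as np

def rate_decode(spikes):
    """spikes: (T, n) binary spike trains over T timesteps.
    Returns the firing rate per neuron, a real-valued activation."""
    return spikes.mean(axis=0)

rng = np.random.default_rng(4)
T, n = 8, 6
membrane_drive = rng.uniform(size=n)
# Toy Bernoulli spiking proportional to drive: more drive -> higher rate.
spikes = (rng.uniform(size=(T, n)) < membrane_drive).astype(float)
rates = rate_decode(spikes)
```

Note the accuracy/latency coupling the abstract alludes to: fewer timesteps T lowers latency, but quantizes the decoded activation to multiples of 1/T, which is one reason decoding choice affects detection performance.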
Affiliation(s)
- Mengwen Yuan, Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311100, China
- Chengjun Zhang, Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311100, China
- Ziming Wang, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Huixiang Liu, Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311100, China
- Gang Pan, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou 310027, China; MOE Frontier Science Center for Brain Science and Brain-Machine Integration, Zhejiang University, Hangzhou 310027, China
- Huajin Tang, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou 310027, China; MOE Frontier Science Center for Brain Science and Brain-Machine Integration, Zhejiang University, Hangzhou 310027, China

241
|
Xu G, Wang Y, Cheng J, Tang J, Yang X. Accurate and Efficient Stereo Matching via Attention Concatenation Volume. IEEE Trans Pattern Anal Mach Intell 2024; 46:2461-2474. [PMID: 38015702 DOI: 10.1109/tpami.2023.3335480] [Indexed: 11/30/2023]
Abstract
Stereo matching is a fundamental building block for many vision and robotics applications. An informative and concise cost volume representation is vital for stereo matching of high accuracy and efficiency. In this article, we present a novel cost volume construction method, named attention concatenation volume (ACV), which generates attention weights from correlation clues to suppress redundant information and enhance matching-related information in the concatenation volume. The ACV can be seamlessly embedded into most stereo matching networks, the resulting networks can use a more lightweight aggregation network and meanwhile achieve higher accuracy. We further design a fast version of ACV to enable real-time performance, named Fast-ACV, which generates high likelihood disparity hypotheses and the corresponding attention weights from low-resolution correlation clues to significantly reduce computational and memory cost and meanwhile maintain a satisfactory accuracy. Furthermore, we design a highly accurate network ACVNet and a real-time network Fast-ACVNet based on our ACV and Fast-ACV respectively, which achieve state-of-the-art performance on several benchmarks.
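The core construction, correlation clues gating a concatenation volume, can be sketched in numpy. This is a simplified illustration under assumptions: the real ACV derives attention weights with learned convolutions over a multi-channel correlation volume, whereas here a plain sigmoid over a single-channel correlation is used:

```python
import numpy as np

def attention_concat_volume(feat_l, feat_r, max_disp):
    """feat_l, feat_r: (C, H, W) unary features of the left/right views.
    Returns a (2C, D, H, W) concatenation volume gated by correlation."""
    C, H, W = feat_l.shape
    volume = np.zeros((2 * C, max_disp, H, W))
    for d in range(max_disp):
        if d == 0:
            shifted = feat_r
        else:
            shifted = np.zeros_like(feat_r)
            shifted[:, :, d:] = feat_r[:, :, :-d]   # shift right view by disparity d
        corr = (feat_l * shifted).mean(axis=0)      # (H, W) correlation clue
        attn = 1.0 / (1.0 + np.exp(-corr))          # sigmoid attention weight
        # Concatenation volume, reweighted so matching-related channels
        # are enhanced and redundant ones suppressed.
        volume[:, d] = np.concatenate([feat_l, shifted], axis=0) * attn
    return volume

rng = np.random.default_rng(5)
fl, fr = rng.normal(size=(2, 4, 8, 8))
acv = attention_concat_volume(fl, fr, max_disp=6)
```

Because the gating concentrates information near plausible disparities, a lighter aggregation network can follow the volume, which is the efficiency argument the abstract makes.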
242
Wu Z, Guo K, Luo E, Wang T, Wang S, Yang Y, Zhu X, Ding R. Medical long-tailed learning for imbalanced data: Bibliometric analysis. Comput Methods Programs Biomed 2024; 247:108106. [PMID: 38452661 DOI: 10.1016/j.cmpb.2024.108106] [Received: 02/15/2023] [Revised: 02/24/2024] [Accepted: 02/26/2024] [Indexed: 03/09/2024]
Abstract
BACKGROUND In the last decade, long-tailed learning has become a popular research focus in deep learning applications in medicine, yet no scientometric reports have provided a systematic overview of this field. We utilized bibliometric techniques to identify and analyze the literature on long-tailed learning in deep learning applications in medicine, investigating research trends, core authors, and core journals, and expanding our understanding of the primary components and principal methodologies of long-tailed learning research in the medical field. METHODS Web of Science was utilized to collect all articles on long-tailed learning in medicine published until December 2023. The suitability of all retrieved titles and abstracts was evaluated, and all numerical data were extracted for bibliometric analysis. CiteSpace was used to create clustered and visual knowledge graphs based on keywords. RESULTS A total of 579 articles met the evaluation criteria. Over the last decade, the annual number of publications and the citation frequency both grew significantly, following a power-law and an exponential trend, respectively. Noteworthy contributors to this field include Husanbir Singh Pannu, Fadi Thabtah, and Talha Mahboob Alam, while leading journals such as IEEE ACCESS, COMPUTERS IN BIOLOGY AND MEDICINE, IEEE TRANSACTIONS ON MEDICAL IMAGING, and COMPUTERIZED MEDICAL IMAGING AND GRAPHICS have emerged as pivotal platforms for disseminating research in this area. The core of long-tailed learning research within the medical domain is encapsulated in six principal themes: deep learning for imbalanced data, model optimization, neural networks in image analysis, data imbalance in health records, CNNs in diagnostics and risk assessment, and genetic information in disease mechanisms. CONCLUSION This study summarizes recent advances in applying long-tailed learning to deep learning in medicine through bibliometric analysis and visual knowledge graphs, identifying emerging trends, sources, core authors, core journals, and research hotspots. This field has shown great promise in medical deep-learning research, and our findings provide pertinent and valuable insights for future research and clinical practice.
Affiliation(s)
- Zheng Wu, School of Information Engineering, Hunan University of Science and Engineering, Yongzhou 425199, China
- Kehua Guo, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Entao Luo, School of Information Engineering, Hunan University of Science and Engineering, Yongzhou 425199, China
- Tian Wang, BNU-UIC Institute of Artificial Intelligence and Future Networks, Beijing Normal University (BNU Zhuhai), Zhuhai, China
- Shoujin Wang, Data Science Institute, University of Technology Sydney, Sydney, Australia
- Yi Yang, Department of Computer Science, Northeastern Illinois University, Chicago, IL 60625, USA
- Xiangyuan Zhu, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Rui Ding, School of Computer Science and Engineering, Central South University, Changsha 410083, China

243
|
Liu Y, Wu YH, Zhang SC, Liu L, Wu M, Cheng MM. Revisiting Computer-Aided Tuberculosis Diagnosis. IEEE Trans Pattern Anal Mach Intell 2024; 46:2316-2332. [PMID: 37934644 DOI: 10.1109/tpami.2023.3330825] [Indexed: 11/09/2023]
Abstract
Tuberculosis (TB) is a major global health threat, causing millions of deaths annually. Although early diagnosis and treatment can greatly improve the chances of survival, it remains a major challenge, especially in developing countries. Recently, computer-aided tuberculosis diagnosis (CTD) using deep learning has shown promise, but progress is hindered by limited training data. To address this, we establish a large-scale dataset, namely the Tuberculosis X-ray (TBX11K) dataset, which contains 11,200 chest X-ray (CXR) images with corresponding bounding box annotations for TB areas. This dataset enables the training of sophisticated detectors for high-quality CTD. Furthermore, we propose a strong baseline, SymFormer, for simultaneous CXR image classification and TB infection area detection. SymFormer incorporates Symmetric Search Attention (SymAttention) to tackle the bilateral symmetry property of CXR images for learning discriminative features. Since CXR images may not strictly adhere to the bilateral symmetry property, we also propose Symmetric Positional Encoding (SPE) to facilitate SymAttention through feature recalibration. To promote future research on CTD, we build a benchmark by introducing evaluation metrics, evaluating baseline models reformed from existing detectors, and running an online challenge. Experiments show that SymFormer achieves state-of-the-art performance on the TBX11K dataset.
244
Huang Z, Zhang J. Contrastive Unfolding Deraining Network. IEEE Trans Neural Netw Learn Syst 2024; 35:5155-5169. [PMID: 36112550 DOI: 10.1109/tnnls.2022.3202724] [Indexed: 06/15/2023]
Abstract
Because rain degrades image quality and thereby harms outdoor vision tasks, image deraining has become increasingly important. Focusing on the single image deraining (SID) task, in this article we propose a novel Contrastive Unfolding DEraining Network (CUDEN), which combines a traditional iterative algorithm with a deep network, exhibiting excellent performance and good interpretability. CUDEN transforms the challenge of locating rain streaks into discovering rain features and defines the relationship between the image and feature domains in terms of mapping pairs. To obtain the mapping pairs efficiently, we propose a dynamic multidomain translation (DMT) module that decomposes the original mapping into sub-mappings. To enhance the feature extraction capability of the network, we also propose a new serial multireceptive field fusion (SMF) block, which extracts complex and variable rain features with convolution kernels of different receptive fields. Moreover, we are the first to introduce contrastive learning to the SID task; combining it with perceptual loss, we propose a new contrastive perceptual loss (CPL), which generalizes well and greatly helps identify the appropriate gradient descent direction during training. Extensive experiments on synthetic and real-world datasets demonstrate that CUDEN outperforms state-of-the-art (SOTA) deraining networks.
245
Ruescas-Nicolau AV, Medina-Ripoll EJ, Parrilla Bernabé E, de Rosario Martínez H. Multimodal human motion dataset of 3D anatomical landmarks and pose keypoints. Data Brief 2024; 53:110157. [PMID: 38375138 PMCID: PMC10875237 DOI: 10.1016/j.dib.2024.110157] [Received: 11/23/2023] [Revised: 12/22/2023] [Accepted: 01/30/2024] [Indexed: 02/21/2024] Open
Abstract
In this paper, we present a dataset that relates 2D and 3D human pose keypoints estimated from images to the locations of 3D anatomical landmarks. The dataset contains 51,051 poses obtained from 71 persons in A-Pose and while performing 7 movements (walking, running, squatting, and four types of jumping). These poses were scanned to build a collection of 3D moving textured meshes with anatomical correspondence. Each mesh in that collection was used to obtain the 3D locations of 53 anatomical landmarks, and 48 images were created using virtual cameras with different perspectives. 2D pose keypoints were obtained from those images using the MediaPipe Human Pose Landmarker, and their corresponding 3D keypoints were calculated by linear triangulation. The dataset consists of a folder for each participant containing two Track Row Column (TRC) files and one JSON file for each movement sequence. One TRC file stores the triangulated 3D keypoints, while the other contains the 3D anatomical landmarks. The JSON file stores the 2D keypoints and the calibration parameters of the virtual cameras. The anthropometric characteristics of the participants are annotated in a single CSV file. These data are intended for developments that transform existing human pose solutions in computer vision into biomechanical applications or simulations. The dataset can also be used in other applications, such as training neural networks for human motion analysis and studying the influence of anthropometric characteristics on them.
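The 3D keypoints in this dataset are computed by linear triangulation of 2D detections from the calibrated virtual cameras. A minimal sketch of that standard step (direct linear transform) is shown below; the function name and matrix conventions are illustrative, not taken from the dataset's code.

```python
import numpy as np

def triangulate_point(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera projection matrices.
    points_2d: list of (u, v) image coordinates, one per view.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the
        # homogeneous 3D point X: u * (P3 . X) = P1 . X, etc.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Homogeneous least squares: the solution is the right singular
    # vector of A associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

With noise-free projections from two distinct viewpoints, this recovers the original 3D point exactly.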
Affiliation(s)
- Ana Virginia Ruescas-Nicolau, Instituto de Biomecánica - IBV, Universitat Politècnica de València, Edificio 9C. Camí de Vera s/n, 46022 Valencia, Spain
- Enrique José Medina-Ripoll, Instituto de Biomecánica - IBV, Universitat Politècnica de València, Edificio 9C. Camí de Vera s/n, 46022 Valencia, Spain
- Eduardo Parrilla Bernabé, Instituto de Biomecánica - IBV, Universitat Politècnica de València, Edificio 9C. Camí de Vera s/n, 46022 Valencia, Spain
- Helios de Rosario Martínez, Instituto de Biomecánica - IBV, Universitat Politècnica de València, Edificio 9C. Camí de Vera s/n, 46022 Valencia, Spain
246
Abrantes J, Rouzrokh P. Explaining explainability: The role of XAI in medical imaging. Eur J Radiol 2024; 173:111389. [PMID: 38422609] [DOI: 10.1016/j.ejrad.2024.111389]
Affiliation(s)
- João Abrantes, Department of Radiology, Unidade Local de Saúde de Trás-os-Montes e Alto Douro, Vila Real, Portugal
- Pouria Rouzrokh, Mayo Clinic Artificial Intelligence Laboratory, Department of Radiology, Mayo Clinic, Rochester, MN, USA
247
Joshi CK, Liu F, Xun X, Lin J, Foo CS. On Representation Knowledge Distillation for Graph Neural Networks. IEEE Trans Neural Netw Learn Syst 2024; 35:4656-4667. [PMID: 36459610] [DOI: 10.1109/tnnls.2022.3223018]
Abstract
Knowledge distillation (KD) is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the local structure preserving (LSP) loss, which matches local structural relationships defined over edges across the student and teacher's node embeddings. This article studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose graph contrastive representation distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across four datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving (GSP) variant of LSP) as well as baselines from 2-D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other.
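G-CRD, as described above, aligns student node embeddings with their teacher counterparts in a shared space via contrastive learning. A hedged sketch of such a contrastive distillation objective (an InfoNCE-style loss; this is our simplified rendering, not the paper's exact loss, and it assumes both embedding sets are already projected to a common dimension) could look like:

```python
import numpy as np

def contrastive_distillation_loss(student_z, teacher_z, tau=0.1):
    """InfoNCE-style contrastive distillation over node embeddings.

    Each student embedding should match its own teacher embedding
    (the diagonal of the similarity matrix) against all other teacher
    embeddings in the batch, which implicitly preserves the global
    topology of the teacher's embedding space.
    """
    # Cosine similarities via L2 normalization.
    s = student_z / np.linalg.norm(student_z, axis=1, keepdims=True)
    t = teacher_z / np.linalg.norm(teacher_z, axis=1, keepdims=True)
    logits = s @ t.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Perfectly aligned student and teacher embeddings yield a near-zero loss, while mismatched pairs are penalized heavily.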
248
Fang Z, Zhu S, Zhang J, Liu Y, Chen Z, He Y. On Low-Rank Directed Acyclic Graphs and Causal Structure Learning. IEEE Trans Neural Netw Learn Syst 2024; 35:4924-4937. [PMID: 37216232] [DOI: 10.1109/tnnls.2023.3273353]
Abstract
Despite several advances in recent years, learning causal structures represented by directed acyclic graphs (DAGs) remains a challenging task in high-dimensional settings when the graphs to be learned are not sparse. In this article, we propose to exploit a low-rank assumption regarding the (weighted) adjacency matrix of a DAG causal model to help address this problem. We utilize existing low-rank techniques to adapt causal structure learning methods to take advantage of this assumption and establish several useful results relating interpretable graphical conditions to the low-rank assumption. Specifically, we show that the maximum rank is highly related to hubs, suggesting that scale-free (SF) networks, which are frequently encountered in practice, tend to be low rank. Our experiments demonstrate the utility of the low-rank adaptations for a variety of data models, especially with relatively large and dense graphs. Moreover, with a validation procedure, the adaptations maintain a superior or comparable performance even when graphs are not restricted to be low rank.
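The abstract's observation that hubs imply low rank can be illustrated with a toy example (ours, not the paper's): a DAG whose edges all emanate from a few hub nodes has a low-rank weighted adjacency matrix even when the graph itself is far from sparse.

```python
import numpy as np

# Toy hub-dominated DAG on 20 nodes: nodes 0 and 1 are hubs that each
# point to all of nodes 2..19, so the graph has many edges but only
# two linearly independent rows in its weighted adjacency matrix.
rng = np.random.default_rng(0)
n = 20
W = np.zeros((n, n))
W[0, 2:] = rng.uniform(0.5, 1.5, size=n - 2)  # edges from hub 0
W[1, 2:] = rng.uniform(0.5, 1.5, size=n - 2)  # edges from hub 1

num_edges = np.count_nonzero(W)   # 36 edges: not a sparse graph
rank = np.linalg.matrix_rank(W)   # yet the rank is only 2
```

This is the regime where the article's low-rank adaptations are expected to help most: relatively dense graphs whose connectivity is mediated by a small number of hubs.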
249
Huang W, Sun S, Lin X, Li P, Zhu L, Wang J, Chen CLP, Sheng B. Unsupervised Fusion Feature Matching for Data Bias in Uncertainty Active Learning. IEEE Trans Neural Netw Learn Syst 2024; 35:5749-5763. [PMID: 36215385] [DOI: 10.1109/tnnls.2022.3209085]
Abstract
Active learning (AL) aims to sample the most valuable data for model improvement from the unlabeled pool. Traditional approaches, especially uncertainty-based methods, are prone to a data bias issue: the selected data do not cover the entire unlabeled pool well. Although much recent work has focused on this issue, the gains mainly come at the price of huge additional training costs and artificially designed complex losses. The latter means these methods must be redesigned when facing new models or tasks, which is very time-consuming and laborious. This article proposes a feature-matching-based uncertainty measure that resamples the selected uncertain data by feature matching, removing similar data to alleviate the data bias issue. To ensure that our proposed method does not introduce large additional costs, we specially design an unsupervised fusion feature matching (UFFM) module, which requires no training in our novel AL framework. Besides, we redesign several classic uncertainty methods so that they can be applied to more complex visual tasks. We conduct rigorous experiments on many standard benchmark datasets to validate our work. The experimental results show that UFFM outperforms similar unsupervised feature matching technologies, and our proposed uncertainty calculation method outperforms random sampling, classic uncertainty approaches, and recent state-of-the-art (SOTA) uncertainty approaches.
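The core idea, uncertainty sampling followed by feature-matching de-duplication, can be sketched in a few lines. This is our simplified illustration of the general strategy, not the UFFM algorithm itself: candidates are visited from most to least uncertain, and any candidate whose feature similarity to an already-selected sample is too high is skipped, so the batch spreads over the pool instead of clustering.

```python
import numpy as np

def select_batch(features, uncertainties, k, sim_thresh=0.95):
    """Uncertainty sampling with feature-based de-duplication.

    features: (N, d) array of unlabeled-pool feature vectors.
    uncertainties: (N,) array of per-sample uncertainty scores.
    Returns up to k indices, skipping near-duplicates of already
    selected samples (cosine similarity >= sim_thresh).
    """
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    order = np.argsort(-uncertainties)  # most uncertain first
    selected = []
    for idx in order:
        if len(selected) == k:
            break
        if all(feats[idx] @ feats[j] < sim_thresh for j in selected):
            selected.append(int(idx))
    return selected
```

With two identical high-uncertainty samples in the pool, plain uncertainty sampling would pick both; the de-duplication step instead replaces the redundant pick with a dissimilar, lower-uncertainty one.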
250
de la Cruz G, Lira M, Luaces O, Remeseiro B. Eye-LRCN: A Long-Term Recurrent Convolutional Network for Eye Blink Completeness Detection. IEEE Trans Neural Netw Learn Syst 2024; 35:5130-5140. [PMID: 36083963] [DOI: 10.1109/tnnls.2022.3202643]
Abstract
Computer vision syndrome causes vision problems and discomfort, mainly due to dry eye. Several studies show that dry eye in computer users is caused by a reduction in the blink rate and an increase in the prevalence of incomplete blinks. In this context, this article introduces Eye-LRCN, a new eye blink detection method that also evaluates the completeness of the blink. The method is based on a long-term recurrent convolutional network (LRCN), which combines a convolutional neural network (CNN) for feature extraction with a bidirectional recurrent neural network that performs sequence learning and classifies the blinks. A Siamese architecture is used during CNN training to overcome the high class imbalance present in blink detection and the limited amount of data available for training blink detection models. The method was evaluated on three different tasks: blink detection, blink completeness detection, and eye state detection. We report performance superior to state-of-the-art methods in blink detection and blink completeness detection, and remarkable results in eye state detection.
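The LRCN pattern described here, per-frame CNN features fed to a bidirectional recurrent layer for sequence classification, can be sketched with plain numpy. This is a much-simplified illustration: the CNN is assumed to have already produced per-frame feature vectors, all weight names and dimensions are hypothetical placeholders, and the recurrent cell is a vanilla tanh RNN rather than the LSTM typically used in LRCNs.

```python
import numpy as np

def birnn_classify(frame_feats, Wf, Wb, Uf, Ub, w_out):
    """Bidirectional-RNN sequence classification over CNN frame features.

    frame_feats: (T, d) per-frame feature vectors from the CNN.
    Wf, Wb: (h, d) input weights for the forward/backward passes.
    Uf, Ub: (h, h) recurrent weights for the forward/backward passes.
    w_out: (2h,) output weights; returns a blink probability.
    """
    T, _ = frame_feats.shape
    h = Wf.shape[0]
    hf, hb = np.zeros((T, h)), np.zeros((T, h))
    state = np.zeros(h)
    for t in range(T):                       # forward pass over time
        state = np.tanh(Wf @ frame_feats[t] + Uf @ state)
        hf[t] = state
    state = np.zeros(h)
    for t in reversed(range(T)):             # backward pass over time
        state = np.tanh(Wb @ frame_feats[t] + Ub @ state)
        hb[t] = state
    # Concatenate both directions, mean-pool over time, then a
    # logistic output unit for the binary blink decision.
    pooled = np.concatenate([hf, hb], axis=1).mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(w_out @ pooled)))
```

The bidirectional pass lets each frame's representation depend on both earlier and later frames, which matters for judging whether a blink fully closed the eye.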