1.
Alabi O, Vercauteren T, Shi M. Multitask learning in minimally invasive surgical vision: A review. Med Image Anal 2025; 101:103480. [PMID: 39938343] [DOI: 10.1016/j.media.2025.103480] [Received: 01/09/2024; Revised: 11/11/2024; Accepted: 01/21/2025]
Abstract
Minimally invasive surgery (MIS) has revolutionized many procedures and led to reduced recovery time and risk of patient injury. However, MIS poses additional complexity and burden on surgical teams. Data-driven surgical vision algorithms are thought to be key building blocks in the development of future MIS systems with improved autonomy. Recent advancements in machine learning and computer vision have led to successful applications in analysing videos obtained from MIS with the promise of alleviating challenges in MIS videos. Surgical scene and action understanding encompasses multiple related tasks that, when solved individually, can be memory-intensive, inefficient, and fail to capture task relationships. Multitask learning (MTL), a learning paradigm that leverages information from multiple related tasks to improve performance and aid generalization, is well-suited for fine-grained and high-level understanding of MIS data. This review provides a narrative overview of the current state-of-the-art MTL systems that leverage videos obtained from MIS. Beyond listing published approaches, we discuss the benefits and limitations of these MTL systems. Moreover, this manuscript presents an analysis of the literature for various application fields of MTL in MIS, including those with large models, highlighting notable trends, new directions of research, and developments.
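Hard parameter sharing — one encoder feeding several task heads with a weighted sum of per-task losses — is the pattern most reviewed MTL systems instantiate. A minimal numpy sketch of that idea; the layer sizes, task names, and loss weights are illustrative assumptions, not the architecture of any reviewed system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
D_IN, D_SHARED, N_CLASSES_SEG, N_CLASSES_PHASE = 16, 8, 3, 4

# Shared encoder weights (hard parameter sharing): both tasks reuse W_enc.
W_enc = rng.normal(size=(D_IN, D_SHARED))
W_seg = rng.normal(size=(D_SHARED, N_CLASSES_SEG))      # task head 1: segmentation
W_phase = rng.normal(size=(D_SHARED, N_CLASSES_PHASE))  # task head 2: phase recognition

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    h = np.tanh(x @ W_enc)          # shared representation used by both heads
    return softmax(h @ W_seg), softmax(h @ W_phase)

def multitask_loss(p_seg, y_seg, p_phase, y_phase, w=(1.0, 0.5)):
    # Weighted sum of per-task cross-entropies: the simplest MTL objective.
    ce = lambda p, y: -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    return w[0] * ce(p_seg, y_seg) + w[1] * ce(p_phase, y_phase)

x = rng.normal(size=(5, D_IN))
p_seg, p_phase = forward(x)
loss = multitask_loss(p_seg, rng.integers(0, N_CLASSES_SEG, 5),
                      p_phase, rng.integers(0, N_CLASSES_PHASE, 5))
print(p_seg.shape, p_phase.shape, loss)
```

Because the encoder parameters receive gradients from both task losses, each task acts as a regularizer for the other — the property the review credits for improved generalization.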
Affiliation(s)
- Oluwatosin Alabi
- School of Biomedical Engineering & Imaging Sciences, King's College London, United Kingdom
- Tom Vercauteren
- School of Biomedical Engineering & Imaging Sciences, King's College London, United Kingdom
- Miaojing Shi
- College of Electronic and Information Engineering, Tongji University, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, China.
2.
Płotka S, Szczepański T, Szenejko P, Korzeniowski P, Calvo JR, Khalil A, Shamshirsaz A, Brawura-Biskupski-Samaha R, Išgum I, Sánchez CI, Sitek A. Real-time placental vessel segmentation in fetoscopic laser surgery for Twin-to-Twin Transfusion Syndrome. Med Image Anal 2025; 99:103330. [PMID: 39260033] [DOI: 10.1016/j.media.2024.103330] [Received: 12/04/2023; Revised: 06/07/2024; Accepted: 08/27/2024]
Abstract
Twin-to-Twin Transfusion Syndrome (TTTS) is a rare condition that affects about 15% of monochorionic pregnancies, in which identical twins share a single placenta. Fetoscopic laser photocoagulation (FLP) is the standard treatment for TTTS, which significantly improves the survival of fetuses. The aim of FLP is to identify abnormal connections between blood vessels and to laser ablate them in order to equalize blood supply to both fetuses. However, performing fetoscopic surgery is challenging due to limited visibility, a narrow field of view, and significant variability among patients and domains. In order to enhance the visualization of placental vessels during surgery, we propose TTTSNet, a network architecture designed for real-time and accurate placental vessel segmentation. Our network architecture incorporates a novel channel attention module and a multi-scale feature fusion module to precisely segment tiny placental vessels. To address the challenges posed by FLP-specific fiberscope and amniotic sac-based artifacts, we employed novel data augmentation techniques that simulate various artifacts, including the laser pointer, amniotic sac particles, and structural and optical fiber artifacts. By incorporating these simulated artifacts during training, our network architecture demonstrated robust generalizability. We trained TTTSNet on a publicly available dataset of 2060 video frames from 18 independent fetoscopic procedures and evaluated it on a multi-center external dataset of 24 in-vivo procedures with a total of 2348 video frames. Our method achieved significant performance improvements compared to state-of-the-art methods, with a mean Intersection over Union of 78.26% for all placental vessels and 73.35% for a subset of tiny placental vessels. Moreover, our method achieved 172 and 152 frames per second on an A100 GPU and a Clara AGX, respectively. This potentially opens the door to real-time application during surgical procedures.
The code is publicly available at https://github.com/SanoScience/TTTSNet.
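The abstract does not spell out the channel attention module, so here is a generic squeeze-and-excitation-style sketch of what "channel attention" typically computes — global pooling followed by a learned per-channel gate. All shapes and weight matrices below are hypothetical, not TTTSNet's actual module:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, W1, W2):
    """Squeeze-and-excitation style channel attention (generic sketch).

    feat: (C, H, W) feature map; W1: (C, C//r); W2: (C//r, C).
    """
    squeeze = feat.mean(axis=(1, 2))                      # global average pool -> (C,)
    excite = sigmoid(np.maximum(squeeze @ W1, 0.0) @ W2)  # per-channel gate in (0, 1)
    return feat * excite[:, None, None]                   # reweight each channel

C, H, W, r = 8, 4, 4, 2          # r is the usual bottleneck reduction ratio
feat = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C, C // r))
W2 = rng.normal(size=(C // r, C))
out = channel_attention(feat, W1, W2)
print(out.shape)
```

The gate suppresses channels that carry little signal for thin structures, which is the usual motivation for channel attention when segmenting tiny vessels.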
Affiliation(s)
- Szymon Płotka
- Sano Centre for Computational Medicine, Cracow, Poland; Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; Department of Biomedical Engineering and Physics, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Paula Szenejko
- First Department of Obstetrics and Gynecology, The University Center for Women and Newborn Health, Medical University of Warsaw, Warsaw, Poland
- Jesús Rodriguez Calvo
- Fetal Medicine Unit, Obstetrics and Gynecology Division, Complutense University of Madrid, Madrid, Spain
- Asma Khalil
- Fetal Medicine Unit, Saint George's Hospital, University of London, London, United Kingdom
- Alireza Shamshirsaz
- Maternal Fetal Care Center, Boston Children's Hospital, Boston, MA, United States of America; Harvard Medical School, Boston, MA, United States of America
- Ivana Išgum
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; Department of Biomedical Engineering and Physics, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands; Department of Radiology and Nuclear Medicine, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Clara I Sánchez
- Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands; Department of Biomedical Engineering and Physics, Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Arkadiusz Sitek
- Harvard Medical School, Boston, MA, United States of America; Center for Advanced Medical Computing and Simulation, Massachusetts General Hospital, Boston, MA, United States of America.
3.
Ding S, Hou H, Xu X, Zhang J, Guo L, Ding L. Graph-Based Semi-Supervised Deep Image Clustering With Adaptive Adjacency Matrix. IEEE Trans Neural Netw Learn Syst 2024; 35:18828-18837. [PMID: 38416618] [DOI: 10.1109/tnnls.2024.3367322]
Abstract
Image clustering is a research hotspot in machine learning and computer vision. Existing graph-based semi-supervised deep clustering methods suffer from three problems: 1) because clustering uses only high-level features, the detailed information contained in shallow-level features is ignored; 2) most feature extraction networks employ odd-sized strided convolutional kernels, which results in an uneven distribution of receptive-field intensity; and 3) because the adjacency matrix is precomputed and fixed, it cannot adapt to changes in the relationships between samples. To solve these problems, we propose a novel graph-based semi-supervised deep clustering method for image clustering. First, a parity cross-convolutional feature extraction and fusion module is used to extract high-quality image features. Then, a clustering constraint layer is designed to improve clustering efficiency, and the output layer is customized to achieve unsupervised regularization training. Finally, the adjacency matrix is inferred from the network's actual predictions, and a graph-based regularization method is adopted to train the network without supervision. Experimental results show that our method significantly outperforms state-of-the-art methods on the USPS, MNIST, Street View House Numbers (SVHN), and Fashion-MNIST (FMNIST) datasets in terms of accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI).
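Point 3) — inferring the adjacency matrix from the network's current predictions instead of precomputing it — can be illustrated in a few lines. The cosine-similarity rule and threshold below are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adaptive_adjacency(logits, threshold=0.5):
    """Build an adjacency matrix from current network predictions.

    Two samples are connected when their predicted class distributions
    are similar (cosine similarity above `threshold`), so the graph
    adapts as training changes the predictions rather than staying fixed.
    """
    p = softmax(logits)
    p_unit = p / np.linalg.norm(p, axis=1, keepdims=True)
    sim = p_unit @ p_unit.T            # pairwise cosine similarity
    adj = (sim > threshold).astype(float)
    np.fill_diagonal(adj, 0.0)         # no self-loops
    return adj

logits = rng.normal(size=(6, 3))       # hypothetical predictions for 6 samples
A = adaptive_adjacency(logits)
print(A)
```

Recomputing `A` every few iterations gives the "adaptive" behavior: the graph regularizer then pulls together samples the network currently believes belong to the same cluster.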
4.
Zolfaghari M, Sajedi H. Automated classification of pollen grains microscopic images using cognitive attention based on human Two Visual Streams Hypothesis. PLoS One 2024; 19:e0309674. [PMID: 39570884] [PMCID: PMC11581319] [DOI: 10.1371/journal.pone.0309674] [Received: 08/20/2023; Accepted: 08/15/2024]
Abstract
Aerobiology is the branch of biology that studies microorganisms passively transported by the air, such as bacteria, viruses, fungal spores, tiny insects, and pollen grains. Pollen grain classification is essential in medicine, agronomy, economics, and other fields. It is performed either traditionally (manually) or automatically; the automated approach is faster, more accurate, more cost-effective, and requires less human intervention than the manual method. In this paper, we introduce a Residual Cognitive Attention Network (RCANet) for the automated classification of microscopic images of pollen grains. The proposed attention block, the Ventral-Dorsal Attention Block (VDAB), is designed based on the ventral (temporal) and dorsal (parietal) pathways originating in the occipital lobe and is embedded in each Basic Block of the ResNet18 architecture. The VDAB is composed of ventral and dorsal attention blocks, which detect the structure and location of the pollen grain, respectively. Following these pathways, the Ventral Attention Block (VAB) extracts the channels related to the shape of the pollen grain, while the Dorsal Attention Block (DAB) focuses on its position. Three publicly available pollen grain datasets, the Cretan Pollen Dataset (CPD), Pollen13K, and Pollen23E, are employed in the experiments. ResNet18 and the proposed RCANet were trained on these datasets, and RCANet obtained higher performance metrics than ResNet18 at test time, achieving weighted F1-scores of 98.69%, 97.83%, and 98.24% on the CPD, Pollen13K, and Pollen23E datasets, respectively.
Affiliation(s)
- Mohammad Zolfaghari
- Department of Computer Science, University of Tehran, Kish International Campus, Kish, Iran
- Hedieh Sajedi
- Department of Mathematics, Statistics and Computer Science, University of Tehran, Tehran, Iran
5.
Zhang Y, Gao Y, Xu J, Zhao G, Shi L, Kong L. Unsupervised Joint Domain Adaptation for Decoding Brain Cognitive States From tfMRI Images. IEEE J Biomed Health Inform 2024; 28:1494-1503. [PMID: 38157464] [DOI: 10.1109/jbhi.2023.3348130]
Abstract
Recent advances in large models and neuroscience have enabled exploration of the mechanisms of brain activity using neuroimaging data. Brain decoding is one of the most promising research directions for further understanding human cognitive function. However, current methods depend excessively on high-quality labeled data, which incurs the enormous expense of collection and annotation of neural images by experts. Moreover, the performance of cross-individual decoding suffers from inconsistencies in data distribution caused by individual variation and differences in collection equipment. To address these issues, a Joint Domain Adaptive Decoding (JDAD) framework is proposed for unsupervised decoding of specific brain cognitive states related to behavioral tasks. Based on volumetric features extracted from task-based functional Magnetic Resonance Imaging (tfMRI) data, a novel objective loss function is designed around a joint distribution regularizer, which aims to restrict the distance between both the conditional and the marginal probability distributions of labeled and unlabeled samples. Experimental results on the public Human Connectome Project (HCP) S1200 dataset show that JDAD achieves superior performance over other prevalent methods, especially for fine-grained tasks, with improvements in decoding accuracy of 11.5%-21.6%. The learned 3D features are visualized with Grad-CAM and linked to brain functional regions, providing a novel path for learning the function of cortical regions related to specific cognitive tasks at the group level.
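A common way to restrict the distance between both the marginal and the conditional distributions of source and target features is a maximum mean discrepancy (MMD) penalty applied once globally and once per class (using pseudo-labels on the unlabeled target). The sketch below shows that generic form; the exact JDAD regularizer is not given in the abstract:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf_mmd(X, Y, gamma=1.0):
    """Biased estimate of squared maximum mean discrepancy with an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def joint_regularizer(Xs, ys, Xt, yt_pseudo, lam=1.0):
    """Marginal MMD plus class-conditional MMD between labeled source
    and unlabeled target features (pseudo-labeled), in the spirit of a
    joint distribution constraint."""
    marginal = rbf_mmd(Xs, Xt)
    conditional = 0.0
    for c in np.unique(ys):
        xs_c, xt_c = Xs[ys == c], Xt[yt_pseudo == c]
        if len(xs_c) and len(xt_c):
            conditional += rbf_mmd(xs_c, xt_c)
    return marginal + lam * conditional

# Hypothetical feature batches from two "individuals" (domains).
Xs, Xt = rng.normal(size=(20, 4)), rng.normal(loc=0.5, size=(20, 4))
ys = rng.integers(0, 2, 20)
yt = rng.integers(0, 2, 20)
reg = joint_regularizer(Xs, ys, Xt, yt)
print(reg)
```

Adding `reg` to the task loss pushes the network toward features whose distributions match across individuals, which is what mitigates the cross-individual inconsistency the abstract describes.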
6.
Bukowski M, Kurek J, Świderski B, Jegorowa A. Custom Loss Functions in XGBoost Algorithm for Enhanced Critical Error Mitigation in Drill-Wear Analysis of Melamine-Faced Chipboard. Sensors (Basel) 2024; 24:1092. [PMID: 38400250] [PMCID: PMC10891790] [DOI: 10.3390/s24041092] [Received: 12/19/2023; Revised: 01/29/2024; Accepted: 02/06/2024]
Abstract
The advancement of machine learning in industrial applications has necessitated the development of tailored solutions to address specific challenges, particularly in multi-class classification tasks. This study delves into the customization of loss functions within the eXtreme Gradient Boosting (XGBoost) algorithm, which is a critical step in enhancing the algorithm's performance for specific applications. Our research is motivated by the need for precision and efficiency in the industrial domain, where the implications of misclassification can be substantial. We focus on the drill-wear analysis of melamine-faced chipboard, a common material in furniture production, to demonstrate the impact of custom loss functions. The paper explores several variants of Weighted Softmax Loss Functions, including Edge Penalty and Adaptive Weighted Softmax Loss, to address the challenges of class imbalance and the heightened importance of accurately classifying edge classes. Our findings reveal that these custom loss functions significantly reduce critical errors in classification without compromising the overall accuracy of the model. This research not only contributes to the field of industrial machine learning by providing a nuanced approach to loss function customization but also underscores the importance of context-specific adaptations in machine learning algorithms. The results showcase the potential of tailored loss functions in balancing precision and efficiency, ensuring reliable and effective machine learning solutions in industrial settings.
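A custom XGBoost multiclass objective is supplied as a callback returning per-(row, class) gradients and Hessians. The sketch below expresses a plain class-weighted softmax cross-entropy in that form — a simplified stand-in for the paper's Edge Penalty and Adaptive Weighted Softmax variants, whose exact formulas are not in the abstract; the diagonal Hessian is a common approximation:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def weighted_softmax_objective(logits, labels, class_weights):
    """Gradient and diagonal Hessian of a class-weighted softmax
    cross-entropy, in the (grad, hess) form a custom XGBoost multiclass
    objective returns. Larger weights make errors on 'critical' classes
    (e.g., the worn-drill edge classes) cost more."""
    p = softmax(logits)
    onehot = np.eye(logits.shape[1])[labels]
    w = class_weights[labels][:, None]            # per-row weight from true class
    grad = w * (p - onehot)                       # dL/dz
    hess = w * np.maximum(p * (1.0 - p), 1e-6)    # clamped for numerical stability
    return grad, hess

logits = rng.normal(size=(8, 3))
labels = rng.integers(0, 3, 8)
weights = np.array([1.0, 1.0, 5.0])               # penalize errors on class 2 most
g, h = weighted_softmax_objective(logits, labels, weights)
print(g.shape, h.shape)
```

In real use this function would be passed as the `obj` argument to XGBoost's training API; the flattened layout XGBoost expects depends on its version, so treat the shapes here as illustrative.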
Affiliation(s)
- Michał Bukowski
- Institute of Information Technology, Warsaw University of Life Sciences, 02-776 Warsaw, Poland
- Jarosław Kurek
- Institute of Information Technology, Warsaw University of Life Sciences, 02-776 Warsaw, Poland
- Bartosz Świderski
- Institute of Information Technology, Warsaw University of Life Sciences, 02-776 Warsaw, Poland
- Albina Jegorowa
- Institute of Wood Sciences and Furniture, Warsaw University of Life Sciences, 02-787 Warsaw, Poland
7.
Guo H, Liu H, Zhu H, Li M, Yu H, Zhu Y, Chen X, Xu Y, Gao L, Zhang Q, Shentu Y. Exploring a novel HE image segmentation technique for glioblastoma: A hybrid slime mould and differential evolution approach. Comput Biol Med 2024; 168:107653. [PMID: 37984200] [DOI: 10.1016/j.compbiomed.2023.107653] [Received: 09/03/2023; Revised: 10/12/2023; Accepted: 10/31/2023]
Abstract
Glioblastoma is a primary brain tumor with high incidence and mortality rates, posing a significant threat to human health, and it is crucial to provide diagnostic assistance for its management. Among image-processing approaches, Multi-threshold Image Segmentation (MIS) is considered one of the most efficient and intuitive. In recent years, many scholars have combined different metaheuristic algorithms with MIS to improve the quality of Image Segmentation (IS). The Slime Mould Algorithm (SMA) is a metaheuristic approach inspired by the foraging behavior of slime mould populations in nature. In this investigation, we introduce a hybridized variant named BDSMA, aimed at overcoming the inherent limitations of the original algorithm: inadequate exploitation capacity and a tendency to converge prematurely to local optima on complex multidimensional problems. To bolster the algorithm's optimization prowess, we integrate the original algorithm with a robust exploitative operator, Differential Evolution (DE), and introduce a strategy for handling solutions that exceed the search-space boundaries. The incorporation of an advanced cooperative mixing model accelerates the convergence of BDSMA, refining its precision and preventing it from becoming trapped in local optima. To substantiate the effectiveness of the proposed approach, we conduct a comprehensive series of comparative experiments on 30 benchmark functions; the results demonstrate the superiority of our method in both convergence speed and precision. Within this study we also propose an MIS technique, which is then employed in segmentation experiments at both low and high threshold levels. The effectiveness of the BDSMA-based MIS technique is further showcased through its successful application to medical images of brain glioblastoma, and the evaluation of these experimental outcomes using image-quality metrics conclusively underscores the exceptional efficacy of the proposed algorithm.
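The exploitative DE operator that such hybrids lean on is standard and compact: DE/rand/1/bin mutation and crossover with greedy selection, plus clipping as one simple boundary-handling strategy. This is an illustrative sketch of the operator, not the exact BDSMA update:

```python
import numpy as np

rng = np.random.default_rng(5)

def de_step(pop, fitness, f_obj, F=0.5, CR=0.9, lo=-5.0, hi=5.0):
    """One DE/rand/1/bin generation with clipping of out-of-bound
    solutions -- the kind of exploitative operator hybridized into a
    slime-mould population (illustrative only)."""
    n, d = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(n):
        a, b, c = rng.choice([j for j in range(n) if j != i], 3, replace=False)
        mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lo, hi)  # mutation + bounds
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True        # guarantee at least one crossed gene
        trial = np.where(cross, mutant, pop[i])
        ft = f_obj(trial)
        if ft < fitness[i]:                  # greedy selection keeps the better one
            new_pop[i], new_fit[i] = trial, ft
    return new_pop, new_fit

sphere = lambda x: float((x ** 2).sum())     # simple benchmark objective
pop = rng.uniform(-5, 5, size=(12, 4))
fit = np.array([sphere(x) for x in pop])
init_best = fit.min()
for _ in range(50):
    pop, fit = de_step(pop, fit, sphere)
print(init_best, "->", fit.min())
```

Greedy selection makes the best fitness monotonically non-increasing, which is precisely the exploitation behavior the hybrid adds to SMA's exploration.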
Affiliation(s)
- Hongliang Guo
- College of Information Technology, Jilin Agricultural University, Changchun 130118, China.
- Hanbo Liu
- College of Information Technology, Jilin Agricultural University, Changchun 130118, China.
- Hong Zhu
- Department of Pathology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
- Mingyang Li
- College of Information Technology, Jilin Agricultural University, Changchun 130118, China.
- Helong Yu
- College of Information Technology, Jilin Agricultural University, Changchun 130118, China.
- Yun Zhu
- Department of Pathology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
- Xiaoxiao Chen
- Department of Pathology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
- Yujia Xu
- Department of Pathology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
- Lianxing Gao
- College of Engineering and Technology, Jilin Agricultural University, Changchun 130118, China.
- Qiongying Zhang
- Department of Pathology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
- Yangping Shentu
- Department of Pathology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China.
8.
Casella A, Lena C, Moccia S, Paladini D, De Momi E, Mattos LS. Toward a navigation framework for fetoscopy. Int J Comput Assist Radiol Surg 2023; 18:2349-2356. [PMID: 37587389] [PMCID: PMC10632301] [DOI: 10.1007/s11548-023-02974-3] [Received: 11/22/2022; Accepted: 05/23/2023]
Abstract
PURPOSE Fetoscopic laser photocoagulation of placental anastomoses is the most effective treatment for twin-to-twin transfusion syndrome (TTTS). A robust mosaic of the placenta and its vascular network could support surgeons' exploration of the placenta by enlarging the fetoscope's field of view. In this work, we propose a learning-based framework for field-of-view expansion from intra-operative video frames. METHODS While the current state of the art for fetoscopic mosaicking builds upon the registration of anatomical landmarks, which may not always be visible, our framework relies on learning-based features and keypoints, as well as robust transformer-based image-feature matching, without requiring any anatomical priors. We further address the problem of occlusion recovery and frame relocalization, relying on the computed features and their descriptors. RESULTS Experiments were conducted on 10 in-vivo TTTS videos from two different fetal surgery centers. The proposed framework was compared with several state-of-the-art approaches, achieving higher [Formula: see text] on 7 out of 10 videos and a success rate of [Formula: see text] in occlusion recovery. CONCLUSION This work introduces a learning-based framework for placental mosaicking with occlusion recovery from intra-operative videos using a keypoint-based strategy and features. The proposed framework can compute the placental panorama and recover even in cases of camera tracking loss where other methods fail. The results suggest that the proposed framework has large potential to pave the way toward a surgical navigation system for TTTS by providing robust field-of-view expansion.
Affiliation(s)
- Alessandro Casella
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy.
- Department of Electronic, Information and Bioengineering, Politecnico di Milano, Milan, Italy.
- Chiara Lena
- Department of Electronic, Information and Bioengineering, Politecnico di Milano, Milan, Italy
- Sara Moccia
- Department of Excellence in Robotics and AI, The BioRobotics Institute, Scuola Superiore Sant'Anna, Pisa, Italy
- Dario Paladini
- Department of Fetal and Perinatal Medicine, Istituto Giannina Gaslini, Genoa, Italy
- Elena De Momi
- Department of Electronic, Information and Bioengineering, Politecnico di Milano, Milan, Italy
- Leonardo S Mattos
- Department of Advanced Robotics, Istituto Italiano di Tecnologia, Genoa, Italy
9.
Du W, Yin K, Shi J. Dimensionality Reduction Hybrid U-Net for Brain Extraction in Magnetic Resonance Imaging. Brain Sci 2023; 13:1549. [PMID: 38002509] [PMCID: PMC10669566] [DOI: 10.3390/brainsci13111549] [Received: 10/17/2023; Revised: 10/31/2023; Accepted: 11/02/2023]
Abstract
In various applications, such as disease diagnosis, surgical navigation, human brain atlas analysis, and other neuroimage-processing scenarios, brain extraction is typically the initial stage of MRI image processing. Whole-brain semantic segmentation algorithms such as U-Net have demonstrated the ability to achieve relatively satisfactory results even with a limited number of training samples. To enhance the precision of brain semantic segmentation, various frameworks have been developed, including 3D U-Net, slice U-Net, and auto-context U-Net; however, the processing methods employed in these models are relatively complex when applied to 3D data. In this article, we aim to reduce the complexity of the model while maintaining appropriate performance. As an initial step to enhance segmentation accuracy, full-scale information is extracted from the magnetic resonance images during preprocessing with a clustering tool. Subsequently, three multi-input hybrid U-Net frameworks are tested and compared. Finally, we propose fusing the two-dimensional segmentation outcomes from different planes to attain improved results. The performance of the proposed framework was tested on the publicly accessible benchmark dataset LPBA40, on which we obtained a Dice overlap coefficient of 98.05%, an improvement over several previous studies.
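One plausible reading of fusing two-dimensional segmentation outcomes from different planes is a per-voxel majority vote over the three orthogonal slice directions. The sketch below uses that rule, which is an assumption — the paper's exact combination scheme is not stated in the abstract:

```python
import numpy as np

rng = np.random.default_rng(6)

def fuse_plane_predictions(axial, coronal, sagittal):
    """Majority-vote fusion of binary brain masks predicted slice-wise
    along three orthogonal planes of the same volume (one plausible
    fusion rule, not necessarily the paper's)."""
    votes = axial.astype(int) + coronal.astype(int) + sagittal.astype(int)
    return votes >= 2                 # keep voxels at least 2 of 3 planes agree on

shape = (8, 8, 8)
truth = np.zeros(shape, dtype=bool)
truth[2:6, 2:6, 2:6] = True           # toy "brain" region

# Simulate three noisy per-plane predictions of the same volume.
noisy = lambda: truth ^ (rng.random(shape) < 0.05)
fused = fuse_plane_predictions(noisy(), noisy(), noisy())

def dice(a, b):
    return 2 * (a & b).sum() / (a.sum() + b.sum())

print(fused.shape, dice(fused, truth))
```

Because independent per-plane errors rarely coincide at the same voxel, the vote suppresses them, which is the usual argument for multi-plane fusion.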
Affiliation(s)
- Wentao Du
- Nanjing Research Institute of Electronic Technology, Nanjing 210019, China
- Kuiying Yin
- Nanjing Research Institute of Electronic Technology, Nanjing 210019, China
- Jingping Shi
- Department of Neurology, The Affiliated Brain Hospital of Nanjing Medical University, Nanjing 210029, China
10.
Yan P, Sun W, Li X, Li M, Jiang Y, Luo H. PKDN: Prior Knowledge Distillation Network for bronchoscopy diagnosis. Comput Biol Med 2023; 166:107486. [PMID: 37757599] [DOI: 10.1016/j.compbiomed.2023.107486] [Received: 06/12/2023; Revised: 08/15/2023; Accepted: 09/15/2023]
Abstract
Bronchoscopy plays a crucial role in diagnosing and treating lung diseases. The deep learning-based diagnostic system for bronchoscopic images can assist physicians in accurately and efficiently diagnosing lung diseases, enabling patients to undergo timely pathological examinations and receive appropriate treatment. However, the existing diagnostic methods overlook the utilization of prior knowledge of medical images, and the limited feature extraction capability hinders precise focus on lesion regions, consequently affecting the overall diagnostic effectiveness. To address these challenges, this paper proposes a prior knowledge distillation network (PKDN) for identifying lung diseases through bronchoscopic images. The proposed method extracts color and edge features from lesion images using the prior knowledge guidance module, and subsequently enhances spatial and channel features by employing the dynamic spatial attention module and gated channel attention module, respectively. Finally, the extracted features undergo refinement and self-regulation through feature distillation. Furthermore, decoupled distillation is implemented to balance the importance of target and non-target class distillation, thereby enhancing the diagnostic performance of the network. The effectiveness of the proposed method is validated on the bronchoscopic dataset provided by Harbin Medical University Cancer Hospital, which consists of 2,029 bronchoscopic images from 200 patients. Experimental results demonstrate that the proposed method achieves an accuracy of 94.78% and an AUC of 98.17%, outperforming other methods significantly in diagnostic performance. These results indicate that the computer-aided diagnostic system based on PKDN provides satisfactory accuracy in diagnosing lung diseases during bronchoscopy.
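Decoupled distillation separates the target-class part of the distillation loss from the KL divergence over non-target classes, so the two can be weighted independently — the "balance the importance of target and non-target class distillation" the abstract mentions. Below is a generic numpy sketch of that decomposition; the alpha/beta weights, temperature, and the exact PKDN loss are assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decoupled_kd(student_logits, teacher_logits, labels, alpha=1.0, beta=8.0, T=4.0):
    """Generic decoupled knowledge distillation: a target-class binary-KL
    term plus a KL over renormalized non-target classes, weighted
    separately by alpha and beta."""
    ps, pt = softmax(student_logits, T), softmax(teacher_logits, T)
    n, idx, eps = len(labels), np.arange(len(labels)), 1e-12
    # Target term: binary KL on (target probability, rest-of-mass).
    bs = np.stack([ps[idx, labels], 1 - ps[idx, labels]], axis=1)
    bt = np.stack([pt[idx, labels], 1 - pt[idx, labels]], axis=1)
    tckd = (bt * np.log((bt + eps) / (bs + eps))).sum(1).mean()
    # Non-target term: KL over the remaining classes, renormalized.
    mask = np.ones_like(ps, dtype=bool)
    mask[idx, labels] = False
    qs = (ps * mask) / (ps * mask).sum(1, keepdims=True)
    qt = (pt * mask) / (pt * mask).sum(1, keepdims=True)
    nckd = (qt * np.log((qt + eps) / (qs + eps))).sum(1).mean()
    return alpha * tckd + beta * nckd

s = rng.normal(size=(6, 5))   # hypothetical student logits
t = rng.normal(size=(6, 5))   # hypothetical teacher logits
y = rng.integers(0, 5, 6)
loss = decoupled_kd(s, t, y)
print(loss)
```

Raising `beta` emphasizes the "dark knowledge" among non-target classes, which classical single-term KD tends to under-weight when the teacher is confident.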
Affiliation(s)
- Pengfei Yan
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
- Weiling Sun
- Department of Endoscope, Harbin Medical University Cancer Hospital, Harbin 150040, China
- Xiang Li
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
- Minglei Li
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
- Yuchen Jiang
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin 150001, China
- Hao Luo
- Department of Control Science and Engineering, Harbin Institute of Technology, Harbin 150001, China.
11.
Tao R, Zou X, Zheng G. LAST: LAtent Space-Constrained Transformers for Automatic Surgical Phase Recognition and Tool Presence Detection. IEEE Trans Med Imaging 2023; 42:3256-3268. [PMID: 37227905] [DOI: 10.1109/tmi.2023.3279838]
Abstract
When developing context-aware systems, automatic surgical phase recognition and tool presence detection are two essential tasks. Previous attempts to develop methods for both tasks exist, but the majority of existing methods utilize a frame-level loss function (e.g., cross-entropy) that does not fully leverage the underlying semantic structure of a surgery, leading to sub-optimal results. In this paper, we propose multi-task learning-based LAtent Space-constrained Transformers, referred to as LAST, for automatic surgical phase recognition and tool presence detection. Our design features a two-branch transformer architecture with a novel and generic way to leverage video-level semantic information during network training: a non-linear compact representation of the underlying semantic structure of surgical videos is learned through a transformer variational autoencoder (VAE), and models are encouraged to follow the learned statistical distributions. In other words, LAST is structure-aware and favors predictions that lie on the extracted low-dimensional data manifold. Validated on two public datasets of cholecystectomy surgery, the Cholec80 dataset and the M2cai16 dataset, our method achieves better results than other state-of-the-art methods. Specifically, on the Cholec80 dataset, our method achieves an average accuracy of 93.12±4.71%, an average precision of 89.25±5.49%, an average recall of 90.10±5.45%, and an average Jaccard of 81.11±7.62% for phase recognition, and an average mAP of 95.15±3.87% for tool presence detection. Similar superior performance is observed when LAST is applied to the M2cai16 dataset.
12.
Li Y, Huang Y, Wang M, Zhao Y. An improved U-Net-based in situ root system phenotype segmentation method for plants. Front Plant Sci 2023; 14:1115713. [PMID: 36998695] [PMCID: PMC10043420] [DOI: 10.3389/fpls.2023.1115713] [Received: 12/04/2022; Accepted: 03/02/2023]
Abstract
The condition of a plant's root system plays an important role in plant growth and development, and the minirhizotron method is an important tool for observing the dynamic growth of plant root systems. Currently, most researchers segment the root system manually or with generic software for analysis; this is time-consuming and demands skilled operation. The complex background and variable environment in soil make traditional automated root-segmentation methods difficult to apply. Inspired by the use of deep learning in medical imaging to segment pathological regions and support diagnosis, we propose a deep learning method for the root segmentation task. U-Net is chosen as the basis; the encoder layers are replaced by ResNet blocks, which reduces the training cost of the model and improves feature utilization; a PSA module is added to the up-sampling part of U-Net to improve segmentation accuracy through multi-scale features and attention fusion; and a new loss function is used to mitigate the extreme class imbalance between root pixels and the soil background. After experimental comparison and analysis, the improved network demonstrates better performance: on the test set of the peanut-root segmentation task, it achieved a pixel accuracy of 0.9917, an Intersection over Union of 0.9548, and an F1-score of 95.10%. Finally, we used transfer learning to conduct segmentation experiments on an in situ corn root-system dataset. The experiments show that the improved network has a good learning effect and transferability.
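A common loss for this kind of root-vs-soil imbalance combines a Dice term, which is insensitive to the dominant background class, with binary cross-entropy. The sketch below is that generic combination — an assumption for illustration, not necessarily the paper's exact loss:

```python
import numpy as np

rng = np.random.default_rng(8)

def dice_bce_loss(pred, target, w_dice=0.5, eps=1e-6):
    """Combined Dice + binary cross-entropy loss for binary segmentation.

    Dice normalizes by foreground size, so thin roots are not drowned
    out by the overwhelming soil background; BCE keeps per-pixel
    gradients well behaved."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
    inter = (pred * target).sum()
    dice = 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return w_dice * dice + (1 - w_dice) * bce

target = np.zeros((32, 32))
target[10:14, :] = 1.0                       # thin root among soil background
good = np.clip(target + 0.05 * rng.random((32, 32)), 0, 1)  # near-perfect prediction
bad = np.full((32, 32), 0.5)                 # uninformative prediction
print(dice_bce_loss(good, target), dice_bce_loss(bad, target))
```

With only ~12% foreground, plain BCE barely distinguishes "predict mostly background" from a good mask; the Dice term is what separates them.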