1
Alabi O, Vercauteren T, Shi M. Multitask learning in minimally invasive surgical vision: A review. Med Image Anal 2025;101:103480. PMID: 39938343. DOI: 10.1016/j.media.2025.103480.
Abstract
Minimally invasive surgery (MIS) has revolutionized many procedures, reducing recovery time and the risk of patient injury. However, MIS imposes additional complexity and burden on surgical teams. Data-driven surgical vision algorithms are considered key building blocks for future MIS systems with improved autonomy. Recent advances in machine learning and computer vision have been applied successfully to analysing MIS video, promising to alleviate these challenges. Surgical scene and action understanding encompasses multiple related tasks that, when solved individually, can be memory-intensive and inefficient, and can fail to capture task relationships. Multitask learning (MTL), a learning paradigm that leverages information from multiple related tasks to improve performance and aid generalization, is well suited to fine-grained and high-level understanding of MIS data. This review provides a narrative overview of current state-of-the-art MTL systems that leverage MIS video. Beyond listing published approaches, we discuss the benefits and limitations of these systems. We also analyse the literature across the various application fields of MTL in MIS, including those involving large models, highlighting notable trends, new research directions, and developments.
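To make the MTL paradigm the review surveys concrete, the following is a minimal sketch of hard parameter sharing, the most common MTL pattern in surgical vision: one shared encoder feeds task-specific heads, and per-task losses are combined. All layer sizes, task names, and weights here are illustrative assumptions, not taken from the review.

```python
# Minimal hard-parameter-sharing MTL sketch: a shared encoder with two
# task heads (surgical-phase recognition and tool presence). Illustrative
# only; architecture and task weighting are assumptions.
import torch
import torch.nn as nn

class SurgicalMTLNet(nn.Module):
    def __init__(self, num_phases: int = 7, num_tool_classes: int = 5):
        super().__init__()
        # Shared backbone: every task reuses these features, which is where
        # the memory and efficiency gains over per-task networks come from.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads on top of the shared representation.
        self.phase_head = nn.Linear(64, num_phases)
        self.tool_head = nn.Linear(64, num_tool_classes)

    def forward(self, frames: torch.Tensor):
        feats = self.encoder(frames)
        return self.phase_head(feats), self.tool_head(feats)

model = SurgicalMTLNet()
frames = torch.randn(4, 3, 224, 224)             # a batch of video frames
phase_logits, tool_logits = model(frames)
phase_loss = nn.functional.cross_entropy(phase_logits, torch.randint(0, 7, (4,)))
tool_loss = nn.functional.binary_cross_entropy_with_logits(
    tool_logits, torch.randint(0, 2, (4, 5)).float())
loss = phase_loss + 0.5 * tool_loss              # fixed task weighting for brevity
loss.backward()
```

Joint training through the shared encoder is what lets related tasks regularize one another, the property the review credits for improved generalization.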
Affiliation(s)
- Oluwatosin Alabi
- School of Biomedical Engineering & Imaging Sciences, King's College London, United Kingdom
- Tom Vercauteren
- School of Biomedical Engineering & Imaging Sciences, King's College London, United Kingdom
- Miaojing Shi
- College of Electronic and Information Engineering, Tongji University, China; Shanghai Institute of Intelligent Science and Technology, Tongji University, China.
2
Bhatt N, Bhatt N, Prajapati P, Sorathiya V, Alshathri S, El-Shafai W. A Data-Centric Approach to improve performance of deep learning models. Sci Rep 2024;14:22329. PMID: 39333381. PMCID: PMC11436781. DOI: 10.1038/s41598-024-73643-x.
Abstract
Artificial Intelligence has evolved and is now closely associated with Deep Learning, driven by the availability of vast amounts of data and computing power. Traditionally, researchers have adopted a Model-Centric Approach, focusing on developing new algorithms and models to enhance performance without altering the underlying data. However, Andrew Ng, a prominent figure in the AI community, has recently emphasized better (quality) data rather than better models, giving rise to the Data-Centric Approach, also known as the Data-Oriented technique. The transition from model-oriented to data-oriented approaches has rapidly gained momentum within deep learning. Despite its promise, the Data-Centric Approach faces several challenges, including (a) generating high-quality data, (b) ensuring data privacy, and (c) addressing biases to achieve fairness in datasets. To date, there has been limited effort devoted to preparing quality data. Our work aims to address this gap by focusing on the generation of high-quality data through methods such as data augmentation, multi-stage hashing to eliminate duplicate instances, and confident learning to detect and correct noisy labels. Experiments on the popular MNIST, Fashion MNIST, and CIFAR-10 datasets were performed using ResNet-18 as the common framework for both the Model-Centric and Data-Centric Approaches. Comparative performance analysis revealed that the Data-Centric Approach consistently outperformed the Model-Centric Approach by a relative margin of at least 3%. This finding highlights the potential for further exploration and adoption of the Data-Centric Approach in domains such as healthcare, finance, education, and entertainment, where data quality could significantly enhance performance.
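A minimal sketch of two of the data-quality steps the abstract names: multi-stage hashing to drop duplicates, and a confident-learning-style rule for flagging likely label noise. This is a simplified reading of those ideas, not the authors' code; the hash stages, thumbnail size, and thresholding are assumptions.

```python
# Sketch of (a) multi-stage hashing for deduplication and (b) a
# confident-learning-style noisy-label check. Illustrative assumptions
# throughout; not the paper's implementation.
import hashlib
import numpy as np

def dedup_multistage(images: list) -> list:
    """Return indices of images to keep. Stage 1 hashes raw bytes (exact
    duplicates); stage 2 hashes a coarse, quantized thumbnail so that
    near-duplicates also collide (at some risk of false collisions)."""
    seen, keep = set(), []
    for i, img in enumerate(images):
        exact = hashlib.md5(img.tobytes()).hexdigest()
        h, w = img.shape[:2]
        thumb = img[:: max(1, h // 8), :: max(1, w // 8)] // 32  # quantize
        near = hashlib.md5(thumb.tobytes()).hexdigest()
        if exact in seen or near in seen:
            continue
        seen.update((exact, near))
        keep.append(i)
    return keep

def flag_noisy_labels(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Confident-learning-style rule: flag example i when the model's
    confidence in its given label falls below that class's average
    self-confidence (the per-class threshold of Northcutt et al.)."""
    thresholds = np.array([
        pred_probs[labels == c, c].mean() for c in range(pred_probs.shape[1])
    ])
    self_conf = pred_probs[np.arange(len(labels)), labels]
    return self_conf < thresholds[labels]
```

In practice `pred_probs` would come from out-of-sample (cross-validated) predictions, since in-sample probabilities tend to be overconfident on the very labels being audited.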
Affiliation(s)
- Nikita Bhatt
- Department of Computer Engineering, U & P U. Patel, CSPIT, CHARUSAT, Changa, Gujarat, India
- Nirav Bhatt
- Department of Artificial Intelligence and Machine Learning, CSPIT, CHARUSAT, Changa, Gujarat, India
- Purvi Prajapati
- Smt. K. D. Patel Department of Information Technology, CSPIT, CHARUSAT, Changa, Gujarat, India
- Vishal Sorathiya
- Faculty of Engineering and Technology, Parul Institute of Engineering and Technology, Parul University, Vadodara, Gujarat, India.
- Samah Alshathri
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Walid El-Shafai
- Security Engineering Lab, Computer Science Department, Prince Sultan University, Riyadh, 11586, Saudi Arabia
- Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt
3
Urrea C, Garcia-Garcia Y, Kern J. Improving Surgical Scene Semantic Segmentation through a Deep Learning Architecture with Attention to Class Imbalance. Biomedicines 2024;12:1309. PMID: 38927516. PMCID: PMC11201157. DOI: 10.3390/biomedicines12061309.
Abstract
This article addresses semantic segmentation of laparoscopic surgery images, with particular emphasis on structures represented by only a small number of observations. The study proposes adjustment parameters for deep neural network architectures that enable robust segmentation of all structures in the surgical scene. The U-Net architecture with five encoder-decoders (U-Net5ed), SegNet-VGG19, and DeepLabv3+ with different backbones are implemented. Three main experiments are conducted using the Rectified Linear Unit (ReLU), Gaussian Error Linear Unit (GELU), and Swish activation functions. The loss functions applied include Cross Entropy (CE), Focal Loss (FL), Tversky Loss (TL), Dice Loss (DiL), Cross Entropy Dice Loss (CEDL), and Cross Entropy Tversky Loss (CETL). The Stochastic Gradient Descent with momentum (SGDM) and Adaptive Moment Estimation (Adam) optimizers are compared. Qualitative and quantitative results confirm that the DeepLabv3+ and U-Net5ed architectures perform best. DeepLabv3+ with a ResNet-50 backbone, the Swish activation function, and the CETL loss function reports a Mean Accuracy (MAcc) of 0.976 and a Mean Intersection over Union (MIoU) of 0.977. For under-represented structures such as the hepatic vein, cystic duct, liver ligament, and blood, the results are highly competitive and promising compared with the consulted literature. The selected parameters were further validated in the YOLOv9 architecture, which showed improved semantic segmentation relative to the original architecture.
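The best-performing loss in this study, CETL, pairs cross entropy with a Tversky term whose alpha/beta weights penalize false positives and false negatives asymmetrically, which is what helps the under-represented classes. Below is a minimal sketch of such a combined loss; the alpha/beta values and the equal weighting of the two terms are illustrative assumptions, not the paper's reported settings.

```python
# Sketch of a Cross Entropy + Tversky loss (CETL) for semantic segmentation.
# Tversky index per class: TP / (TP + alpha*FP + beta*FN); beta > alpha
# weights missed pixels (FN) more heavily, aiding rare classes.
import torch
import torch.nn.functional as F

def cetl_loss(logits, targets, alpha=0.3, beta=0.7, eps=1e-6, ce_weight=1.0):
    """logits: (N, C, H, W); targets: (N, H, W) integer class map."""
    ce = F.cross_entropy(logits, targets)
    probs = torch.softmax(logits, dim=1)
    onehot = F.one_hot(targets, probs.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)                         # sum over batch and pixels
    tp = (probs * onehot).sum(dims)
    fp = (probs * (1 - onehot)).sum(dims)
    fn = ((1 - probs) * onehot).sum(dims)
    tversky = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return ce_weight * ce + (1 - tversky).mean()

logits = torch.randn(2, 13, 64, 64, requires_grad=True)   # e.g., 13 scene classes
targets = torch.randint(0, 13, (2, 64, 64))
cetl_loss(logits, targets).backward()
```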
Affiliation(s)
- Claudio Urrea
- Electrical Engineering Department, Faculty of Engineering, University of Santiago of Chile, Las Sophoras 165, Estación Central, Santiago 9170020, Chile (shared by Y. Garcia-Garcia and J. Kern)
4
Kang YJ, Kim SJ, Seo SH, Lee S, Kim HS, Yoo JI. Assessment of Automated Identification of Phases in Videos of Total Hip Arthroplasty Using Deep Learning Techniques. Clin Orthop Surg 2024;16:210-216. PMID: 38562629. PMCID: PMC10973629. DOI: 10.4055/cios23280.
Abstract
Background: As the population ages, the rates of hip disease and fragility fracture are increasing, making total hip arthroplasty (THA) one of the most effective treatments for elderly patients. With the growing number of THA surgeries and the diversity of surgical methods, standard evaluation protocols are needed. This study aimed to classify THA videos using deep learning algorithms and to evaluate the accuracy of the resulting labels.
Methods: We manually annotated 7 phases of THA: skin incision, broaching, exposure of the acetabulum, acetabular reaming, acetabular cup positioning, femoral stem insertion, and skin closure. Within each phase, a second trained annotator marked the beginning and end of instrument usage, including the skin blade, forceps, Bovie, suction device, suture material, retractor, rasp, femoral stem, acetabular reamer, head trial, and real head.
Results: Using YOLOv3 on 540 operative images of THA procedures, we created a scene-annotation model. Clearly delineated surgical steps such as skin incision and closure, broaching, acetabular reaming, and femoral stem insertion were classified with relatively high accuracy, with a mean average precision (mAP) of 0.75 or higher. Most instruments were detected with an mAP of 0.7 or higher, except for the suction device, suture material, and retractor.
Conclusions: Scene annotation of instruments and phases in THA using deep learning may provide useful tools for subsequent documentation, skills assessment, and feedback.
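The two-level annotation scheme described here (ordered phases, with instrument-usage intervals inside each phase) maps naturally onto a small data structure. The sketch below shows one plausible encoding; every field name and value is an illustrative assumption, not the study's actual format.

```python
# One possible encoding of the THA annotation scheme: phases with
# start/end times, each holding instrument-usage intervals. Hypothetical
# schema for illustration only.
from dataclasses import dataclass, field

THA_PHASES = [
    "skin incision", "broaching", "exposure of acetabulum",
    "acetabular reaming", "acetabular cup positioning",
    "femoral stem insertion", "skin closure",
]

@dataclass
class InstrumentUse:
    instrument: str          # e.g., "Bovie", "acetabular reamer"
    start_sec: float
    end_sec: float

@dataclass
class PhaseAnnotation:
    phase: str               # one of THA_PHASES
    start_sec: float
    end_sec: float
    instruments: list = field(default_factory=list)

# Example: a broaching phase containing one rasp-usage interval.
broaching = PhaseAnnotation("broaching", 310.0, 595.5)
broaching.instruments.append(InstrumentUse("rasp", 330.2, 540.8))
```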
Affiliation(s)
- Yang Jae Kang
- Division of Bio and Medical Big Data Department (BK4 Program) and Life Science Department, Gyeongsang National University, Jinju, Korea
- Shin June Kim
- Biomedical Research Institute, Inha University Hospital, Incheon, Korea
- Sung Hyo Seo
- Biomedical Research Institute, Gyeongsang National University Hospital, Jinju, Korea
- Sangyeob Lee
- Biomedical Research Institute, Gyeongsang National University Hospital, Jinju, Korea
- Hyeon Su Kim
- Biomedical Research Institute, Inha University Hospital, Incheon, Korea
- Jun-Il Yoo
- Department of Orthopedic Surgery, Inha University Hospital, Inha University College of Medicine, Incheon, Korea
5
Zhang B, Goel B, Sarhan MH, Goel VK, Abukhalil R, Kalesan B, Stottler N, Petculescu S. Surgical workflow recognition with temporal convolution and transformer for action segmentation. Int J Comput Assist Radiol Surg 2023;18:785-794. PMID: 36542253. DOI: 10.1007/s11548-022-02811-z.
Abstract
Purpose: Automatic surgical workflow recognition enabled by computer vision algorithms plays a key role in enhancing surgeons' learning experience. It also supports context-aware systems that allow better surgical planning and decision making, which may in turn improve outcomes. Utilizing temporal information is crucial for recognizing context; hence, various recent approaches use recurrent neural networks or transformers to recognize actions.
Methods: We design and implement a two-stage method for surgical workflow recognition. The first stage uses R(2+1)D for video clip modeling. For full-video modeling in the second stage, we propose the Action Segmentation Temporal Convolutional Transformer (ASTCFormer) network, which combines action segmentation transformers (ASFormers) with temporal convolutional networks (TCNs) to build a temporally aware surgical workflow recognition system.
Results: We compare the proposed ASTCFormer with recurrent neural network, multi-stage TCN, and ASFormer approaches on a dataset comprising 207 robotic and laparoscopic cholecystectomy videos annotated for 7 surgical phases. The proposed method outperforms the compared methods, achieving a [Formula: see text] relative improvement in average segmental F1-score over the state-of-the-art ASFormer. Moreover, it achieves state-of-the-art results on the publicly available Cholec80 dataset.
Conclusion: The improvement when using the proposed method suggests that temporal context is better captured when TCN information is added to the ASFormer paradigm, leading to better surgical workflow recognition.
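A minimal sketch of the two-stage design the abstract describes: stage 1 embeds short clips with an R(2+1)D video CNN; stage 2 models the full video temporally. The actual method layers ASFormer-style transformer blocks on the TCN (ASTCFormer); this sketch keeps only the dilated-TCN half, and all hyperparameters, clip sizes, and the use of torchvision's `r2plus1d_18` are assumptions.

```python
# Two-stage surgical workflow recognition sketch: R(2+1)D clip features,
# then a dilated TCN over the clip sequence. Simplified; the paper's
# second stage also includes ASFormer-style transformer blocks.
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

# Stage 1: clip encoder; dropping the classification head yields a
# 512-d feature vector per clip.
encoder = r2plus1d_18(weights=None)
encoder.fc = nn.Identity()
encoder.eval()

class DilatedTCN(nn.Module):
    """Stage 2: dilated 1-D convolutions over the clip-feature sequence,
    producing a per-clip surgical-phase prediction."""
    def __init__(self, in_dim=512, hidden=64, num_phases=7, layers=6):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, 1)
        self.blocks = nn.ModuleList([
            nn.Conv1d(hidden, hidden, 3, padding=2**i, dilation=2**i)
            for i in range(layers)           # receptive field doubles per layer
        ])
        self.out = nn.Conv1d(hidden, num_phases, 1)

    def forward(self, feats):                # feats: (N, T, 512)
        x = self.inp(feats.transpose(1, 2))  # -> (N, hidden, T)
        for conv in self.blocks:
            x = x + torch.relu(conv(x))      # residual dilated block
        return self.out(x).transpose(1, 2)   # -> (N, T, num_phases)

with torch.no_grad():
    clips = torch.randn(8, 3, 16, 112, 112)  # 8 clips of 16 frames each
    feats = encoder(clips).unsqueeze(0)      # -> (1, 8, 512)
phase_logits = DilatedTCN()(feats)           # -> (1, 8, 7)
```

The dilated convolutions give the second stage a receptive field spanning many clips, which is the temporal context the conclusion credits for the improvement over ASFormer alone.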
Affiliation(s)
- Bokai Zhang
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA.
- Bharti Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Mohammad Hasan Sarhan
- Johnson & Johnson MedTech, Robert-Koch-Straße 1, 22851, Norderstedt, Schleswig-Holstein, Germany
- Varun Kejriwal Goel
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Rami Abukhalil
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Bindu Kalesan
- Johnson & Johnson MedTech, 5490 Great America Pkwy, Santa Clara, CA, 95054, USA
- Natalie Stottler
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA
- Svetlana Petculescu
- Johnson & Johnson MedTech, 1100 Olive Way, Suite 1100, Seattle, 98101, WA, USA