1. De Biase A, Ziegfeld L, Sijtsema NM, Steenbakkers R, Wijsman R, van Dijk LV, Langendijk JA, Cnossen F, van Ooijen P. Probability maps for deep learning-based head and neck tumor segmentation: Graphical User Interface design and test. Comput Biol Med 2024; 177:108675. [PMID: 38820779 DOI: 10.1016/j.compbiomed.2024.108675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 05/27/2024] [Accepted: 05/27/2024] [Indexed: 06/02/2024]
Abstract
BACKGROUND The different tumor appearance of head and neck cancer across imaging modalities, scanners, and acquisition parameters accounts for the highly subjective nature of the manual tumor segmentation task. The variability of the manual contours is one of the causes of the lack of generalizability and the suboptimal performance of deep learning (DL) based tumor auto-segmentation models. Therefore, a DL-based method was developed that outputs predicted tumor probabilities for each PET-CT voxel in the form of a probability map instead of one fixed contour. The aim of this study was to show that DL-generated probability maps for tumor segmentation are clinically relevant, intuitive, and a more suitable solution to assist radiation oncologists in gross tumor volume segmentation on PET-CT images of head and neck cancer patients. METHOD A graphical user interface (GUI) was designed, and a prototype was developed to allow the user to interact with tumor probability maps. Furthermore, a user study was conducted where nine experts in tumor delineation interacted with the interface prototype and its functionality. The participants' experience was assessed qualitatively and quantitatively. RESULTS The interviews with radiation oncologists revealed their preference for using a rainbow colormap to visualize tumor probability maps during contouring, which they found intuitive. They also appreciated the slider feature, which facilitated interaction by allowing the selection of threshold values to create single contours for editing and use as a starting point. Feedback on the prototype highlighted its excellent usability and positive integration into clinical workflows. CONCLUSIONS This study shows that DL-generated tumor probability maps are explainable, transparent, intuitive and a better alternative to the single output of tumor segmentation models.
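The slider interaction described in the abstract, where a chosen threshold turns a probability map into a single editable contour, can be sketched as follows. This is a minimal illustration only: the function name, the toy 3×3 map, and the choice of `>=` as the comparison are assumptions, not details taken from the paper.

```python
import numpy as np

def threshold_probability_map(prob_map, threshold):
    """Return a binary mask of voxels whose tumor probability is >= threshold.

    Hypothetical helper: the comparison rule (>=) is an assumption.
    """
    prob_map = np.asarray(prob_map, dtype=float)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return prob_map >= threshold

# Toy probability map standing in for one PET-CT slice (values are made up).
prob = np.array([[0.1, 0.4, 0.2],
                 [0.5, 0.9, 0.6],
                 [0.2, 0.7, 0.3]])

mask_low = threshold_probability_map(prob, 0.4)   # permissive contour
mask_high = threshold_probability_map(prob, 0.8)  # conservative contour
print(int(mask_low.sum()), int(mask_high.sum()))
```

Moving the threshold up shrinks the mask toward the high-confidence core, which is the behavior a slider exposes to the user.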
Affiliation(s)
- Alessia De Biase
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands; Data Science Center in Health (DASH), University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands.
- Liv Ziegfeld
- University of Groningen, University of Groningen (RUG), 9700 AK, Groningen, the Netherlands
- Nanna Maria Sijtsema
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Roel Steenbakkers
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Robin Wijsman
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Lisanne V van Dijk
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Johannes A Langendijk
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
- Fokie Cnossen
- Department of Artificial Intelligence, Bernoulli Institute of Mathematics, Computer Science and Artificial Intelligence, University of Groningen (RUG), 9700 AK, Groningen, the Netherlands
- Peter van Ooijen
- Department of Radiation Oncology, University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands; Data Science Center in Health (DASH), University Medical Center Groningen (UMCG), 9700 RB, Groningen, the Netherlands
2. Dei D, Lambri N, Crespi L, Brioso RC, Loiacono D, Clerici E, Bellu L, De Philippis C, Navarria P, Bramanti S, Carlo-Stella C, Rusconi R, Reggiori G, Tomatis S, Scorsetti M, Mancosu P. Deep learning and atlas-based models to streamline the segmentation workflow of total marrow and lymphoid irradiation. LA RADIOLOGIA MEDICA 2024; 129:515-523. [PMID: 38308062 DOI: 10.1007/s11547-024-01760-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 01/03/2024] [Indexed: 02/04/2024]
Abstract
PURPOSE To improve the workflow of total marrow and lymphoid irradiation (TMLI) by enhancing the delineation of organs at risk (OARs) and clinical target volume (CTV) using deep learning (DL) and atlas-based (AB) segmentation models. MATERIALS AND METHODS Ninety-five TMLI plans optimized in our institute were analyzed. Two commercial DL software packages were tested for segmenting 18 OARs. An AB model for lymph node CTV (CTV_LN) delineation was built using 20 TMLI patients. The AB model was evaluated on 20 independent patients, and a semiautomatic approach was tested by correcting the automatic contours. The generated OARs and CTV_LN contours were compared to manual contours in terms of topological agreement, dose statistics, and time workload. A clinical decision tree was developed to define a specific contouring strategy for each OAR. RESULTS The two DL models achieved a median [interquartile range] Dice similarity coefficient (DSC) of 0.84 [0.71;0.93] and 0.85 [0.70;0.93] across the OARs. The absolute median Dmean difference between manual and the two DL models was 2.0 [0.7;6.6]% and 2.4 [0.9;7.1]%. The AB model achieved a median DSC of 0.70 [0.66;0.74] for CTV_LN delineation, increasing to 0.94 [0.94;0.95] after manual revision, with minimal Dmean differences. Since September 2022, our institution has implemented DL and AB models for all TMLI patients, reducing the time required to complete the entire segmentation process from 5 to 2 h. CONCLUSION DL models can streamline the TMLI contouring process of OARs. Manual revision is still necessary for lymph node delineation using AB models.
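The Dice similarity coefficient and the median [interquartile range] summary used in this abstract can be sketched as below. This is an illustrative sketch under stated assumptions: the function names and the toy masks are hypothetical, and treating two empty masks as perfect agreement is a convention chosen here, not something the paper specifies.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Volumetric Dice: 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as full agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

def median_iqr(values):
    """Return (median, 25th percentile, 75th percentile)."""
    q1, med, q3 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    return float(med), float(q1), float(q3)

# Toy 1D "masks" (made up): overlap in one voxel out of two each.
toy_dsc = dice([1, 1, 0, 0], [1, 0, 1, 0])
print(toy_dsc)
```

In practice one DSC would be computed per organ and patient, and `median_iqr` applied to the resulting list.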
Affiliation(s)
- Damiano Dei
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Nicola Lambri
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy.
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy.
- Leonardo Crespi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Health Data Science Centre, Human Technopole, Milan, Italy
- Ricardo Coimbra Brioso
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Daniele Loiacono
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Elena Clerici
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Luisa Bellu
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Chiara De Philippis
- Department of Oncology and Hematology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Pierina Navarria
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Stefania Bramanti
- Department of Oncology and Hematology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Carmelo Carlo-Stella
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy
- Department of Oncology and Hematology, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Roberto Rusconi
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Giacomo Reggiori
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Stefano Tomatis
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Marta Scorsetti
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini 4, 20072, Pieve Emanuele, Milan, Italy
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
- Pietro Mancosu
- Department of Radiotherapy and Radiosurgery, IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Rozzano, Milan, Italy
3. Maes D, Gates EDH, Meyer J, Kang J, Nguyen BNT, Lavilla M, Melancon D, Weg ES, Tseng YD, Lim A, Bowen SR. Framework for Radiation Oncology Department-wide Evaluation and Implementation of Commercial Artificial Intelligence Autocontouring. Pract Radiat Oncol 2024; 14:e150-e158. [PMID: 37935308 DOI: 10.1016/j.prro.2023.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 10/19/2023] [Accepted: 10/22/2023] [Indexed: 11/09/2023]
Abstract
PURPOSE Artificial intelligence (AI)-based autocontouring in radiation oncology has potential benefits such as standardization and time savings. However, commercial AI solutions require careful evaluation before clinical integration. We developed a multidimensional evaluation method to test pretrained AI-based automated contouring solutions across a network of clinics. METHODS AND MATERIALS Curated data included 121 patient planning computed tomography (CT) scans with a total of 859 clinically approved contours used for treatment from 4 clinics. Regions of interest (ROIs) were generated with 3 commercial AI-based automated contouring software solutions (AI1, AI2, AI3) spanning the following disease sites: brain, head and neck (H&N), thorax, abdomen, and pelvis. Quantitative agreement between AI-generated and clinical contours was measured by Dice similarity coefficient (DSC) and Hausdorff distance (HD). Qualitative assessment was performed by multiple experts scoring blinded AI-contours using a Likert scale. Workflow and usability surveying was also conducted. RESULTS AI1, AI2, and AI3 contours had high quantitative agreement in 27.8%, 32.8%, and 34.1% of cases (DSC >0.9), performing well in pelvis (median DSC = 0.86/0.88/0.91) and thorax (median DSC = 0.91/0.89/0.91). All 3 solutions had low quantitative agreement in 7.4%, 8.8%, and 6.1% of cases (DSC <0.5), performing worse in brain (median DSC = 0.65/0.78/0.75) and H&N (median DSC = 0.76/0.80/0.81). Qualitatively, AI1 and AI2 contours were acceptable (rated 1-2) with at most minor edits in 70.7% and 74.6% of ROIs (2906 ratings), higher for abdomen (AI1: 79.2%) and thorax (AI2: 90.2%), and lower for H&N (29.0/35.6%). An end-user survey showed strong user preference for full automation and mixed preferences for accuracy versus total number of structures generated. 
CONCLUSIONS Our evaluation method provided a comprehensive analysis of both quantitative and qualitative measures of commercially available pretrained AI autocontouring algorithms. The evaluation framework served as a roadmap for clinical integration that aligned with user workflow preference.
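The quantitative summary in this abstract reports the share of cases with high agreement (DSC > 0.9), the share with low agreement (DSC < 0.5), and the median DSC. A minimal sketch of that banding is shown below; the function name and the toy scores are hypothetical, and only the two cut-offs come from the abstract.

```python
import numpy as np

def agreement_bands(dsc_values, high=0.9, low=0.5):
    """Fraction of cases above `high`, below `low`, and the median DSC."""
    dsc = np.asarray(dsc_values, dtype=float)
    return {
        "high": float(np.mean(dsc > high)),
        "low": float(np.mean(dsc < low)),
        "median": float(np.median(dsc)),
    }

# Made-up DSC values for five contours, for illustration only.
bands = agreement_bands([0.95, 0.92, 0.80, 0.45, 0.70])
print(bands)
```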
Affiliation(s)
- Dominic Maes
- Department of Radiation Oncology, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Radiation Oncology, University of Washington, Seattle, Washington.
- Evan D H Gates
- Department of Radiation Oncology, University of Washington, Seattle, Washington
- Juergen Meyer
- Department of Radiation Oncology, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Radiation Oncology, University of Washington, Seattle, Washington
- John Kang
- Department of Radiation Oncology, University of Washington, Seattle, Washington
- Bao-Ngoc Thi Nguyen
- Department of Radiation Oncology, Fred Hutchinson Cancer Center, Seattle, Washington
- Myra Lavilla
- Department of Radiation Oncology, Fred Hutchinson Cancer Center, Seattle, Washington
- Dustin Melancon
- Department of Radiation Oncology, University of Washington, Seattle, Washington
- Emily S Weg
- Department of Radiation Oncology, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Radiation Oncology, University of Washington, Seattle, Washington
- Yolanda D Tseng
- Department of Radiation Oncology, University of Washington, Seattle, Washington; Clinical Research Division, Fred Hutchinson Cancer Center, Seattle, Washington
- Andrew Lim
- Department of Radiation Oncology, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Radiation Oncology, University of Washington, Seattle, Washington; Department of Radiation Oncology, University of Southern California, Los Angeles, California
- Stephen R Bowen
- Department of Radiation Oncology, Fred Hutchinson Cancer Center, Seattle, Washington; Department of Radiation Oncology, University of Washington, Seattle, Washington; Department of Radiology, University of Washington, Seattle, Washington
4. Temple SWP, Rowbottom CG. Gross failure rates and failure modes for a commercial AI-based auto-segmentation algorithm in head and neck cancer patients. J Appl Clin Med Phys 2024:e14273. [PMID: 38263866 DOI: 10.1002/acm2.14273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/15/2023] [Accepted: 12/20/2023] [Indexed: 01/25/2024] Open
Abstract
PURPOSE Artificial intelligence (AI) based commercial software can be used to automatically delineate organs at risk (OAR), with potential for efficiency savings in the radiotherapy treatment planning pathway, and reduction of inter- and intra-observer variability. There has been little research investigating gross failure rates and failure modes of such systems. METHOD 50 head and neck (H&N) patient data sets with "gold standard" contours were compared to AI-generated contours to produce expected mean and standard deviation values for the Dice Similarity Coefficient (DSC), for four common H&N OARs (brainstem, mandible, left and right parotid). An AI-based commercial system was applied to 500 H&N patients. AI-generated contours were compared to manual contours, outlined by an expert human, and a gross failure was set at three standard deviations below the expected mean DSC. Failures were inspected to assess reason for failure of the AI-based system with failures relating to suboptimal manual contouring censored. True failures were classified into 4 sub-types (setup position, anatomy, image artefacts and unknown). RESULTS There were 24 true failures of the AI-based commercial software, a gross failure rate of 1.2%. Fifteen failures were due to patient anatomy, four were due to dental image artefacts, three were due to patient position and two were unknown. True failure rates by OAR were 0.4% (brainstem), 2.2% (mandible), 1.4% (left parotid) and 0.8% (right parotid). CONCLUSION True failures of the AI-based system were predominantly associated with a non-standard element within the CT scan. It is likely that these non-standard elements were the reason for the gross failure, and suggests that patient datasets used to train the AI model did not contain sufficient heterogeneity of data. Regardless of the reasons for failure, the true failure rate for the AI-based system in the H&N region for the OARs investigated was low (∼1%).
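The failure criterion and the reported rates can be checked with a short worked example. The abstract gives the rule (three standard deviations below the expected mean DSC), 24 true failures, 500 patients, and four OARs; the denominator of 500 × 4 = 2000 contours and the per-OAR failure counts are inferred here from the reported percentages, and the mean and standard deviation in the threshold example are hypothetical placeholders.

```python
def gross_failure_threshold(expected_mean, expected_sd, k=3.0):
    """DSC below which a contour is flagged as a gross failure."""
    return expected_mean - k * expected_sd

def failure_rate(n_failures, n_patients, n_oars):
    """Failures per contour, assuming one contour per patient per OAR."""
    return n_failures / (n_patients * n_oars)

# Hypothetical expected DSC statistics for one OAR (not from the paper).
example_threshold = gross_failure_threshold(0.85, 0.05)

# Overall rate from the abstract's counts: 24 / (500 * 4) = 0.012.
overall = failure_rate(24, 500, 4)

# Per-OAR counts inferred from the reported percentages of 500 patients.
reported = {"brainstem": 0.004, "mandible": 0.022,
            "left parotid": 0.014, "right parotid": 0.008}
inferred_counts = {oar: round(rate * 500) for oar, rate in reported.items()}
print(overall, inferred_counts, sum(inferred_counts.values()))
```

The inferred per-OAR counts sum to the 24 true failures stated in the abstract, which supports the assumed denominator.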
Affiliation(s)
- Simon W P Temple
- Medical Physics Department, The Clatterbridge Cancer Centre NHS Foundation Trust, Liverpool, UK
- Carl G Rowbottom
- Medical Physics Department, The Clatterbridge Cancer Centre NHS Foundation Trust, Liverpool, UK
- Department of Physics, University of Liverpool, Liverpool, UK
5. Costea M, Zlate A, Serre AA, Racadot S, Baudier T, Chabaud S, Grégoire V, Sarrut D, Biston MC. Evaluation of different algorithms for automatic segmentation of head-and-neck lymph nodes on CT images. Radiother Oncol 2023; 188:109870. [PMID: 37634765 DOI: 10.1016/j.radonc.2023.109870] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/24/2023] [Revised: 07/27/2023] [Accepted: 08/20/2023] [Indexed: 08/29/2023]
Abstract
PURPOSE To investigate the performance of 4 atlas-based (multi-ABAS) and 2 deep learning (DL) solutions for head-and-neck (HN) elective nodes (CTVn) automatic segmentation (AS) on CT images. MATERIAL AND METHODS Bilateral CTVn levels of 69 HN cancer patients were delineated on contrast-enhanced planning CT. Ten and 49 patients were used for atlas library and for training a mono-centric DL model, respectively. The remaining 20 patients were used for testing. Additionally, three commercial multi-ABAS methods and one commercial multi-centric DL solution were investigated. Quantitative evaluation was assessed using volumetric Dice Similarity Coefficient (DSC) and 95-percentile Hausdorff distance (HD95%). Blind evaluation was performed for 3 solutions by 4 physicians. One recorded the time needed for manual corrections. A dosimetric study was finally conducted using automated planning. RESULTS Overall, DL solutions had better DSC and HD95% results than multi-ABAS methods. No statistically significant difference was found between the 2 DL solutions. However, the contours provided by the multi-centric DL solution were preferred by all physicians and were also faster to correct (1.1 min vs 4.17 min, on average). Manual corrections for multi-ABAS contours took on average 6.52 min. Overall, decreased contour accuracy was observed from CTVn2 to CTVn3 and to CTVn4. Using the AS contours in treatment planning resulted in underdosage of the elective target volume. CONCLUSION Among all methods, the multi-centric DL method showed the highest delineation accuracy and was better rated by experts. Manual corrections remain necessary to avoid elective target underdosage. Finally, AS contours help reduce the workload of the manual delineation task.
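The 95th-percentile Hausdorff distance (HD95%) named in this abstract can be sketched on point sets as follows. Definitions of HD95 differ between tools; this sketch assumes the maximum of the two directed 95th percentiles of nearest-neighbor distances, computed on surface points given in millimeters. The function name and the toy coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def hd95(points_a, points_b):
    """Max of the two directed 95th-percentile nearest-neighbor distances."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = cdist(a, b)              # pairwise Euclidean distances
    a_to_b = d.min(axis=1)       # each point of A to its nearest point of B
    b_to_a = d.min(axis=0)       # each point of B to its nearest point of A
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))

# Identical surfaces give 0; single points 5 mm apart give 5.
same = hd95([[0, 0, 0], [1, 0, 0]], [[0, 0, 0], [1, 0, 0]])
apart = hd95([[0, 0, 0]], [[3, 4, 0]])
print(same, apart)
```

Using a percentile instead of the maximum makes the measure less sensitive to a few outlying surface points.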
Affiliation(s)
- Madalina Costea
- Centre Léon Bérard, 28 rue Laennec, LYON 69373 Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France
- Thomas Baudier
- Centre Léon Bérard, 28 rue Laennec, LYON 69373 Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France
- Sylvie Chabaud
- Unité de Biostatistique et d'Evaluation des Thérapeutiques, Centre Léon Bérard, Lyon 69373, France
- David Sarrut
- Centre Léon Bérard, 28 rue Laennec, LYON 69373 Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France
- Marie-Claude Biston
- Centre Léon Bérard, 28 rue Laennec, LYON 69373 Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France.
6. Grohmann M, Petersen C, Todorovic M. Benefits and considerations in using a novel computed tomography system optimized for radiotherapy planning. Phys Imaging Radiat Oncol 2023; 28:100510. [PMID: 38054031 PMCID: PMC10694773 DOI: 10.1016/j.phro.2023.100510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 11/08/2023] [Accepted: 11/08/2023] [Indexed: 12/07/2023] Open
Abstract
In this study, we evaluated a novel 16-bit computed tomography (CT) system optimized for radiotherapy planning. Over six months, using various protocols, we conducted 616 scans, with an average of four CT series per session imported into our treatment planning software (TPS). The direct density (DD) reconstruction enabled a single CT number calibration curve for multiple tube voltages. Metal artifacts could be effectively reduced. The 16-bit character permitted dose calculation in high-density regions, while TPS integration challenges remained. In conclusion, our findings emphasize the system's potential benefits and considerations in radiotherapy workflows.
Affiliation(s)
- Maximilian Grohmann
- Department of Radiotherapy and Radiation Oncology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Cordula Petersen
- Department of Radiotherapy and Radiation Oncology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
- Manuel Todorovic
- Department of Radiotherapy and Radiation Oncology, University Medical Center Hamburg-Eppendorf, Martinistr. 52, 20246 Hamburg, Germany
7. De Kerf G, Claessens M, Raouassi F, Mercier C, Stas D, Ost P, Dirix P, Verellen D. A geometry and dose-volume based performance monitoring of artificial intelligence models in radiotherapy treatment planning for prostate cancer. Phys Imaging Radiat Oncol 2023; 28:100494. [PMID: 37809056 PMCID: PMC10550805 DOI: 10.1016/j.phro.2023.100494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 09/20/2023] [Accepted: 09/20/2023] [Indexed: 10/10/2023] Open
Abstract
Background and Purpose Clinical Artificial Intelligence (AI) implementations lack ground-truth when applied on real-world data. This study investigated how combined geometrical and dose-volume metrics can be used as performance monitoring tools to detect clinically relevant candidates for model retraining. Materials and Methods Fifty patients were analyzed for both AI-segmentation and planning. For AI-segmentation, geometrical (Standard Surface Dice 3 mm and Local Surface Dice 3 mm) and dose-volume based parameters were calculated for two organs (bladder and anorectum) to compare AI output against the clinically corrected structure. A Local Surface Dice was introduced to detect geometrical changes in the vicinity of the target volumes, while an Absolute Dose Difference (ADD) evaluation increased focus on dose-volume related changes. AI-planning performance was evaluated using clinical goal analysis in combination with volume and target overlap metrics. Results The Local Surface Dice reported equal or lower values compared to the Standard Surface Dice (anorectum: (0.93 ± 0.11) vs (0.98 ± 0.04); bladder: (0.97 ± 0.06) vs (0.98 ± 0.04)). The ADD metric showed a difference of (0.9 ± 0.8) Gy for the anorectum D1cm³. The bladder D5cm³ reported a difference of (0.7 ± 1.5) Gy. Mandatory clinical goals were fulfilled in 90% of the DLP plans. Conclusions Combining dose-volume and geometrical metrics allowed detection of clinically relevant changes, applied to both auto-segmentation and auto-planning output and the Local Surface Dice was more sensitive to local changes compared to the Standard Surface Dice. This monitoring is able to evaluate AI behavior in clinical practice and allows candidate selection for active learning.
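A surface Dice at a fixed tolerance (3 mm in this abstract) counts how much of each surface lies within the tolerance of the other. The sketch below is a point-cloud approximation under stated assumptions: surfaces are given as point lists in millimeters and every point carries equal weight, whereas full implementations typically weight by surface element area. The function name and coordinates are hypothetical, and the paper's Local Surface Dice, which restricts the evaluation to the vicinity of the target volumes, is not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import cdist

def surface_dice(points_a, points_b, tol_mm=3.0):
    """Share of surface points lying within tol_mm of the other surface."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = cdist(a, b)
    a_within = np.sum(d.min(axis=1) <= tol_mm)  # A points close to B
    b_within = np.sum(d.min(axis=0) <= tol_mm)  # B points close to A
    return float((a_within + b_within) / (len(a) + len(b)))

# Toy surfaces: one pair of points 2 mm apart, another pair 5 mm apart.
sd = surface_dice([[0, 0, 0], [10, 0, 0]], [[0, 0, 2], [10, 0, 5]])
print(sd)
```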
Affiliation(s)
- Geert De Kerf
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Michaël Claessens
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Centre for Oncological Research (CORE), Integrated Personalized and Precision Oncology Network (IPPON), University of Antwerp, Antwerp, Belgium
- Fadoua Raouassi
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Carole Mercier
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Centre for Oncological Research (CORE), Integrated Personalized and Precision Oncology Network (IPPON), University of Antwerp, Antwerp, Belgium
- Daan Stas
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Piet Ost
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Centre for Oncological Research (CORE), Integrated Personalized and Precision Oncology Network (IPPON), University of Antwerp, Antwerp, Belgium
- Piet Dirix
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Centre for Oncological Research (CORE), Integrated Personalized and Precision Oncology Network (IPPON), University of Antwerp, Antwerp, Belgium
- Dirk Verellen
- Department of Radiation Oncology, Iridium Netwerk, Wilrijk (Antwerp), Belgium
- Centre for Oncological Research (CORE), Integrated Personalized and Precision Oncology Network (IPPON), University of Antwerp, Antwerp, Belgium
8. Lempart M, Scherman J, Nilsson MP, Jamtheim Gustafsson C. Deep learning-based classification of organs at risk and delineation guideline in pelvic cancer radiation therapy. J Appl Clin Med Phys 2023; 24:e14022. [PMID: 37177830 PMCID: PMC10476996 DOI: 10.1002/acm2.14022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 04/13/2023] [Accepted: 04/19/2023] [Indexed: 05/15/2023] Open
Abstract
Deep learning (DL) models for radiation therapy (RT) image segmentation require accurately annotated training data. Multiple organ delineation guidelines exist; however, information on the used guideline is not provided with the delineation. Extraction of training data with coherent guidelines can therefore be challenging. We present a supervised classification method for pelvis structure delineations where bowel cavity, femoral heads, bladder, and rectum data, with two guidelines, were classified. The impact on DL-based segmentation quality using mixed guideline training data was also demonstrated. Bowel cavity was manually delineated on CT images for anal cancer patients (n = 170) according to guidelines Devisetty and RTOG. The DL segmentation quality from using training data with coherent or mixed guidelines was investigated. A supervised 3D squeeze-and-excite SENet-154 model was trained to classify two bowel cavity delineation guidelines. In addition, a pelvis CT dataset with manual delineations from prostate cancer patients (n = 1854) was used where data with an alternative guideline for femoral heads, rectum, and bladder were generated using commercial software. The model was evaluated on internal (n = 200) and external test data (n = 99). With mixed rather than coherent delineation guideline training data, the mean Dice score decreased by 3 percentage points, the mean Hausdorff distance (95%) increased by 5 mm, and the mean surface distance (MSD) increased by 1 mm. The classification of bowel cavity test data achieved 99.8% unweighted classification accuracy, 99.9% macro average precision, 97.2% macro average recall, and 98.5% macro average F1. Corresponding metrics for the pelvis internal test data were all 99% or above and for the external pelvis test data they were 96.3%, 96.6%, 93.3%, and 94.6%. Impaired segmentation performance was observed for training data with mixed guidelines. The DL delineation classification models achieved excellent results on internal and external test data. This can facilitate automated guideline-specific data extraction while avoiding the need for consistent and correct structure labels.
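The accuracy and macro-averaged precision, recall, and F1 reported in this abstract can be computed from predicted and true class labels as sketched below. This is an illustration only: the function name and toy labels are hypothetical, macro F1 is taken here as the mean of per-class F1 scores (one common convention, not confirmed by the abstract), and plain accuracy is used as a stand-in for the paper's "unweighted classification accuracy".

```python
import numpy as np

def macro_metrics(y_true, y_pred):
    """Plain accuracy plus macro-averaged precision, recall and F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "precision": float(np.mean(precisions)),
        "recall": float(np.mean(recalls)),
        "f1": float(np.mean(f1s)),
    }

# Toy two-guideline labels (0 and 1), made up for illustration.
m = macro_metrics([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])
print(m)
```

Macro averaging gives each class equal weight, so a rare guideline class influences the score as much as a common one.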
Affiliation(s)
- Michael Lempart
- Radiation Physics, Department of Hematology, Oncology, and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Department of Translational Medicine, Medical Radiation Physics, Lund University, Malmö, Sweden
- Jonas Scherman
- Radiation Physics, Department of Hematology, Oncology, and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Martin P. Nilsson
- Department of Hematology, Oncology, and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Christian Jamtheim Gustafsson
- Radiation Physics, Department of Hematology, Oncology, and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Department of Translational Medicine, Medical Radiation Physics, Lund University, Malmö, Sweden
9. McQuinlan Y, Brouwer CL, Lin Z, Gan Y, Sung Kim J, van Elmpt W, Gooding MJ. An investigation into the risk of population bias in deep learning autocontouring. Radiother Oncol 2023; 186:109747. [PMID: 37330053 DOI: 10.1016/j.radonc.2023.109747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 05/30/2023] [Accepted: 06/08/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND AND PURPOSE To date, data used in the development of Deep Learning-based automatic contouring (DLC) algorithms have been largely sourced from single geographic populations. This study aimed to evaluate the risk of population-based bias by determining whether the performance of an autocontouring system is impacted by geographic population. MATERIALS AND METHODS Eighty deidentified head and neck CT scans were collected from four clinics in Europe (n = 2) and Asia (n = 2). A single observer manually delineated 16 organs-at-risk in each. Subsequently, the data were contoured using a DLC solution trained on single-institution (European) data. Autocontours were compared to manual delineations using quantitative measures. A Kruskal-Wallis test was used to test for any difference between populations. Clinical acceptability of automatic and manual contours to observers from each participating institution was assessed using a blinded subjective evaluation. RESULTS Seven organs showed a significant difference in volume between groups. Four organs showed statistical differences in quantitative similarity measures. The qualitative test showed greater variation in acceptance of contouring between observers than between data from different origins, with greater acceptance by the South Korean observers. CONCLUSION Much of the statistical difference in quantitative performance could be explained by the difference in organ volume impacting the contour similarity measures and the small sample size. However, the qualitative assessment suggests that observer perception bias has a greater impact on the apparent clinical acceptability than quantitatively observed differences. This investigation of potential geographic bias should extend to more patients, populations, and anatomical regions in the future.
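The Kruskal-Wallis test mentioned in the methods compares a measurement across more than two independent groups without assuming normality. A minimal sketch using `scipy.stats.kruskal` is given below; the four groups stand in for the four clinics, and all values are made-up placeholders chosen to be clearly separated, not data from the study.

```python
from scipy.stats import kruskal

# Hypothetical per-clinic measurements (for example organ volumes); made up.
clinic_a = [1.0, 2.0, 3.0, 4.0, 5.0]
clinic_b = [11.0, 12.0, 13.0, 14.0, 15.0]
clinic_c = [21.0, 22.0, 23.0, 24.0, 25.0]
clinic_d = [31.0, 32.0, 33.0, 34.0, 35.0]

# H statistic and p-value for the null hypothesis of equal distributions.
statistic, p_value = kruskal(clinic_a, clinic_b, clinic_c, clinic_d)
print(statistic, p_value)
```

A small p-value indicates that at least one group differs; it does not say which one, so a post hoc comparison would be needed for that.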
Affiliation(s)
- Charlotte L Brouwer
- University of Groningen, University Medical Center Groningen, Department of Radiation Oncology, Groningen, The Netherlands.
- Zhixiong Lin
- Shantou University Medical Centre, Guangdong, China.
| | - Yong Gan
- Shantou University Medical Centre, Guangdong, China.
| | - Jin Sung Kim
- Yonsei University Health System, Seoul, Republic of Korea.
| | - Wouter van Elmpt
- Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Reproduction, Maastricht University Medical Centre+, Maastricht, The Netherlands.
| | - Mark J Gooding
- Mirada Medical Ltd, Oxford, United Kingdom; Inpictura Ltd, Oxford, United Kingdom.
|
10
|
Boukerroui D, Vasquez Osorio E, Brunenberg E, Gooding MJ. Analytic calculations and synthetic shapes for validation of quantitative contour comparison software. Phys Imaging Radiat Oncol 2023; 26:100436. [PMID: 37089904 PMCID: PMC10119950 DOI: 10.1016/j.phro.2023.100436] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/23/2022] [Revised: 03/24/2023] [Accepted: 03/29/2023] [Indexed: 04/05/2023]
Abstract
A high level of variability in reported values was observed in a recent survey of tools for calculating contour similarity measures (CSMs). Such variations in the output measurements prevent meaningful comparison between studies. The purpose of this study was to develop a dataset with analytically calculated gold standard values to facilitate standardization and ensure accuracy of CSM implementations. The dataset was generated in the Digital Imaging and Communications in Medicine (DICOM) format. Both the dataset and the software used for its generation are made publicly available to encourage robust testing of CSM implementations for accuracy, improving consistency between different implementations.
|
11
|
Mackay K, Bernstein D, Glocker B, Kamnitsas K, Taylor A. A Review of the Metrics Used to Assess Auto-Contouring Systems in Radiotherapy. Clin Oncol (R Coll Radiol) 2023; 35:354-369. [PMID: 36803407 DOI: 10.1016/j.clon.2023.01.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/25/2022] [Revised: 12/05/2022] [Accepted: 01/23/2023] [Indexed: 02/01/2023]
Abstract
Auto-contouring could revolutionise future planning of radiotherapy treatment. The lack of consensus on how to assess and validate auto-contouring systems currently limits clinical use. This review formally quantifies the assessment metrics used in studies published during one calendar year and assesses the need for standardised practice. A PubMed literature search was undertaken for papers evaluating radiotherapy auto-contouring published during 2021. Papers were assessed for types of metric and the methodology used to generate ground-truth comparators. Our PubMed search identified 212 studies, of which 117 met the criteria for clinical review. Geometric assessment metrics were used in 116 of 117 studies (99.1%), including the Dice Similarity Coefficient, which was used in 113 (96.6%) studies. Clinically relevant metrics, such as qualitative, dosimetric and time-saving metrics, were used less frequently, in 22 (18.8%), 27 (23.1%) and 18 (15.4%) of 117 studies, respectively. There was heterogeneity within each category of metric. Over 90 different names for geometric measures were used. Methods for qualitative assessment were different in all but two papers. Variation existed in the methods used to generate radiotherapy plans for dosimetric assessment. Consideration of editing time was only given in 11 (9.4%) papers. A single manual contour as a ground-truth comparator was used in 65 (55.6%) studies. Only 31 (26.5%) studies compared auto-contours to usual inter- and/or intra-observer variation. In conclusion, significant variation exists in how research papers currently assess the accuracy of automatically generated contours. Geometric measures are the most popular; however, their clinical utility is unknown. There is heterogeneity in the methods used to perform clinical assessment. Considering the different stages of system implementation may provide a framework to decide the most appropriate metrics. This analysis supports the need for a consensus on the clinical implementation of auto-contouring.
Affiliation(s)
- K Mackay
  - The Institute of Cancer Research, London, UK; The Royal Marsden Hospital, London, UK
- D Bernstein
  - The Institute of Cancer Research, London, UK; The Royal Marsden Hospital, London, UK
- B Glocker
  - Department of Computing, Imperial College London, South Kensington Campus, London, UK
- K Kamnitsas
  - Department of Computing, Imperial College London, South Kensington Campus, London, UK; Department of Engineering Science, University of Oxford, Oxford, UK
- A Taylor
  - The Institute of Cancer Research, London, UK; The Royal Marsden Hospital, London, UK
|
12
|
Costea M, Zlate A, Durand M, Baudier T, Grégoire V, Sarrut D, Biston MC. Comparison of atlas-based and deep learning methods for organs at risk delineation on head-and-neck CT images using an automated treatment planning system. Radiother Oncol 2022; 177:61-70. [PMID: 36328093 DOI: 10.1016/j.radonc.2022.10.029] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/20/2022] [Revised: 10/21/2022] [Accepted: 10/23/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND AND PURPOSE To investigate the performance of head-and-neck (HN) organs-at-risk (OAR) automatic segmentation (AS) using four atlas-based (ABAS) and two deep learning (DL) solutions. MATERIAL AND METHODS All patients underwent iodine contrast-enhanced planning CT. Fourteen OAR were manually delineated. The DL.1 and DL.2 solutions were trained with 63 mono-centric patients and > 1000 multi-centric patients, respectively. Ten and 15 patients with varied anatomies were selected for the atlas library and for testing, respectively. The evaluation was based on geometric indices (DICE coefficient and 95th percentile Hausdorff Distance (HD95%)), time needed for manual corrections and clinical dosimetric endpoints obtained using automated treatment planning. RESULTS Both DICE and HD95% results indicated that DL algorithms generally performed better than ABAS algorithms for automatic segmentation of HN OAR. However, the hybrid-ABAS (ABAS.3) algorithm sometimes provided higher agreement with the reference contours than the two DL solutions. Compared with DL.2 and ABAS.3, DL.1 contours were the fastest to correct. For the three solutions, the differences in dose distributions obtained using AS contours and AS + manually corrected contours were not statistically significant. High dose differences could be observed when OAR contours were at short distances from the targets. However, this was not always interrelated. CONCLUSION DL methods generally showed higher delineation accuracy than ABAS methods for AS of HN OAR. Most ABAS contours had high conformity to the reference but were more time-consuming than DL algorithms, especially when considering the computing time and the time spent on manual corrections.
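The HD95% index used in this abstract replaces the maximum in the Hausdorff distance with a 95th percentile, which makes it less sensitive to a few outlying contour points. The sketch below shows one common convention only (nearest-rank percentile per direction, then the larger of the two directions); it is an assumption for illustration and not the implementation used in the study, and other tools pool the two directions or interpolate the percentile differently. All function names are hypothetical.

```python
import math


def directed_distances(points_a, points_b):
    """For each point in A, the distance to its nearest point in B."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def hd95(points_a, points_b):
    """95th percentile Hausdorff distance between two non-empty point sets."""
    d_ab = directed_distances(points_a, points_b)
    d_ba = directed_distances(points_b, points_a)
    return max(percentile_nearest_rank(d_ab, 95),
               percentile_nearest_rank(d_ba, 95))


# Two hypothetical contours sampled as 2D points, one unit apart.
reference = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
automatic = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(hd95(reference, automatic))
```

The brute-force nearest-point search is quadratic in the number of points, which is fine for a small example but would usually be replaced by a distance transform or a spatial index on real contours.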
Affiliation(s)
- Madalina Costea
  - Centre Léon Bérard, 28 rue Laennec, 69373 LYON Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France
- Morgane Durand
  - Centre Léon Bérard, 28 rue Laennec, 69373 LYON Cedex 08, France
- Thomas Baudier
  - Centre Léon Bérard, 28 rue Laennec, 69373 LYON Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France
- David Sarrut
  - Centre Léon Bérard, 28 rue Laennec, 69373 LYON Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France
- Marie-Claude Biston
  - Centre Léon Bérard, 28 rue Laennec, 69373 LYON Cedex 08, France; CREATIS, CNRS UMR5220, Inserm U1044, INSA-Lyon, Université Lyon 1, Villeurbanne, France
|
13
|
Cubero L, Castelli J, Simon A, de Crevoisier R, Acosta O, Pascau J. Deep Learning-Based Segmentation of Head and Neck Organs-at-Risk with Clinical Partially Labeled Data. Entropy (Basel, Switzerland) 2022; 24:e24111661. [PMID: 36421515 PMCID: PMC9689629 DOI: 10.3390/e24111661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/30/2022] [Revised: 10/28/2022] [Accepted: 11/09/2022] [Indexed: 06/06/2023]
Abstract
Radiotherapy is one of the main treatments for localized head and neck (HN) cancer. To design a personalized treatment with reduced radio-induced toxicity, accurate delineation of organs at risk (OAR) is a crucial step. Manual delineation is time- and labor-consuming, as well as observer-dependent. Deep learning (DL) based segmentation has proven to overcome some of these limitations, but requires large databases of homogeneously contoured image sets for robust training. However, these are not easily obtained from the standard clinical protocols, as the OARs delineated may vary depending on the patient's tumor site and specific treatment plan. This results in incomplete or partially labeled data. This paper presents a solution to train a robust DL-based automated segmentation tool exploiting a clinical partially labeled dataset. We propose a two-step workflow for OAR segmentation: first, we developed longitudinal OAR-specific 3D segmentation models for pseudo-contour generation, completing the missing contours for some patients; then, with all OARs available, we trained a multi-class 3D convolutional neural network (nnU-Net) for final OAR segmentation. Results obtained in 44 independent datasets showed superior performance of the proposed methodology for the segmentation of fifteen OARs, with an average Dice score coefficient and surface Dice similarity coefficient of 80.59% and 88.74%, respectively. We demonstrated that the model can be straightforwardly integrated into the clinical workflow for standard and adaptive radiotherapy.
Affiliation(s)
- Lucía Cubero
  - Departamento de Bioingeniería, Universidad Carlos III de Madrid, 28911 Madrid, Spain
  - Université Rennes, CLCC Eugène Marquis, Inserm, LTSI-UMR 1099, F-35000 Rennes, France
- Joël Castelli
  - Université Rennes, CLCC Eugène Marquis, Inserm, LTSI-UMR 1099, F-35000 Rennes, France
- Antoine Simon
  - Université Rennes, CLCC Eugène Marquis, Inserm, LTSI-UMR 1099, F-35000 Rennes, France
- Renaud de Crevoisier
  - Université Rennes, CLCC Eugène Marquis, Inserm, LTSI-UMR 1099, F-35000 Rennes, France
- Oscar Acosta
  - Université Rennes, CLCC Eugène Marquis, Inserm, LTSI-UMR 1099, F-35000 Rennes, France
- Javier Pascau
  - Departamento de Bioingeniería, Universidad Carlos III de Madrid, 28911 Madrid, Spain
  - Instituto de Investigación Sanitaria Gregorio Marañón, 28007 Madrid, Spain
|
14
|
Delpon G, Barateau A, Beneux A, Bessières I, Latorzeff I, Welmant J, Tallet A. [What do we need to deliver "online" adapted radiotherapy treatment plans?]. Cancer Radiother 2022; 26:794-802. [PMID: 36028418 DOI: 10.1016/j.canrad.2022.06.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 06/27/2022] [Accepted: 06/29/2022] [Indexed: 11/17/2022]
Abstract
During the joint SFRO/SFPM session of the 2019 congress, a state-of-the-art review of adaptive radiotherapy predicted a strong impact on our clinical practice, in particular with the availability of treatment devices coupled to an MRI system. Three years later, it seems relevant to take stock of adaptive radiotherapy in practice, and especially of the "online" strategy, which is becoming more and more accessible with recent hardware and software developments, such as accelerators coupled to a three-dimensional imaging device and algorithms based on artificial intelligence. However, the deployment of this promising strategy is complex because it compresses the usual time scale and disrupts established ways of organizing work. So what do we need to deliver adapted treatment plans with an "online" strategy?
Affiliation(s)
- G Delpon
  - Institut de cancérologie de l'Ouest, Saint-Herblain et IMT Atlantique, Nantes université, CNRS/IN2P3, Subatech, Nantes, France
- A Barateau
  - Université Rennes, CLCC Eugène-Marquis, Inserm, LTSI-UMR 1099, Rennes, France
- A Beneux
  - Hospices Civils de Lyon, Lyon, France
- I Bessières
  - Centre Georges-François Leclerc, Dijon, France
- J Welmant
  - Institut du cancer de Montpellier, Montpellier, France
- A Tallet
  - Institut Paoli-Calmettes, Marseille, France
|
15
|
Khalal DM, Behouch A, Azizi H, Maalej N. Automatic segmentation of thoracic CT images using three deep learning models. Cancer Radiother 2022; 26:1008-1015. [PMID: 35803861 DOI: 10.1016/j.canrad.2022.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Revised: 01/10/2022] [Accepted: 02/09/2022] [Indexed: 11/18/2022]
Abstract
PURPOSE Deep learning (DL) techniques are widely used in medical imaging, in particular for segmentation. Manual segmentation of organs at risk (OARs) is time-consuming and suffers from inter- and intra-observer segmentation variability, whereas image segmentation using DL has given very promising results. In this work, we present and compare the results of segmentation of OARs and a clinical target volume (CTV) in thoracic CT images using three DL models. MATERIALS AND METHODS We used CT images of 52 patients with breast cancer from a public dataset. Automatic segmentation of the lungs, the heart and a CTV was performed using three models based on the U-Net architecture. Three metrics were used to quantify and compare the segmentation results obtained with these models: the Dice similarity coefficient (DSC), the Jaccard coefficient (J) and the Hausdorff distance (HD). RESULTS The obtained values of DSC, J and HD were presented for each segmented organ and for the three models. Examples of automatic segmentation were presented and compared to the corresponding ground truth delineations. Our values were also compared to recent results obtained by other authors. CONCLUSION The performance of three DL models was evaluated for the delineation of the lungs, the heart and a CTV. This study clearly showed that these 2D models based on the U-Net architecture can be used to delineate organs in CT images with good performance compared to other models. Generally, the three models present similar performances. Using a dataset with more CT images, the three models should give better results.
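The three metrics listed in this abstract can be written down compactly when each segmentation is treated as a set of voxel coordinates: DSC = 2|A∩B| / (|A| + |B|), J = |A∩B| / |A∪B|, and HD is the largest distance from any point of one set to its nearest point in the other. The sketch below is a minimal illustration under that assumption; the function names and the toy masks are hypothetical and are not taken from the study.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient (intersection over union)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest nearest-neighbour distance from points of p to the set q.
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


# Toy 2D masks: a 2x2 square and the same square shifted by one column.
ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
prediction = {(0, 1), (1, 1), (0, 2), (1, 2)}

print(dice(ground_truth, prediction))
print(jaccard(ground_truth, prediction))
print(hausdorff(ground_truth, prediction))
```

DSC and J carry the same information, since J = DSC / (2 - DSC), whereas HD is a distance in voxel (or, after scaling by the voxel spacing, physical) units and reacts to the single worst boundary error.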
Affiliation(s)
- D M Khalal
  - Department of Physics, Faculty of Sciences, Laboratory of dosing, analysis and characterization in high resolution, Ferhat Abbas Sétif 1 University, El Baz campus, 19137 Sétif, Algeria
- A Behouch
  - Department of Physics, Faculty of Sciences, Laboratory of dosing, analysis and characterization in high resolution, Ferhat Abbas Sétif 1 University, El Baz campus, 19137 Sétif, Algeria
- H Azizi
  - Department of Physics, Faculty of Sciences, Laboratory of dosing, analysis and characterization in high resolution, Ferhat Abbas Sétif 1 University, El Baz campus, 19137 Sétif, Algeria
- N Maalej
  - Department of Physics, Khalifa University, Abu Dhabi, United Arab Emirates
|