1
Kazemifar S, Balagopal A, Nguyen D, McGuire S, Hannan R, Jiang S, Owrangi A. Segmentation of the prostate and organs at risk in male pelvic CT images using deep learning. Biomed Phys Eng Express 2018. DOI: 10.1088/2057-1976/aad100
2
Marsousi M, Plataniotis KN, Stergiopoulos S. Kidney Detection in 3-D Ultrasound Imagery via Shape-to-Volume Registration Based on Spatially Aligned Neural Network. IEEE J Biomed Health Inform 2018; 23:227-242. PMID: 29993823. DOI: 10.1109/jbhi.2018.2805777
Abstract
This paper introduces a computer-aided kidney shape detection method suitable for volumetric (3D) ultrasound images. Using shape and texture priors, the proposed method automates the process of kidney detection, which is a problem of great importance in computer-assisted trauma diagnosis. This paper introduces a new complex-valued implicit shape model, which represents the multiregional structure of the kidney shape. A spatially aligned neural network classifier with complex-valued output is designed to classify voxels into the background and the multiregional structure of the kidney shape. The complex values of the shape model and classification outputs are incorporated into a new similarity metric, such that the shape-to-volume registration process fits the shape model only on the actual kidney shape in input ultrasound volumes. The algorithm's accuracy and sensitivity are evaluated using both simulated and actual 3-D ultrasound images, and it is compared against the performance of the state of the art. The results support the claims about accuracy and robustness of the proposed kidney detection method, and statistical analysis validates its superiority over the state of the art.
3
Segmentation of the hippocampus by transferring algorithmic knowledge for large cohort processing. Med Image Anal 2018; 43:214-228. DOI: 10.1016/j.media.2017.11.004
4
Marsousi M, Plataniotis KN, Stergiopoulos S. An Automated Approach for Kidney Segmentation in Three-Dimensional Ultrasound Images. IEEE J Biomed Health Inform 2017; 21:1079-1094. DOI: 10.1109/jbhi.2016.2580040
5
Huo Y, Asman AJ, Plassard AJ, Landman BA. Simultaneous total intracranial volume and posterior fossa volume estimation using multi-atlas label fusion. Hum Brain Mapp 2016; 38:599-616. PMID: 27726243. DOI: 10.1002/hbm.23432
Abstract
Total intracranial volume (TICV) is an essential covariate in brain volumetric analyses. The prevalent brain imaging software packages provide automatic TICV estimates. FreeSurfer and FSL estimate TICV using a scaling factor, while SPM12 accumulates probabilities of brain tissues. None of the three provides an explicit skull/CSF boundary (SCB), since it is challenging to distinguish these dark structures in a T1-weighted image. However, an explicit SCB not only leads to a natural way of obtaining TICV (i.e., counting voxels inside the skull) but also allows sub-definition of TICV, for example, the posterior fossa volume (PFV). In this article, we propose to use multi-atlas label fusion to obtain TICV and PFV simultaneously. The main contributions are: (1) TICV and PFV are simultaneously obtained with an explicit SCB from a single T1-weighted image. (2) TICV and PFV labels are added to the widely used BrainCOLOR atlases. (3) A detailed mathematical derivation of non-local spatial STAPLE (NLSS) label fusion is presented. As the skull is clearly distinguished in CT images, we use a semi-manual procedure to obtain atlases with TICV and PFV labels from 20 subjects who have both MR and CT scans. The proposed method provides simultaneous TICV and PFV estimation while achieving more accurate TICV estimation compared with FreeSurfer, FSL, SPM12, and the previously proposed STAPLE-based approach. The newly developed TICV and PFV labels for the OASIS BrainCOLOR atlases provide acceptable performance, which enables simultaneous TICV and PFV estimation during whole brain segmentation. The NLSS method and the new atlases have been made freely available. Hum Brain Mapp 38:599-616, 2017. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Yuankai Huo
- Electrical Engineering, Vanderbilt University, Nashville, Tennessee
- Andrew J Asman
- Electrical Engineering, Vanderbilt University, Nashville, Tennessee
- Bennett A Landman
- Electrical Engineering; Computer Science; Biomedical Engineering; Radiology and Radiological Sciences; and Institute of Imaging Science, Vanderbilt University, Nashville, Tennessee
6
7
Investigation of Bias in Continuous Medical Image Label Fusion. PLoS One 2016; 11:e0155862. PMID: 27258158. PMCID: PMC4892597. DOI: 10.1371/journal.pone.0155862
Abstract
Image labeling is essential for analyzing morphometric features in medical imaging data. Labels can be obtained by either human interaction or automated segmentation algorithms, both of which suffer from errors. The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm, for both discrete-valued and continuous-valued labels, has been proposed to find the consensus fusion while simultaneously estimating rater performance. In this paper, we first show that the previously reported continuous STAPLE, in which bias and variance are used to represent rater performance, yields a maximum likelihood solution in which the bias is indeterminate. We then analyze the major cause of this deficiency and evaluate two classes of auxiliary bias estimation processes: one that estimates the bias as part of the algorithm initialization, and another that uses a maximum a posteriori criterion with a priori probabilities on the rater bias. We compare the efficacy of six methods, three variants from each class, in simulations and through empirical human rater experiments. We comment on their properties, identify deficient methods, and propose effective methods as solutions.
8
Sevetlidis V, Giuffrida MV, Tsaftaris SA. Whole Image Synthesis Using a Deep Encoder-Decoder Network. Simulation and Synthesis in Medical Imaging 2016. DOI: 10.1007/978-3-319-46630-9_13
9
Iglesias JE, Sabuncu MR. Multi-atlas segmentation of biomedical images: A survey. Med Image Anal 2015; 24:205-219. PMID: 26201875. PMCID: PMC4532640. DOI: 10.1016/j.media.2015.06.012
Abstract
Multi-atlas segmentation (MAS), first introduced and popularized by the pioneering work of Rohlfing, et al. (2004), Klein, et al. (2005), and Heckemann, et al. (2006), is becoming one of the most widely-used and successful image segmentation techniques in biomedical applications. By manipulating and utilizing the entire dataset of "atlases" (training images that have been previously labeled, e.g., manually by an expert), rather than some model-based average representation, MAS has the flexibility to better capture anatomical variation, thus offering superior segmentation accuracy. This benefit, however, typically comes at a high computational cost. Recent advancements in computer hardware and image processing software have been instrumental in addressing this challenge and facilitated the wide adoption of MAS. Today, MAS has come a long way and the approach includes a wide array of sophisticated algorithms that employ ideas from machine learning, probabilistic modeling, optimization, and computer vision, among other fields. This paper presents a survey of published MAS algorithms and studies that have applied these methods to various biomedical problems. In writing this survey, we have three distinct aims. Our primary goal is to document how MAS was originally conceived, later evolved, and now relates to alternative methods. Second, this paper is intended to be a detailed reference of past research activity in MAS, which now spans over a decade (2003-2014) and entails novel methodological developments and application-specific solutions. Finally, our goal is to also present a perspective on the future of MAS, which, we believe, will be one of the dominant approaches in biomedical image segmentation.
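The simplest fusion step in a multi-atlas pipeline, once atlas labels have been propagated to the target, is a per-voxel majority vote. A minimal sketch on toy 1-D arrays (the helper `majority_vote` is illustrative, not code from the survey):

```python
import numpy as np

def majority_vote(label_maps):
    """Fuse multiple propagated atlas label maps by per-voxel majority vote.

    label_maps: list of integer arrays of identical shape, one per atlas.
    Returns the most frequent label at each voxel (ties -> lowest label).
    """
    stack = np.stack(label_maps)            # shape: (n_atlases, *image_shape)
    labels = np.unique(stack)
    # Count the votes for each candidate label at every voxel.
    votes = np.stack([(stack == lab).sum(axis=0) for lab in labels])
    return labels[np.argmax(votes, axis=0)]

# Three toy "segmentations" of the same 4-voxel image:
a = np.array([0, 1, 1, 2])
b = np.array([0, 1, 2, 2])
c = np.array([1, 1, 1, 2])
fused = majority_vote([a, b, c])
print(fused)  # [0 1 1 2]
```

Most of the algorithms surveyed replace this uniform vote with weighted or performance-model-based fusion (e.g., STAPLE variants), but the structure of the final step is the same.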
Affiliation(s)
- Mert R Sabuncu
- A.A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, USA
10
De Luca V, Benz T, Kondo S, König L, Lübke D, Rothlübbers S, Somphone O, Allaire S, Lediju Bell MA, Chung DYF, Cifor A, Grozea C, Günther M, Jenne J, Kipshagen T, Kowarschik M, Navab N, Rühaak J, Schwaab J, Tanner C. The 2014 liver ultrasound tracking benchmark. Phys Med Biol 2015; 60:5571-5599. PMID: 26134417. PMCID: PMC5454593. DOI: 10.1088/0031-9155/60/14/5571
Abstract
The Challenge on Liver Ultrasound Tracking (CLUST) was held in conjunction with the MICCAI 2014 conference to enable direct comparison of tracking methods for this application. This paper reports the outcome of this challenge, including setup, methods, results and experiences. The database included 54 2D and 3D sequences of the liver of healthy volunteers and tumor patients under free breathing. Participants had to provide the tracking results for 90% of the data (test set), for pre-defined point landmarks (healthy volunteers) or for tumor segmentations (patient data). In this paper we compare the best six methods which participated in the challenge. Quantitative evaluation was performed by the organizers with respect to manual annotations. All methods showed a mean tracking error ranging between 1.4 mm and 2.1 mm for 2D points, and between 2.6 mm and 4.6 mm for 3D points. Fusing all automatic results by taking the median of the tracking results improved the mean error to 1.2 mm (2D) and 2.5 mm (3D). For all methods, the performance is still not comparable to human inter-rater variability, with a mean tracking error of 0.5–0.6 mm (2D) and 1.2–1.8 mm (3D). The segmentation task was fulfilled by only one participant, resulting in a Dice coefficient ranging from 76.7% to 92.3%. The CLUST database continues to be available and the online leaderboard will be updated as an ongoing challenge.
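The median fusion of tracker outputs mentioned in the abstract can be sketched as a coordinate-wise median across methods, which suppresses outlier trackers. The helper name and toy coordinates below are illustrative, not part of the CLUST tooling:

```python
import numpy as np

def median_fuse(tracks):
    """Fuse point-tracking results from several methods by the
    coordinate-wise median across methods.

    tracks: array-like of shape (n_methods, n_frames, n_dims).
    Returns a fused trajectory of shape (n_frames, n_dims).
    """
    return np.median(np.asarray(tracks, float), axis=0)

# Three hypothetical 2-D trackers following one landmark over two frames:
tracks = [
    [[10.0, 5.0], [11.0, 5.5]],
    [[10.2, 5.1], [11.1, 5.4]],
    [[ 9.0, 6.0], [12.0, 6.5]],   # an outlier method
]
fused = median_fuse(tracks)
print(fused)  # frame-wise medians: (10.0, 5.1) and (11.1, 5.5)
```

Because the median ignores the single deviant estimate per coordinate, the fused trajectory can beat every individual method's mean error, as reported in the challenge.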
Affiliation(s)
- V De Luca
- Computer Vision Lab, ETH Zurich, 8092 Zurich, Switzerland
11
Abstract
In this paper we analyze the properties of the well-known segmentation fusion algorithm STAPLE, using a novel inference technique that analytically marginalizes out all model parameters. We demonstrate both theoretically and empirically that when the number of raters is large, or when consensus regions are included in the model, STAPLE devolves into thresholding the average of the input segmentations. We further show that when the number of raters is small, the STAPLE result may not be the optimal segmentation truth estimate, and its model parameter estimates might not reflect the individual raters' actual segmentation performance. Our experiments indicate that these intrinsic weaknesses are frequently exacerbated by the presence of undesirable global optima and convergence issues. Together these results cast doubt on the soundness and usefulness of typical STAPLE outcomes.
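The degenerate behavior this analysis identifies, STAPLE collapsing to thresholding the average of the inputs when raters are numerous, can be made concrete with a small sketch (hypothetical helper name; a simplified stand-in for the limiting behavior, not the STAPLE algorithm itself):

```python
import numpy as np

def threshold_average(segmentations, tau=0.5):
    """Average binary segmentations across raters and threshold.

    The paper argues that, with many raters or with consensus regions
    included, STAPLE's truth estimate approaches exactly this rule.
    """
    mean_seg = np.mean(np.asarray(segmentations, float), axis=0)
    return (mean_seg > tau).astype(int)

# Three raters labeling a 5-voxel image:
segs = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
])
fused = threshold_average(segs)
print(fused)  # [1 1 0 0 1]
```

The point of the paper is precisely that, in these regimes, the extra machinery of rater-performance estimation adds little beyond this baseline.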
12
Akhondi-Asl A, Hoyte L, Lockhart ME, Warfield SK. A logarithmic opinion pool based STAPLE algorithm for the fusion of segmentations with associated reliability weights. IEEE Trans Med Imaging 2014; 33:1997-2009. PMID: 24951681. PMCID: PMC4264575. DOI: 10.1109/tmi.2014.2329603
Abstract
Pelvic floor dysfunction is common in women after childbirth, and precise segmentation of magnetic resonance images (MRI) of the pelvic floor may facilitate diagnosis and treatment of patients. However, because of the complexity of its structures, manual segmentation of the pelvic floor is challenging and suffers from high inter- and intra-rater variability among expert raters. Multiple template fusion algorithms are promising segmentation techniques for these types of applications, but they have been limited by imperfections in the alignment of templates to the target, and by template segmentation errors. A number of algorithms sought to improve segmentation performance by combining image intensities and template labels as two independent sources of information, carrying out fusion through local intensity weighted voting schemes. This class of approach is a form of linear opinion pooling, and achieves unsatisfactory performance for this application. We hypothesized that better decision fusion could be achieved by assessing the contribution of each template in comparison to a reference standard segmentation of the target image, and developed a novel segmentation algorithm to enable automatic segmentation of MRI of the female pelvic floor. The algorithm achieves high performance by estimating and compensating for both imperfect registration of the templates to the target image and template segmentation inaccuracies. A local image similarity measure is used to infer a local reliability weight, which contributes to the fusion through a novel logarithmic opinion pooling. We evaluated our new algorithm in comparison to nine state-of-the-art segmentation methods and demonstrated that our algorithm achieves the highest performance.
Affiliation(s)
- Alireza Akhondi-Asl
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, 300 Longwood Avenue, Boston, MA, 02115, USA
- Lennox Hoyte
- Department of Obstetrics and Gynecology, University of South Florida, 2 Tampa General Circle, 6th floor, Tampa, FL 33606, USA
- Mark E. Lockhart
- Department of Radiology, University of Alabama at Birmingham, 1802 6th Avenue South, Birmingham, AL 35233, USA
- Simon K. Warfield
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, 300 Longwood Avenue, Boston, MA, 02115, USA
13
Binaghi E, Pedoia V, Balbi S. Collection and fuzzy estimation of truth labels in glial tumour segmentation studies. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 2014. DOI: 10.1080/21681163.2014.947006
14
Modality propagation: coherent synthesis of subject-specific scans with data-driven regularization. Med Image Comput Comput Assist Interv 2014; 16:606-613. PMID: 24505717. DOI: 10.1007/978-3-642-40811-3_76
Abstract
We propose a general database-driven framework for coherent synthesis of subject-specific scans of desired modality, which adopts and generalizes the patch-based label propagation (LP) strategy. While modality synthesis has received increased attention lately, current methods are mainly tailored to specific applications. On the other hand, the LP framework has been extremely successful for certain segmentation tasks, however, so far it has not been used for estimation of entities other than categorical segmentation labels. We approach the synthesis task as a modality propagation, and demonstrate that with certain modifications the LP framework can be generalized to continuous settings providing coherent synthesis of different modalities, beyond segmentation labels. To achieve high-quality estimates we introduce a new data-driven regularization scheme, in which we integrate intermediate estimates within an iterative search-and-synthesis strategy. To efficiently leverage population data and ensure coherent synthesis, we employ a spatio-population search space restriction. In experiments, we demonstrate the quality of synthesis of different MRI signals (T2 and DTI-FA) from a T1 input, and show a novel application of modality synthesis for abnormality detection in multi-channel MRI of brain tumor patients.
15
Akhondi-Asl A, Warfield SK. Simultaneous truth and performance level estimation through fusion of probabilistic segmentations. IEEE Trans Med Imaging 2013; 32:1840-1852. PMID: 23744673. PMCID: PMC3788853. DOI: 10.1109/tmi.2013.2266258
Abstract
Recent research has demonstrated that improved image segmentation can be achieved by multiple template fusion utilizing both label and intensity information. However, intensity weighted fusion approaches use local intensity similarity as a surrogate measure of local template quality for predicting target segmentation and do not seek to characterize template performance. This limits both the usefulness and accuracy of these techniques. Our work here was motivated by the observation that the local intensity similarity is a poor surrogate measure for direct comparison of the template image with the true image target segmentation. Although the true image target segmentation is not available, a high quality estimate can be inferred, and this in turn allows a principled estimate to be made of the local quality of each template at contributing to the target segmentation. We developed a fusion algorithm that uses probabilistic segmentations of the target image to simultaneously infer a reference standard segmentation of the target image and the local quality of each probabilistic segmentation. The concept of comparing templates to a hidden reference standard segmentation enables accurate assessments of the contribution of each template to inferring the target image segmentation to be made, and in practice leads to excellent target image segmentation. We have used the new algorithm for the multiple-template-based segmentation and parcellation of magnetic resonance images of the brain. Intensity and label map images of each one of the aligned templates are used to train a local Gaussian mixture model based classifier. Then, each classifier is used to compute the probabilistic segmentations of the target image. Finally, the generated probabilistic segmentations are fused together using the new fusion algorithm to obtain the segmentation of the target image. We evaluated our method in comparison to other state-of-the-art segmentation methods and demonstrated that our new fusion algorithm has higher segmentation performance than these methods.
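For readers unfamiliar with the algorithm these extensions build on, a minimal sketch of classic binary STAPLE (hidden truth plus per-rater sensitivity/specificity estimated by EM) is given below. The variable names, fixed prior, and fixed iteration count are illustrative simplifications; this is the hard-label version, not the probabilistic variant proposed in this paper:

```python
import numpy as np

def staple_binary(D, n_iter=30, prior=0.5):
    """Minimal binary STAPLE: EM over a hidden truth map and
    per-rater sensitivity p and specificity q.

    D: (n_raters, n_voxels) binary decisions.
    Returns (soft truth estimate W, p, q).
    """
    D = np.asarray(D, float)
    R, N = D.shape
    p = np.full(R, 0.9)   # initial sensitivities
    q = np.full(R, 0.9)   # initial specificities
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is foreground.
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        W = a / (a + b)
        # M-step: re-estimate rater performance against the soft truth.
        p = (D @ W) / W.sum()
        q = ((1 - D) @ (1 - W)) / (1 - W).sum()
    return W, p, q

# Three raters, one of them noisy, on a 6-voxel image with truth [1,1,1,0,0,0]:
D = np.array([
    [1, 1, 1, 0, 0, 0],   # reliable rater
    [1, 1, 1, 0, 0, 0],   # reliable rater
    [1, 0, 1, 0, 1, 0],   # makes two mistakes
])
W, p, q = staple_binary(D)
print(np.round(W).astype(int))  # recovers [1 1 1 0 0 0]
```

The EM loop simultaneously recovers the consensus truth and downweights the noisy rater, which is the "performance level estimation" that the methods in this listing refine.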
Affiliation(s)
- Alireza Akhondi-Asl
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, 300 Longwood Avenue, Boston, MA, 02115, USA
- Simon K. Warfield
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, 300 Longwood Avenue, Boston, MA, 02115, USA
16
Abstract
We present a new fusion algorithm for the segmentation and parcellation of magnetic resonance (MR) images of the brain. Our algorithm is a parametric empirical Bayesian extension of the STAPLE algorithm which uses the observations to accurately estimate the prior distribution of the hidden ground truth via an expectation maximization (EM) algorithm. We use the IBSR dataset for the evaluation of our fusion algorithm. We segment 128 principal gray and white matter structures of the brain using our novel method and eight other state-of-the-art algorithms from the literature. Our prior distribution estimation strategy improves the accuracy of the fusion algorithm, and we show that the new fusion algorithm has superior performance compared to the other state-of-the-art fusion methods in the literature.
17
Commowick O, Akhondi-Asl A, Warfield SK. Estimating a reference standard segmentation with spatially varying performance parameters: local MAP STAPLE. IEEE Trans Med Imaging 2012; 31:1593-1606. PMID: 22562727. PMCID: PMC3496174. DOI: 10.1109/tmi.2012.2197406
Abstract
We present a new algorithm, called local MAP STAPLE, to estimate from a set of multi-label segmentations both a reference standard segmentation and spatially varying performance parameters. It is based on a sliding window technique to estimate the segmentation and the segmentation performance parameters for each input segmentation. In order to allow for optimal fusion from the small amount of data in each local region, and to account for the possibility of labels not being observed in a local region of some (or all) input segmentations, we introduce prior probabilities for the local performance parameters through a new maximum a posteriori formulation of STAPLE. Further, we propose an expression to compute confidence intervals on the estimated local performance parameters. We carried out several experiments with local MAP STAPLE to characterize its performance and value for local segmentation evaluation. First, with simulated segmentations with a known reference standard segmentation and spatially varying performance, we show that local MAP STAPLE performs better than both STAPLE and majority voting. Then we present evaluations with data sets from clinical applications. These experiments demonstrate that spatial adaptivity in segmentation performance is an important property to capture. We compared the local MAP STAPLE segmentations to STAPLE and to previously published fusion techniques, and demonstrated the superiority of local MAP STAPLE over other state-of-the-art algorithms.
18
Gholipour A, Akhondi-Asl A, Estroff JA, Warfield SK. Multi-atlas multi-shape segmentation of fetal brain MRI for volumetric and morphometric analysis of ventriculomegaly. Neuroimage 2012; 60:1819-1831. PMID: 22500924. PMCID: PMC3329183. DOI: 10.1016/j.neuroimage.2012.01.128
Abstract
The recent development of motion-robust super-resolution fetal brain MRI holds out the potential for dramatic new advances in volumetric and morphometric analysis. Quantitative analysis based on volumetric and morphometric biomarkers of the developing fetal brain requires accurate segmentation. Automatic segmentation of fetal brain MRI is challenging, however, due to the highly variable size and shape of the developing brain; possible structural abnormalities; and the relatively poor resolution of fetal MRI scans. To overcome these limitations, we present a novel, constrained, multi-atlas, multi-shape automatic segmentation method that specifically addresses the challenge of segmenting multiple structures with similar intensity values in subjects with strong anatomic variability. Accordingly, we have applied this method to shape segmentation of normal, dilated, or fused lateral ventricles for quantitative analysis of ventriculomegaly (VM), a pivotal finding in the earliest stages of fetal brain development that warrants further investigation. Using these techniques, we introduce novel volumetric and morphometric biomarkers of VM and compare them to the values generated by standard methods of VM analysis, i.e., by measuring the ventricular atrial diameter (AD) on manually selected sections of 2D ultrasound or 2D MRI. To this end, we studied 25 normal and abnormal fetuses in the gestational age (GA) range of 19 to 39 weeks (mean=28.26, stdev=6.56). This heterogeneous dataset was used to 1) validate our segmentation method for normal and abnormal ventricles; and 2) show that the proposed biomarkers may provide improved detection of VM as compared to the AD measurement.
Affiliation(s)
- Ali Gholipour
- Computational Radiology Laboratory, Department of Radiology, Children’s Hospital Boston, and Harvard Medical School, Boston, MA, 02115 USA
- Alireza Akhondi-Asl
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital Boston, and Harvard Medical School, Boston, MA, 02115 USA
- Judy A. Estroff
- Advanced Fetal Care Center, Department of Radiology, Children's Hospital Boston, and Harvard Medical School, Boston, MA, 02115 USA
- Simon K. Warfield
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital Boston, and Harvard Medical School, Boston, MA, 02115 USA
19
Xing F, Asman AJ, Prince JL, Landman BA. Finding Seeds for Segmentation Using Statistical Fusion. Proc SPIE Int Soc Opt Eng 2012; 8314. PMID: 23019385. DOI: 10.1117/12.911524
Abstract
Image labeling is an essential step in the quantitative analysis of medical images. Many image labeling algorithms require seed identification in order to initialize segmentation algorithms such as region growing, graph cuts, and the random walker. Seeds are usually placed manually by human raters, which makes these algorithms semi-automatic and can be prohibitive for very large datasets. In this paper an automatic algorithm for placing seeds using multi-atlas registration and statistical fusion is proposed. Atlases containing the centers of mass of a collection of neuroanatomical objects are deformably registered in a training set to determine where these centers of mass go after the labels are transformed by registration. The biases of these transformations are determined and incorporated in a continuous form of Simultaneous Truth And Performance Level Estimation (STAPLE) fusion, thereby improving the estimates (on average) over a single-registration strategy that does not incorporate bias or fusion. We evaluate this technique using real 3D brain MR image atlases and demonstrate its efficacy in correcting the data bias and reducing the fusion error.
Affiliation(s)
- Fangxu Xing
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA 21218
20
Landman BA, Asman AJ, Scoggins AG, Bogovic JA, Xing F, Prince JL. Robust statistical fusion of image labels. IEEE Trans Med Imaging 2012; 31:512-522. PMID: 22010145. PMCID: PMC3262958. DOI: 10.1109/tmi.2011.2172215
Abstract
Image labeling and parcellation (i.e., assigning structure to a collection of voxels) are critical tasks for the assessment of volumetric and morphometric features in medical imaging data. The process of image labeling is inherently error prone as images are corrupted by noise and artifacts. Even expert interpretations are subject to the subjectivity and precision of the individual raters. Hence, all labels must be considered imperfect with some degree of inherent variability. One may seek multiple independent assessments to both reduce this variability and quantify the degree of uncertainty. Existing techniques have exploited maximum a posteriori statistics to combine data from multiple raters and simultaneously estimate rater reliabilities. Although quite successful, wide-scale application has been hampered by unstable estimation with practical datasets, for example, with label sets containing small or thin objects, or with partial or limited datasets. Moreover, these approaches have required each rater to generate a complete dataset, which is often impossible given both human foibles and the typical turnover rate of raters in a research or clinical environment. Herein, we propose a robust approach to improve estimation performance with small anatomical structures, allow for missing data, account for repeated label sets, and utilize training/catch trial data. With this approach, numerous raters can label small, overlapping portions of a large dataset, and rater heterogeneity can be robustly controlled while simultaneously estimating a single, reliable label set and characterizing uncertainty. The proposed approach enables many individuals to collaborate in the construction of large datasets for labeling tasks (e.g., human parallel processing) and reduces the otherwise detrimental impact of rater unavailability.
Affiliation(s)
- Bennett A. Landman
- Department of Electrical Engineering, Vanderbilt University, Nashville, TN, 37235 USA
- Andrew J. Asman
- Department of Electrical Engineering, Vanderbilt University, Nashville, TN, 37235 USA
- Andrew G. Scoggins
- Department of Electrical Engineering, Vanderbilt University, Nashville, TN, 37235 USA
- John A. Bogovic
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, 21218 USA
- Fangxu Xing
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, 21218 USA
- Jerry L. Prince
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, 21218 USA
21
Asman AJ, Landman BA. Robust statistical label fusion through COnsensus Level, Labeler Accuracy, and Truth Estimation (COLLATE). IEEE Trans Med Imaging 2011; 30:1779-1794. PMID: 21536519. PMCID: PMC3150602. DOI: 10.1109/tmi.2011.2147795
Abstract
Segmentation and delineation of structures of interest in medical images is paramount to quantifying and characterizing structural, morphological, and functional correlations with clinically relevant conditions. The established gold standard for performing segmentation has been manual voxel-by-voxel labeling by a neuroanatomist expert. This process can be extremely time consuming, resource intensive and fraught with high inter-observer variability. Hence, studies involving characterizations of novel structures or appearances have been limited in scope (numbers of subjects), scale (extent of regions assessed), and statistical power. Statistical methods to fuse data sets from several different sources (e.g., multiple human observers) have been proposed to simultaneously estimate both rater performance and the ground truth labels. However, with empirical datasets, statistical fusion has been observed to result in visually inconsistent findings. So, despite the ease and elegance of a statistical approach, single observers and/or direct voting are often used in practice. Hence, rater performance is not systematically quantified and exploited during label estimation. To date, statistical fusion methods have relied on characterizations of rater performance that do not intrinsically include spatially varying models of rater performance. Herein, we present a novel, robust statistical label fusion algorithm to estimate and account for spatially varying performance. This algorithm, COnsensus Level, Labeler Accuracy and Truth Estimation (COLLATE), is based on the simple idea that some regions of an image are difficult to label (e.g., confusion regions: boundaries or low contrast areas) while other regions are intrinsically obvious (e.g., consensus regions: centers of large regions or high contrast edges). Unlike its predecessors, COLLATE estimates the consensus level of each voxel and estimates differing models of observer behavior in each region. 
We show that COLLATE provides significant improvement in label accuracy and rater assessment over previous fusion methods in both simulated and empirical datasets.
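COLLATE's split between consensus and confusion regions rests on how strongly raters agree at each voxel. As a rough illustration only (this is the raw agreement fraction, not the paper's model-based consensus estimate; the function name and array layout are assumptions):

```python
import numpy as np

def consensus_level(labels):
    """Fraction of raters agreeing with the modal label at each voxel.

    labels: (R, N) integer array of labels from R raters over N voxels.
    Values near 1.0 behave like COLLATE's consensus regions; lower values
    mark confusion regions (boundaries, low-contrast areas) where rater
    behavior models should differ.
    """
    R, N = labels.shape
    level = np.empty(N)
    for i in range(N):
        # count occurrences of each distinct label at voxel i
        _, counts = np.unique(labels[:, i], return_counts=True)
        level[i] = counts.max() / R
    return level
```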
Affiliation(s)
- Andrew J. Asman
- Department of Electrical Engineering, Vanderbilt University, Nashville, TN 37235, USA (phone: 615-322-2338; fax: 615-343-5459)
- Bennett A. Landman
- Department of Electrical Engineering, Vanderbilt University, Nashville, TN 37235, USA
|
22
|
Shi F, Fan Y, Tang S, Gilmore JH, Lin W, Shen D. Neonatal brain image segmentation in longitudinal MRI studies. Neuroimage 2009; 49:391-400. [PMID: 19660558 DOI: 10.1016/j.neuroimage.2009.07.066] [Citation(s) in RCA: 159] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2009] [Revised: 07/20/2009] [Accepted: 07/24/2009] [Indexed: 11/29/2022] Open
Abstract
In the study of early brain development, tissue segmentation of neonatal brain MR images remains challenging because of the insufficient image quality due to the properties of developing tissues. Among various brain tissue segmentation algorithms, atlas-based methods can potentially achieve good segmentation results on neonatal brain images. However, their performance relies on both the quality of the atlas and the spatial correspondence between the atlas and the to-be-segmented image. Moreover, it is difficult to build a population atlas for neonates due to the requirement of a large set of tissue-segmented neonatal brain images. To overcome these obstacles, we present a longitudinal neonatal brain image segmentation framework that takes advantage of the longitudinal data acquired at a late time point to build a subject-specific tissue probabilistic atlas. Specifically, tissue segmentation of the neonatal brain is formulated as two iterative steps of bias correction and probabilistic-atlas-based tissue segmentation, along with the longitudinal atlas reconstructed from the late-time image of the same subject. The proposed method has been evaluated qualitatively through visual inspection and quantitatively by comparing with manual delineations and two population-atlas-based segmentation methods. Experimental results show that the utilization of a subject-specific probabilistic atlas can substantially improve tissue segmentation of neonatal brain images.
Affiliation(s)
- Feng Shi
- IDEA Lab, Department of Radiology and BRIC, University of North Carolina at Chapel Hill, 106 Mason Farm Road, Chapel Hill, NC 27599, USA
|
23
|
Warfield SK, Zou KH, Wells WM. Validation of image segmentation by estimating rater bias and variance. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2008; 366:2361-75. [PMID: 18407896 PMCID: PMC3227147 DOI: 10.1098/rsta.2008.0040] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/25/2023]
Abstract
The accuracy and precision of segmentations of medical images have been difficult to quantify in the absence of a 'ground truth' or reference standard segmentation for clinical data. Although physical or digital phantoms can help by providing a reference standard, they do not allow the reproduction of the full range of imaging and anatomical characteristics observed in clinical data. An alternative assessment approach is to compare with segmentations generated by domain experts. Segmentations may be generated by raters who are trained experts or by automated image analysis algorithms. Typically, these segmentations differ due to intra-rater and inter-rater variability. The most appropriate way to compare such segmentations has been unclear. We present here a new algorithm to enable the estimation of performance characteristics, and a true labelling, from observations of segmentations of imaging data where segmentation labels may be ordered or continuous measures. This approach may be used with, among others, surface, distance transform or level-set representations of segmentations, and can be used to assess whether or not a rater consistently overestimates or underestimates the position of a boundary.
Affiliation(s)
- Simon K. Warfield
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- Kelly H. Zou
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- William M. Wells
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, 221 Longwood Avenue, Boston, MA 02115, USA
|
24
|
Chou YY, Leporé N, de Zubicaray GI, Carmichael OT, Becker JT, Toga AW, Thompson PM. Automated ventricular mapping with multi-atlas fluid image alignment reveals genetic effects in Alzheimer's disease. Neuroimage 2008; 40:615-630. [PMID: 18222096 PMCID: PMC2720413 DOI: 10.1016/j.neuroimage.2007.11.047] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2007] [Revised: 11/20/2007] [Accepted: 11/28/2007] [Indexed: 11/22/2022] Open
Abstract
We developed and validated a new method to create automated 3D parametric surface models of the lateral ventricles in brain MRI scans, providing an efficient approach to monitor degenerative disease in clinical studies and drug trials. First, we used a set of parameterized surfaces to represent the ventricles in four subjects' manually labeled brain MRI scans (atlases). We fluidly registered each atlas and mesh model to MRIs from 17 Alzheimer's disease (AD) patients and 13 age- and gender-matched healthy elderly control subjects, and 18 asymptomatic ApoE4-carriers and 18 age- and gender-matched non-carriers. We examined genotyped healthy subjects with the goal of detecting subtle effects of a gene that confers heightened risk for Alzheimer's disease. We averaged the meshes extracted for each 3D MR data set, and combined the automated segmentations with a radial mapping approach to localize ventricular shape differences in patients. Validation experiments comparing automated and expert manual segmentations showed that (1) the Hausdorff labeling error rapidly decreased, and (2) the power to detect disease- and gene-related alterations improved, as the number of atlases, N, was increased from 1 to 9. In surface-based statistical maps, we detected more widespread and intense anatomical deficits as we increased the number of atlases. We formulated a statistical stopping criterion to determine the optimal number of atlases to use. Healthy ApoE4-carriers and those with AD showed local ventricular abnormalities. This high-throughput method for morphometric studies further motivates the combination of genetic and neuroimaging strategies in predicting AD progression and treatment response.
Affiliation(s)
- Yi-Yu Chou
- Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, 635 Charles E. Young Drive South, Suite 225E, Los Angeles, CA, USA
- Natasha Leporé
- Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, 635 Charles E. Young Drive South, Suite 225E, Los Angeles, CA, USA
- Owen T. Carmichael
- Departments of Neurology and Computer Science, University of California, Davis, CA, USA
- James T. Becker
- Department of Neurology and Alzheimer's Disease Research Center, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Arthur W. Toga
- Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, 635 Charles E. Young Drive South, Suite 225E, Los Angeles, CA, USA
- Paul M. Thompson
- Laboratory of Neuro Imaging, Department of Neurology, UCLA School of Medicine, 635 Charles E. Young Drive South, Suite 225E, Los Angeles, CA, USA
|
25
|
Bouix S, Martin-Fernandez M, Ungar L, Nakamura M, Koo MS, McCarley RW, Shenton ME. On evaluating brain tissue classifiers without a ground truth. Neuroimage 2007; 36:1207-24. [PMID: 17532646 PMCID: PMC2702211 DOI: 10.1016/j.neuroimage.2007.04.031] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2006] [Revised: 04/02/2007] [Accepted: 04/17/2007] [Indexed: 11/29/2022] Open
Abstract
In this paper, we present a set of techniques for the evaluation of brain tissue classifiers on a large data set of MR images of the head. Due to the difficulty of establishing a gold standard for this type of data, we focus our attention on methods which do not require a ground truth, but instead rely on a common agreement principle. Three different techniques are presented: the Williams' index, a measure of common agreement; STAPLE, an Expectation Maximization algorithm which simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique to explore similarity data. We apply these different evaluation methodologies to a set of eleven different segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings. The evaluations with and without a ground truth are compared. Our findings show that comparing classifiers without a gold standard can provide a lot of interesting information. In particular, outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers.
Affiliation(s)
- Sylvain Bouix
- Psychiatry Neuroimaging Laboratory, Department of Psychiatry, Brigham and Women's Hospital, Boston, MA, USA
|
26
|
Heckemann RA, Hajnal JV, Aljabar P, Rueckert D, Hammers A. Automatic anatomical brain MRI segmentation combining label propagation and decision fusion. Neuroimage 2006; 33:115-26. [PMID: 16860573 DOI: 10.1016/j.neuroimage.2006.05.061] [Citation(s) in RCA: 466] [Impact Index Per Article: 25.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2005] [Revised: 05/18/2006] [Accepted: 05/23/2006] [Indexed: 10/24/2022] Open
Abstract
Regions in three-dimensional magnetic resonance (MR) brain images can be classified using protocols for manually segmenting and labeling structures. For large cohorts, time and expertise requirements make this approach impractical. To achieve automation, an individual segmentation can be propagated to another individual using an anatomical correspondence estimate relating the atlas image to the target image. The accuracy of the resulting target labeling has been limited but can potentially be improved by combining multiple segmentations using decision fusion. We studied segmentation propagation and decision fusion on 30 normal brain MR images, which had been manually segmented into 67 structures. Correspondence estimates were established by nonrigid registration using free-form deformations. Both direct label propagation and an indirect approach were tested. Individual propagations showed an average similarity index (SI) of 0.754+/-0.016 against manual segmentations. Decision fusion using 29 input segmentations increased SI to 0.836+/-0.009. For indirect propagation of a single source via 27 intermediate images, SI was 0.779+/-0.013. We also studied the effect of the decision fusion procedure using a numerical simulation with synthetic input data. The results helped to formulate a model that predicts the quality improvement of fused brain segmentations based on the number of individual propagated segmentations combined. We demonstrate a practicable procedure that exceeds the accuracy of previous automatic methods and can compete with manual delineations.
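The similarity index (SI) reported above is commonly computed as the Dice coefficient, 2|A∩B| / (|A| + |B|), between a propagated segmentation and the manual reference. A minimal sketch (the function name and the both-empty convention are assumptions, not from the paper):

```python
import numpy as np

def similarity_index(a, b):
    """Dice similarity index between two binary masks: 2|A∩B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```

An SI of 1.0 indicates identical masks; values around 0.75 to 0.84, as in the study above, indicate substantial but imperfect overlap.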
Affiliation(s)
- Rolf A. Heckemann
- Imaging Sciences Department, MRC Clinical Sciences Centre, Imperial College at Hammersmith Hospital Campus, Du Cane Road, London W12 0HS, UK
|
27
|
Warfield SK, Zou KH, Wells WM. Validation of image segmentation by estimating rater bias and variance. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION : MICCAI ... INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION 2006; 9:839-47. [PMID: 17354851 DOI: 10.1007/11866763_103] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
The accuracy and precision of segmentations of medical images have been difficult to quantify in the absence of a "ground truth" or reference standard segmentation for clinical data. Although physical or digital phantoms can help by providing a reference standard, they do not allow the reproduction of the full range of imaging and anatomical characteristics observed in clinical data. An alternative assessment approach is to compare to segmentations generated by domain experts. Segmentations may be generated by raters who are trained experts or by automated image analysis algorithms. Typically, these segmentations differ due to intra-rater and inter-rater variability. The most appropriate way to compare such segmentations has been unclear. We present here a new algorithm to enable the estimation of performance characteristics, and a true labeling, from observations of segmentations of imaging data where segmentation labels may be ordered or continuous measures. This approach may be used with, amongst others, surface, distance transform or level set representations of segmentations, and can be used to assess whether or not a rater consistently over-estimates or under-estimates the position of a boundary.
Affiliation(s)
- Simon K. Warfield
- Computational Radiology Laboratory, Department of Radiology, Children's Hospital, and Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, 75 Francis St., Boston, MA 02115, USA
|
28
|
Klein A, Mensh B, Ghosh S, Tourville J, Hirsch J. Mindboggle: automated brain labeling with multiple atlases. BMC Med Imaging 2005; 5:7. [PMID: 16202176 PMCID: PMC1283974 DOI: 10.1186/1471-2342-5-7] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2005] [Accepted: 10/05/2005] [Indexed: 11/26/2022] Open
Abstract
Background: To make inferences about brain structures or activity across multiple individuals, one first needs to determine the structural correspondences across their image data. We have recently developed Mindboggle as a fully automated, feature-matching approach to assign anatomical labels to cortical structures and activity in human brain MRI data. Label assignment is based on structural correspondences between labeled atlases and unlabeled image data, where an atlas consists of a set of labels manually assigned to a single brain image. In the present work, we study the influence of using variable numbers of individual atlases to nonlinearly label human brain image data.
Methods: Each brain image voxel of each of 20 human subjects is assigned a label by each of the remaining 19 atlases using Mindboggle. The most common label is selected and is given a confidence rating based on the number of atlases that assigned that label. The automatically assigned labels for each subject brain are compared with the manual labels for that subject (its atlas). Unlike recent approaches that transform subject data to a labeled, probabilistic atlas space (constructed from a database of atlases), Mindboggle labels a subject by each atlas in a database independently.
Results: When Mindboggle labels a human subject's brain image with at least four atlases, the resulting label agreement with coregistered manual labels is significantly higher than when only a single atlas is used. Different numbers of atlases provide significantly higher label agreements for individual brain regions.
Conclusion: Increasing the number of reference brains used to automatically label a human subject brain improves labeling accuracy with respect to manually assigned labels. Mindboggle software can provide confidence measures for labels based on probabilistic assignment of labels and could be applied to large databases of brain images.
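The fusion step described above (most common label per voxel, with a confidence rating given by the fraction of atlases that agree) can be sketched as follows; the function name and array layout are illustrative assumptions, not Mindboggle's actual API:

```python
import numpy as np

def fuse_labels(atlas_labels):
    """Majority-vote fusion of per-atlas label maps with a confidence map.

    atlas_labels: (A, N) integer array -- the label assigned to each of
    N voxels by each of A registered atlases.
    Returns (fused, confidence): the most common label per voxel and the
    fraction of atlases that agree with it.
    """
    A, N = atlas_labels.shape
    fused = np.empty(N, dtype=atlas_labels.dtype)
    confidence = np.empty(N)
    for i in range(N):
        labels, counts = np.unique(atlas_labels[:, i], return_counts=True)
        k = counts.argmax()          # modal label at this voxel
        fused[i] = labels[k]
        confidence[i] = counts[k] / A
    return fused, confidence
```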
Affiliation(s)
- Arno Klein
- fMRI Research Center, Columbia University, New York, USA
- Parsons Institute for Information Mapping, The New School, New York, USA
- Brett Mensh
- New York State Psychiatric Institute, Columbia University, New York, USA
- Satrajit Ghosh
- Speech Communication Group, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, USA
- Jason Tourville
- Department of Cognitive and Neural Systems, Boston University, Boston, USA
- Joy Hirsch
- fMRI Research Center, Columbia University, New York, USA
|
29
|
Rohlfing T, Russakoff DB, Maurer CR. Performance-based classifier combination in atlas-based image segmentation using expectation-maximization parameter estimation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2004; 23:983-94. [PMID: 15338732 DOI: 10.1109/tmi.2004.830803] [Citation(s) in RCA: 153] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/05/2023]
Abstract
It is well known in the pattern recognition community that the accuracy of classifications obtained by combining decisions made by independent classifiers can be substantially higher than the accuracy of the individual classifiers. We have previously shown this to be true for atlas-based segmentation of biomedical images. The conventional method for combining individual classifiers weights each classifier equally (vote or sum rule fusion). In this paper, we propose two methods that estimate the performances of the individual classifiers and combine the individual classifiers by weighting them according to their estimated performance. The two methods are multiclass extensions of an expectation-maximization (EM) algorithm for ground truth estimation of binary classification based on decisions of multiple experts (Warfield et al., 2004). The first method performs parameter estimation independently for each class with a subsequent integration step. The second method considers all classes simultaneously. We demonstrate the efficacy of these performance-based fusion methods by applying them to atlas-based segmentations of three-dimensional confocal microscopy images of bee brains. In atlas-based image segmentation, multiple classifiers arise naturally by applying different registration methods to the same atlas, or the same registration method to different atlases, or both. We perform a validation study designed to quantify the success of classifier combination methods in atlas-based segmentation. By applying random deformations, a given ground truth atlas is transformed into multiple segmentations that could result from imperfect registrations of an image to multiple atlas images. In a second evaluation study, multiple actual atlas-based segmentations are combined and their accuracies computed by comparing them to a manual segmentation. 
We demonstrate in both evaluation studies that segmentations produced by combining multiple individual registration-based segmentations are more accurate for the two classifier fusion methods we propose, which weight the individual classifiers according to their EM-based performance estimates, than for simple sum rule fusion, which weights each classifier equally.
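The contrast between equal-weight (vote/sum rule) fusion and performance-weighted fusion can be illustrated with a toy multiclass combiner. In practice the weights would come from the EM-based performance estimates; the function name and array layout here are assumptions for illustration:

```python
import numpy as np

def weighted_vote(labels, weights, n_classes):
    """Combine multiclass classifier decisions by per-classifier weights.

    labels: (C, N) integer array of class decisions from C classifiers
    over N voxels.
    weights: (C,) nonnegative weights (e.g. estimated accuracies);
    vote/sum rule fusion is the special case of equal weights.
    Returns the (N,) array of winning class indices.
    """
    C, N = labels.shape
    scores = np.zeros((n_classes, N))
    for c in range(C):
        # add classifier c's weight to the class it chose at each voxel
        scores[labels[c], np.arange(N)] += weights[c]
    return scores.argmax(axis=0)
```

With equal weights this reduces to a plain majority vote; up-weighting a reliable classifier lets it override two weaker ones, which is the effect the EM performance estimates aim for.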
Affiliation(s)
- Torsten Rohlfing
- Image Guidance Laboratories, Department of Neurosurgery, Stanford University, Stanford, CA 94305-5327, USA
|
30
|
Warfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE TRANSACTIONS ON MEDICAL IMAGING 2004; 23:903-21. [PMID: 15250643 PMCID: PMC1283110 DOI: 10.1109/tmi.2004.828354] [Citation(s) in RCA: 1121] [Impact Index Per Article: 56.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/18/2023]
Abstract
Characterizing the performance of image segmentation approaches has been a persistent challenge. Performance analysis is important since segmentation algorithms often have limited accuracy and precision. Interactive drawing of the desired segmentation by human raters has often been the only acceptable approach, and yet suffers from intra-rater and inter-rater variability. Automated algorithms have been sought in order to remove the variability introduced by raters, but such algorithms must be assessed to ensure they are suitable for the task. The performance of raters (human or algorithmic) generating segmentations of medical images has been difficult to quantify because of the difficulty of obtaining or estimating a known true segmentation for clinical data. Although physical and digital phantoms can be constructed for which ground truth is known or readily estimated, such phantoms do not fully reflect clinical images due to the difficulty of constructing phantoms which reproduce the full range of imaging characteristics and normal and pathological anatomical variability observed in clinical data. Comparison to a collection of segmentations by raters is an attractive alternative since it can be carried out directly on the relevant clinical imaging data. However, the most appropriate measure or set of measures with which to compare such segmentations has not been clarified and several measures are used in practice. We present here an expectation-maximization algorithm for simultaneous truth and performance level estimation (STAPLE). The algorithm considers a collection of segmentations and computes a probabilistic estimate of the true segmentation and a measure of the performance level represented by each segmentation. The source of each segmentation in the collection may be an appropriately trained human rater or raters, or may be an automated segmentation algorithm. 
The probabilistic estimate of the true segmentation is formed by estimating an optimal combination of the segmentations, weighting each segmentation depending upon the estimated performance level, and incorporating a prior model for the spatial distribution of structures being segmented as well as spatial homogeneity constraints. STAPLE is straightforward to apply to clinical imaging data, it readily enables assessment of the performance of an automated image segmentation algorithm, and enables direct comparison of human rater and algorithm performance.
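For the binary case, the STAPLE iteration alternates between estimating the posterior probability of the true label at each voxel (E-step) and re-estimating each rater's sensitivity and specificity (M-step). The sketch below makes simplifying assumptions (a fixed scalar prior, no spatial prior or homogeneity constraints, illustrative names) and is not the paper's full formulation:

```python
import numpy as np

def staple_binary(D, max_iter=50, tol=1e-6):
    """EM estimate of a true binary segmentation from rater decisions.

    D: (R, N) array of 0/1 labels from R raters over N voxels.
    Returns (W, p, q): posterior P(true label = 1) per voxel, plus each
    rater's estimated sensitivity p and specificity q.
    """
    R, N = D.shape
    W = D.mean(axis=0)          # initialize the truth posterior by vote
    p = np.full(R, 0.9)         # initial sensitivities
    q = np.full(R, 0.9)         # initial specificities
    prior = W.mean()            # fixed scalar prior P(true label = 1)
    for _ in range(max_iter):
        # E-step: posterior of the true label given all rater decisions
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        W_new = a / (a + b + 1e-12)
        # M-step: re-estimate each rater's performance against the posterior
        p = (D @ W_new) / (W_new.sum() + 1e-12)
        q = ((1 - D) @ (1 - W_new)) / ((1 - W_new).sum() + 1e-12)
        converged = np.abs(W_new - W).max() < tol
        W = W_new
        if converged:
            break
    return W, p, q
```

Thresholding W at 0.5 yields the estimated true segmentation, while p and q quantify each rater's performance level, which is the dual output STAPLE is named for.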
Affiliation(s)
- Simon K. Warfield
- Harvard Medical School and the Department of Radiology, Brigham and Women's Hospital, 75 Francis St., Boston, MA 02115, USA
|
31
|
Extraction and Application of Expert Priors to Combine Multiple Segmentations of Human Brain Tissue. ACTA ACUST UNITED AC 2003. [DOI: 10.1007/978-3-540-39903-2_71] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/14/2023]
|