1
Adaptive Dimensionality Reduction with Semi-Supervision (AdDReSS): Classifying Multi-Attribute Biomedical Data. PLoS One 2016; 11:e0159088. [PMID: 27421116 PMCID: PMC4946789 DOI: 10.1371/journal.pone.0159088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/13/2016] [Accepted: 06/27/2016] [Indexed: 11/19/2022] Open
Abstract
Medical diagnostics is often a multi-attribute problem, necessitating sophisticated tools for analyzing high-dimensional biomedical data. Mining these data often runs into two crucial bottlenecks: 1) the high dimensionality of the features used to represent rich biological data and 2) the small amount of labeled training data, owing to the expense of consulting the highly specific medical expertise needed to assess each study. To our knowledge, no approach has yet used active learning (AL) within dimensionality reduction to improve the construction of low-dimensional representations. We present a novel methodology, AdDReSS (Adaptive Dimensionality Reduction with Semi-Supervision), and demonstrate that fewer labeled instances are needed to create a more discriminative embedding when they are identified via AL in the embedding space than when they are selected at random. We tested our methodology on a wide variety of domains, including prostate gene expression, ovarian proteomic spectra, brain magnetic resonance imaging, and breast histopathology. Across these high-dimensional biomedical datasets, with 100+ observations each and all parameters considered, the median classification accuracy across all experiments showed AdDReSS (88.7%) to outperform SSAGE, a semi-supervised dimensionality reduction (SSDR) method using random sampling (85.5%), and Graph Embedding (81.5%). Furthermore, we found that embeddings generated via AdDReSS achieved a mean 35.95% improvement in Raghavan efficiency, a measure of learning rate, over SSAGE. Our results demonstrate the value of AdDReSS in providing low-dimensional representations of high-dimensional biomedical data while achieving higher classification rates with fewer labeled examples than approaches without active learning.
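The abstract does not say how AdDReSS chooses which instances to label, so the following is only a minimal sketch of one common active-learning heuristic, uncertainty sampling, and not the authors' criterion. The function name `select_queries` and the example probabilities are hypothetical; the probabilities are assumed to come from some classifier trained on the current embedding.

```python
def select_queries(probabilities, labeled, k):
    """Return the k unlabeled sample indices whose predicted class-1
    probability is closest to 0.5, i.e. the most ambiguous samples.

    probabilities: predicted P(class 1) for every sample (assumed input)
    labeled: set of indices that already have a label
    """
    candidates = [i for i in range(len(probabilities)) if i not in labeled]
    # Smaller distance from 0.5 means the classifier is less certain.
    candidates.sort(key=lambda i: abs(probabilities[i] - 0.5))
    return candidates[:k]


# Hypothetical predictions for six samples; sample 1 is already labeled.
probs = [0.95, 0.52, 0.10, 0.45, 0.70, 0.49]
already_labeled = {1}
queries = select_queries(probs, already_labeled, 2)
print(queries)  # indices to send to the expert for labeling
```

A random-sampling baseline would instead draw `k` indices uniformly from the unlabeled pool; the contrast between the two selection rules is the comparison the abstract reports.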
2
Xu J, Luo X, Wang G, Gilmore H, Madabhushi A. A Deep Convolutional Neural Network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing 2016; 191:214-223. [PMID: 28154470 PMCID: PMC5283391 DOI: 10.1016/j.neucom.2016.01.034] [Citation(s) in RCA: 224] [Impact Index Per Article: 24.9] [Indexed: 02/06/2023]
Abstract
Epithelial (EP) and stromal (ST) tissues are two types of tissue in histological images. Automated segmentation or classification of EP and ST tissues is important when developing a computerized system for analyzing the tumor microenvironment. In this paper, a Deep Convolutional Neural Network (DCNN) based feature learning approach is presented to automatically segment or classify EP and ST regions from digitized tumor tissue microarrays (TMAs). Current approaches classify the two regions using handcrafted feature representations, such as color, texture, and Local Binary Patterns (LBP). Compared to handcrafted feature based approaches, which involve task-dependent representations, a DCNN is an end-to-end feature extractor that may be learned directly from the raw pixel intensity values of EP and ST tissues in a data-driven fashion. These high-level features contribute to the construction of a supervised classifier for discriminating the two types of tissue. In this work we compare DCNN based models with three handcrafted feature extraction based approaches on two different datasets, which consist of 157 Hematoxylin and Eosin (H&E) stained images of breast cancer and 1376 immunohistological (IHC) stained images of colorectal cancer, respectively. The DCNN based feature learning approach was shown to have an F1 classification score of 85%, 89%, and 100%, accuracy (ACC) of 84%, 88%, and 100%, and Matthews Correlation Coefficient (MCC) of 86%, 77%, and 100% on two H&E stained (NKI and VGH) and IHC stained data, respectively. Our DCNN based approach was shown to outperform three handcrafted feature extraction based approaches in terms of the classification of EP and ST regions.
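For readers unfamiliar with the three reported scores, the sketch below shows how F1, accuracy, and MCC follow from a binary confusion matrix using their standard definitions. The counts are invented for illustration and are not taken from the paper; the function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute F1, accuracy, and Matthews correlation coefficient from
    confusion-matrix counts. Assumes at least one predicted positive and
    one actual positive, so precision and recall are defined."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    acc = (tp + tn) / (tp + fp + fn + tn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, acc, mcc


# Illustrative counts only (e.g. epithelial = positive, stromal = negative).
f1, acc, mcc = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"F1={f1:.2f} ACC={acc:.2f} MCC={mcc:.2f}")
```

MCC ranges from -1 to 1 rather than 0 to 1, so it can sit well below F1 and accuracy on the same predictions; the abstract reports it as a percentage.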
Affiliation(s)
- Jun Xu
- Jiangsu Key Laboratory of Big Data Analysis Technique, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Xiaofei Luo
- Jiangsu Key Laboratory of Big Data Analysis Technique, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Guanhao Wang
- Jiangsu Key Laboratory of Big Data Analysis Technique, Nanjing University of Information Science and Technology, Nanjing 210044, China
- Hannah Gilmore
- Institute for Pathology, University Hospitals Case Medical Center, Case Western Reserve University, OH 44106-7207, USA
- Anant Madabhushi
- Department of Biomedical Engineering, Case Western Reserve University, OH 44106, USA
3
Cameron A, Khalvati F, Haider MA, Wong A. MAPS: A Quantitative Radiomics Approach for Prostate Cancer Detection. IEEE Trans Biomed Eng 2015; 63:1145-56. [PMID: 26441442 DOI: 10.1109/tbme.2015.2485779] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Indexed: 11/10/2022]
Abstract
This paper presents a quantitative radiomics feature model for performing prostate cancer detection using multiparametric MRI (mpMRI). It incorporates a novel tumor candidate identification algorithm to efficiently and thoroughly identify the regions of concern and constructs a comprehensive radiomics feature model to detect tumorous regions. In contrast to conventional automated classification schemes, this radiomics-based feature model aims to ground its decisions in a way that can be interpreted and understood by the diagnostician. This is done by grouping features into high-level feature categories which are already used by radiologists to diagnose prostate cancer: Morphology, Asymmetry, Physiology, and Size (MAPS), using biomarkers inspired by the PI-RADS guidelines for performing structured reporting on prostate MRI. Clinical mpMRI data were collected from 13 men with histology-confirmed prostate cancer and labeled by an experienced radiologist. These annotated data were used to train classifiers using the proposed radiomics-driven feature model in order to evaluate the classification performance. The preliminary experimental results indicated that the proposed model outperformed each of its constituent feature groups as well as a comparable conventional mpMRI feature model. A further validation of the proposed algorithm will be conducted using a larger dataset as future work.
4
Viswanath S, Madabhushi A. Consensus embedding: theory, algorithms and application to segmentation and classification of biomedical data. BMC Bioinformatics 2012; 13:26. [PMID: 22316103 PMCID: PMC3395843 DOI: 10.1186/1471-2105-13-26] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 08/23/2011] [Accepted: 02/08/2012] [Indexed: 11/21/2022] Open
Abstract
Background: Dimensionality reduction (DR) enables the construction of a lower dimensional space (embedding) from a higher dimensional feature space while preserving object-class discriminability. However, several popular DR approaches suffer from sensitivity to the choice of parameters and/or the presence of noise in the data. In this paper, we present a novel DR technique known as consensus embedding that aims to overcome these problems by generating and combining multiple low-dimensional embeddings, hence exploiting the variance among them in a manner similar to ensemble classifier schemes such as Bagging. We demonstrate theoretical properties of consensus embedding which show that it will result in a single stable embedding solution that preserves information more accurately than any individual embedding (generated via DR schemes such as Principal Component Analysis, Graph Embedding, or Locally Linear Embedding). Intelligent sub-sampling (via mean-shift) and code parallelization are utilized to provide an efficient implementation of the scheme.
Results: Applications of consensus embedding are shown in the context of classification and clustering as applied to: (1) image partitioning of white matter and gray matter on 10 different synthetic brain MRI images corrupted with 18 different combinations of noise and bias field inhomogeneity, (2) classification of 4 high-dimensional gene-expression datasets, (3) cancer detection (at a pixel level) on 16 image slices obtained from 2 different high-resolution prostate MRI datasets. In over 200 different experiments concerning classification and segmentation of biomedical data, consensus embedding was found to consistently outperform both linear and non-linear DR methods within all applications considered.
Conclusions: We have presented a novel framework termed consensus embedding which leverages ensemble classification theory within dimensionality reduction, allowing for application to a wide range of high-dimensional biomedical data classification and segmentation problems. Our generalizable framework allows for improved representation and classification in the context of both imaging and non-imaging data. The algorithm offers a promising solution to problems that currently plague DR methods, and may allow for extension to other areas of biomedical data analysis.
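The idea of generating several embeddings and combining them can be illustrated roughly as follows. This is a sketch under stated assumptions, not the paper's algorithm: the abstract does not specify how embeddings are generated or combined, so the use of random feature subsets with PCA as the base embedding, the element-wise median of normalized pairwise distances as the combination rule, and classical MDS as the final projection are all choices made here for illustration. The function names are hypothetical.

```python
import numpy as np


def pca_embed(X, d):
    """Project centered data onto its top-d principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :d] * S[:d]


def pairwise_dist(Y):
    """Euclidean distance between every pair of rows of Y."""
    diff = Y[:, None, :] - Y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, d):
    """Recover d-dimensional coordinates from a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:d]            # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))


def consensus_embedding(X, d=2, n_embeddings=10, subset_frac=0.5, seed=0):
    """Combine embeddings built on random feature subsets (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = max(d, int(subset_frac * p))
    dists = []
    for _ in range(n_embeddings):
        cols = rng.choice(p, size=m, replace=False)
        D = pairwise_dist(pca_embed(X[:, cols], d))
        dists.append(D / D.max())                 # put embeddings on one scale
    # The median across embeddings damps the effect of any single noisy one.
    return classical_mds(np.median(dists, axis=0), d)


# Synthetic two-class data: 20 samples per class, 30 features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 30)), rng.normal(4, 1, (20, 30))])
Y = consensus_embedding(X)
print(Y.shape)
```

Aggregating pairwise relationships rather than raw coordinates sidesteps the fact that individual embeddings are only defined up to rotation and reflection, which is why the sketch never averages coordinates directly.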
Affiliation(s)
- Satish Viswanath
- Dept. of Biomedical Engineering, Rutgers University, 599 Taylor Road, Piscataway, New Jersey 08854, USA.
5
Madabhushi A, Doyle S, Lee G, Basavanhally A, Monaco J, Masters S, Tomaszewski J, Feldman M. Integrated diagnostics: a conceptual framework with examples. Clin Chem Lab Med 2010; 48:989-98. [PMID: 20491597 DOI: 10.1515/cclm.2010.193] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/15/2022]
Abstract
With the advent of digital pathology, imaging scientists have begun to develop computerized image analysis algorithms for making diagnostic (disease presence), prognostic (outcome prediction), and theragnostic (choice of therapy) predictions from high resolution images of digitized histopathology. One of the challenges in developing image analysis algorithms for digitized histopathology is dealing with highly dense, information-rich datasets that would overwhelm most computer vision and image processing algorithms. Over the last decade, manifold learning and non-linear dimensionality reduction schemes have emerged as popular and powerful machine learning tools for pattern recognition problems. However, these techniques have thus far been applied primarily to classification and analysis of computer vision problems (e.g., face detection). In this paper, we discuss recent work by a few groups in the application of manifold learning methods to problems in computer aided diagnosis, prognosis, and theragnosis of digitized histopathology. In addition, we discuss some exciting recent developments in the application of these methods to multi-modal data fusion and classification, specifically the building of meta-classifiers by fusion of histological image and proteomic signatures for prostate cancer outcome prediction.
Affiliation(s)
- Anant Madabhushi
- Laboratory for Computational Imaging and Bioinformatics, Department of Biomedical Engineering, Rutgers University, NJ, USA
6
Tiwari P, Rosen M, Madabhushi A. A hierarchical spectral clustering and nonlinear dimensionality reduction scheme for detection of prostate cancer from magnetic resonance spectroscopy (MRS). Med Phys 2009; 36:3927-39. [PMID: 19810465 DOI: 10.1118/1.3180955] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/07/2022] Open
Abstract
Magnetic resonance spectroscopy (MRS) has been shown to have great clinical potential as a supplement to magnetic resonance imaging in the detection of prostate cancer (CaP). MRS provides functional information in the form of changes in the relative concentration of specific metabolites including choline, creatine, and citrate which can be used to identify potential areas of CaP. With a view to assisting radiologists in interpretation and analysis of MRS data, some researchers have begun to develop computer-aided detection (CAD) schemes for CaP identification from spectroscopy. Most of these schemes have been centered on identifying and integrating the area under metabolite peaks which is then used to compute relative metabolite ratios. However, manual identification of metabolite peaks on the MR spectra, and especially via CAD, is a challenging problem due to low signal-to-noise ratio, baseline irregularity, peak overlap, and peak distortion. In this article the authors present a novel CAD scheme that integrates nonlinear dimensionality reduction (NLDR) with an unsupervised hierarchical clustering algorithm to automatically identify suspicious regions on the prostate using MRS and hence avoids the need to explicitly identify metabolite peaks. The methodology comprises two stages. In stage 1, a hierarchical spectral clustering algorithm is used to distinguish between extracapsular and prostatic spectra in order to localize the region of interest (ROI) corresponding to the prostate. Once the prostate ROI is localized, in stage 2, a NLDR scheme, in conjunction with a replicated clustering algorithm, is used to automatically discriminate between three classes of spectra (normal appearing, suspicious appearing, and indeterminate). The methodology was quantitatively and qualitatively evaluated on a total of 18 1.5 T in vivo prostate T2-weighted (w) and MRS studies obtained from the multisite, multi-institutional American College of Radiology (ACRIN) trial. 
In the absence of the precise ground truth for CaP extent on the MR imaging for most of the ACRIN studies, probabilistic quantitative metrics were defined based on partial knowledge on the quadrant location and size of the tumor. The scheme, when evaluated against this partial ground truth, was found to have a CaP detection sensitivity of 89.33% and specificity of 79.79%. The results obtained from randomized threefold and fivefold cross validation suggest that the NLDR based clustering scheme has a higher CaP detection accuracy compared to such commonly used MRS analysis schemes as z score and PCA. In addition, the scheme was found to be robust to changes in system parameters. For 6 of the 18 studies an expert radiologist laboriously labeled each of the individual spectra according to a five point scale, with 1/2 representing spectra that the expert considered normal and 3/4/5 being spectra the expert deemed suspicious. When evaluated on these expert annotated datasets, the CAD system yielded an average sensitivity (cluster corresponding to suspicious spectra being identified as the CaP class) and specificity of 81.39% and 64.71%, respectively.
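A single level of spectral clustering, of the kind stage 1 might use to separate two groups of spectra, can be sketched as below. This is not the authors' implementation: the Gaussian affinity, the `sigma` value, the normalized Laplacian, and the sign split on the second eigenvector are standard textbook choices assumed here, and the hierarchical and replicated clustering steps described in the abstract are omitted. The function name `spectral_bipartition` is hypothetical.

```python
import numpy as np


def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X into two groups using the second eigenvector
    of the symmetric normalized graph Laplacian."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))            # Gaussian affinity
    np.fill_diagonal(W, 0.0)                      # no self-loops
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)                # eigenvalues ascending
    # Rescale to the random-walk eigenvector, which is roughly constant
    # within each weakly connected group.
    fiedler = d_inv_sqrt * vecs[:, 1]
    return (fiedler > 0).astype(int)


# Synthetic stand-in for two groups of spectra (15 points each).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (15, 2)), rng.normal(5, 0.3, (15, 2))])
labels = spectral_bipartition(X)
print(labels)
```

Applying the same split recursively to one of the resulting groups gives a simple hierarchical variant, which is one plausible reading of how a prostate region of interest could be isolated before the second stage.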
Affiliation(s)
- Pallavi Tiwari
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA
7
Lee G, Rodriguez C, Madabhushi A. Investigating the efficacy of nonlinear dimensionality reduction schemes in classifying gene and protein expression studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2008; 5:368-84. [PMID: 18670041 PMCID: PMC2562675 DOI: 10.1109/tcbb.2008.36] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 05/08/2023]
Abstract
The recent explosion in procurement and availability of high-dimensional gene- and protein-expression profile datasets for cancer diagnostics has necessitated the development of sophisticated machine learning tools with which to analyze them. A major limitation in the ability to accurately classify these high-dimensional datasets stems from the 'curse of dimensionality', occurring in situations where the number of genes or peptides significantly exceeds the total number of patient samples. Previous attempts at dealing with this issue have mostly centered on the use of a dimensionality reduction (DR) scheme, Principal Component Analysis (PCA), to obtain a low-dimensional projection of the high-dimensional data. However, linear PCA and other linear DR methods, which rely on Euclidean distances to estimate object similarity, do not account for the inherent underlying nonlinear structure associated with most biomedical data. The motivation behind this work is to identify the appropriate DR methods for analysis of high-dimensional gene- and protein-expression studies. Towards this end, we empirically and rigorously compare three nonlinear (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) and three linear DR schemes (PCA, Linear Discriminant Analysis, Multidimensional Scaling) with the intent of determining a reduced subspace representation in which the individual object classes are more easily discriminable.
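The setting described above, far more features than samples, can be made concrete with a small PCA sketch. The data are random numbers standing in for an expression matrix, and the sizes (12 samples, 500 features) are arbitrary; the point is only that centered data from n samples span at most n - 1 directions, however many features there are.

```python
import numpy as np


def pca(X, d):
    """Return the top-d principal component scores and the fraction of
    variance explained by every component, computed via SVD."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / (S ** 2).sum()
    return U[:, :d] * S[:d], explained


rng = np.random.default_rng(0)
X = rng.normal(size=(12, 500))                    # 12 "patients", 500 "genes"
scores, explained = pca(X, d=3)

# Count components carrying a non-negligible share of the variance.
rank = int((explained > 1e-12).sum())
print(scores.shape, rank)
```

Because PCA is a linear projection, it can only rearrange this small subspace; the nonlinear schemes compared in the paper (Isomap, Locally Linear Embedding, Laplacian Eigenmaps) instead build the embedding from neighborhood relationships between samples.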
Affiliation(s)
- George Lee
- Department of Biomedical Engineering, Rutgers, The State University of New Jersey, 599 Taylor Road, Piscataway, NJ 08854, USA.
8
A hierarchical unsupervised spectral clustering scheme for detection of prostate cancer from magnetic resonance spectroscopy (MRS). Medical Image Computing and Computer-Assisted Intervention: MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention 2008. [PMID: 18044579 DOI: 10.1007/978-3-540-75759-7_34] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6]
Abstract
Magnetic Resonance Spectroscopy (MRS), along with MRI, has emerged as a promising tool in diagnosis and potentially screening for prostate cancer. Surprisingly little work, however, has been done in the area of automated quantitative analysis of MRS data for identifying likely cancerous areas in the prostate. In this paper we present a novel approach that integrates a manifold learning scheme (spectral clustering) with an unsupervised hierarchical clustering algorithm to identify spectra corresponding to cancer on prostate MRS. Ground truth location for cancer on the prostate was determined from the sextant location and maximum size of cancer available from the ACRIN database, from which a total of 14 MRS studies were obtained. The high-dimensional information in the MR spectra is nonlinearly transformed to a low-dimensional embedding space and, via repeated clustering of the voxels in this space, non-informative spectra are eliminated and only informative spectra retained. Our scheme successfully identified MRS cancer voxels with sensitivity of 77.8%, false positive rate of 28.92%, and false negative rate of 20.88% on a total of 14 prostate MRS studies. Qualitative results seem to suggest that our method has higher specificity compared to a popular scheme, z-score, routinely used for analysis of MRS data.
9
Finn WG. Diagnostic pathology and laboratory medicine in the age of "omics": a paper from the 2006 William Beaumont Hospital Symposium on Molecular Pathology. J Mol Diagn 2007; 9:431-6. [PMID: 17652635 PMCID: PMC1975093 DOI: 10.2353/jmoldx.2007.070023] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022] Open
Abstract
Functional genomics and proteomics involve the simultaneous analysis of hundreds or thousands of expressed genes or proteins and have spawned the modern discipline of computational biology. Novel informatic applications, including sophisticated dimensionality reduction strategies and cancer outlier profile analysis, can distill clinically exploitable biomarkers from enormous experimental datasets. Diagnostic pathologists are now charged with translating the knowledge generated by the "omics" revolution into clinical practice. Food and Drug Administration-approved proprietary testing platforms based on microarray technologies already exist and will expand greatly in the coming years. However, for diagnostic pathology, the greatest promise of the "omics" age resides in the explosion in information technology (IT). IT applications allow for the digitization of histological slides, transforming them into minable data and enabling content-based searching and archiving of histological materials. IT will also allow for the optimization of existing (and often underused) clinical laboratory technologies such as flow cytometry and high-throughput core laboratory functions. The state of pathology practice does not always keep up with the pace of technological advancement. However, to use fully the potential of these emerging technologies for the benefit of patients, pathologists and clinical scientists must embrace the changes and transformational advances that will characterize this new era.
Affiliation(s)
- William G Finn
- University of Michigan Department of Pathology, Room M242 Medical Science I, 1301 Catherine Rd., Ann Arbor, MI 48109-0602, USA.