1. Isgut M, Gloster L, Choi K, Venugopalan J, Wang MD. Systematic Review of Advanced AI Methods for Improving Healthcare Data Quality in Post COVID-19 Era. IEEE Rev Biomed Eng 2023; 16:53-69. [PMID: 36269930 DOI: 10.1109/rbme.2022.3216531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
At the beginning of the COVID-19 pandemic, there was significant hype about the potential impact of artificial intelligence (AI) tools in combatting COVID-19 through diagnosis, prognosis, or surveillance. However, AI tools have not yet been widely successful. One key reason is that the COVID-19 pandemic demanded faster, real-time development of AI-driven clinical and health support tools, including rapid data collection, algorithm development, validation, and deployment, and there was not enough time for proper data quality control. Learning from the hard lessons of COVID-19, we summarize the important health data quality challenges during the COVID-19 pandemic, such as lack of data standardization, missing data, tabulation errors, and noise and artifacts. Then we conduct a systematic investigation of computational methods that address these issues, including emerging advanced AI data quality control methods that achieve better data quality outcomes and, in some cases, simplify or automate the data cleaning process. We hope this article can assist the healthcare community in improving health data quality going forward with novel AI development.
2. Mitchel J, Chatlin K, Tong L, Wang MD. A Translational Pipeline for Overall Survival Prediction of Breast Cancer Patients by Decision-Level Integration of Multi-Omics Data. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2020; 2019:1573-1580. [PMID: 32601549 DOI: 10.1109/bibm47256.2019.8983243] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
Breast cancer is the most prevalent and among the most deadly cancers in females. Patients with breast cancer have highly variable survival rates, indicating a need to identify prognostic biomarkers. Integrating multi-omics data (e.g., gene expression, DNA methylation, miRNA expression, and copy number variations (CNVs)) is likely to improve the accuracy of patient survival predictions compared to prediction using single-modality data. Therefore, we propose to develop a machine learning pipeline using decision-level integration of multi-omics tumor data from The Cancer Genome Atlas (TCGA) to predict the overall survival of breast cancer patients. With multi-omics data consisting of gene expression, methylation, miRNA expression, and CNVs, the top performing model predicted survival with an accuracy of 85% and area under the curve (AUC) of 87%. Furthermore, the model was able to identify which modalities best contributed to prediction performance, identifying methylation, miRNA, and gene expression as the best integrated classification combination. Our method not only recapitulated several breast cancer-specific prognostic biomarkers that were previously reported in the literature but also yielded several novel biomarkers. Further analysis of these biomarkers could lend insight into the molecular mechanisms that lead to poor survival.
Affiliation(s)
- Jonathan Mitchel
- Dept. of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332
- Kevin Chatlin
- Dept. of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332
- Li Tong
- Dept. of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332
- May D Wang
- Dept. of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332
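Illustrative note on entry 2: the abstract describes decision-level integration, in which each omics modality gets its own classifier and only the classifier outputs are combined. The sketch below is one minimal reading of that idea, not the authors' pipeline. The function names, the weighted-average fusion rule, the 0.5 threshold, and the probability values are all hypothetical and chosen only for illustration.

```python
def fuse_decisions(modality_probs, weights=None):
    """Combine per-modality predicted probabilities into one score per patient.

    modality_probs maps a modality name to a list of probabilities, one per
    patient, as produced by a separately trained per-modality classifier.
    The fusion rule here (weighted average) is an assumption for illustration.
    """
    names = list(modality_probs)
    n_patients = len(modality_probs[names[0]])
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal trust in every modality
    total_weight = sum(weights[name] for name in names)
    fused = []
    for i in range(n_patients):
        score = sum(weights[name] * modality_probs[name][i] for name in names)
        fused.append(score / total_weight)
    return fused


def predict(fused, threshold=0.5):
    """Turn fused scores into binary class labels."""
    return [1 if p >= threshold else 0 for p in fused]


# Made-up classifier outputs for three patients (not data from the paper).
probs = {
    "gene_expression": [0.9, 0.2, 0.6],
    "methylation": [0.8, 0.4, 0.3],
    "mirna": [0.7, 0.3, 0.9],
}
fused = fuse_decisions(probs)
labels = predict(fused)
print(fused, labels)
```

Because fusion happens on the outputs, a modality can be dropped or re-weighted without retraining the other classifiers, which is one way a pipeline of this kind could compare modality combinations.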
3. Venugopalan J, Chanani N, Maher K, Wang MD. Novel Data Imputation for Multiple Types of Missing Data in Intensive Care Units. IEEE J Biomed Health Inform 2019; 23:1243-1250. [DOI: 10.1109/jbhi.2018.2883606] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 11/07/2022]
4. Young WC, Raftery AE, Yeung KY. Model-Based Clustering With Data Correction For Removing Artifacts In Gene Expression Data. Ann Appl Stat 2017; 11:1998-2026. [PMID: 30740193 DOI: 10.1214/17-aoas1051] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Indexed: 12/31/2022]
Abstract
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These include the expression levels of paired genes being flipped or given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis.
Affiliation(s)
- William Chad Young
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195
- Adrian E Raftery
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195
- Ka Yee Yeung
- Institute of Technology, University of Washington Tacoma, Campus Box 358426, 1900 Commerce Street, Tacoma, WA 98402
5. Wu PY, Cheng CW, Kaddi CD, Venugopalan J, Hoffman R, Wang MD. -Omic and Electronic Health Record Big Data Analytics for Precision Medicine. IEEE Trans Biomed Eng 2016; 64:263-273. [PMID: 27740470 DOI: 10.1109/tbme.2016.2573285] [Citation(s) in RCA: 110] [Impact Index Per Article: 12.2] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Rapid advances of high-throughput technologies and wide adoption of electronic health records (EHRs) have led to fast accumulation of -omic and EHR data. These voluminous complex data contain abundant information for precision medicine, and big data analytics can extract such knowledge to improve the quality of healthcare. METHODS In this paper, we present -omic and EHR data characteristics, associated challenges, and data analytics including data preprocessing, mining, and modeling. RESULTS To demonstrate how big data analytics enables precision medicine, we provide two case studies, including identifying disease biomarkers from multi-omic data and incorporating -omic information into EHR. CONCLUSION Big data analytics is able to address -omic and EHR data challenges for paradigm shift toward precision medicine. SIGNIFICANCE Big data analytics makes sense of -omic and EHR data to improve healthcare outcome. It has long lasting societal impact.
6. Kashyap H, Ahmed HA, Hoque N, Roy S, Bhattacharyya DK. Big data analytics in bioinformatics: architectures, techniques, tools and issues. Netw Model Anal Health Inform Bioinforma 2016. [DOI: 10.1007/s13721-016-0135-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/16/2022]
7. Reddington AP, Monroe MR, Ünlü MS. Integrated imaging instrument for self-calibrated fluorescence protein microarrays. The Review of Scientific Instruments 2013; 84:103702. [PMID: 24182114 PMCID: PMC3799691 DOI: 10.1063/1.4823790] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/05/2013] [Accepted: 09/16/2013] [Indexed: 06/02/2023]
Abstract
Protein microarrays, or multiplexed and high-throughput assays, monitor multiple protein binding events to facilitate the understanding of disease progression and cell physiology. Fluorescence imaging is a popular method to detect proteins captured by immobilized probes with high sensitivity and specificity. Reliability of fluorescence assays depends on achieving minimal inter- and intra-assay probe immobilization variation, an ongoing challenge for protein microarrays. Therefore, it is desirable to establish a label-free method to quantify the probe density prior to target incubation to calibrate the fluorescence readout. Previously, a silicon oxide on silicon chip design was introduced to enhance the fluorescence signal and enable interferometric imaging to self-calibrate the signal with the immobilized probe density. In this paper, an integrated interferometric reflectance imaging sensor and wide-field fluorescence instrument is introduced for sensitive and calibrated microarray measurements. This platform is able to analyze a 2.5 mm × 3.4 mm area, or 200 spots (100 μm diameter with 200 μm pitch), in a single field-of-view.
Affiliation(s)
- A P Reddington
- Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts 02215, USA
8. Correction of spatial bias in oligonucleotide array data. Adv Bioinformatics 2013; 2013:167915. [PMID: 23573083 PMCID: PMC3610395 DOI: 10.1155/2013/167915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/06/2012] [Accepted: 02/02/2013] [Indexed: 01/17/2023]
Abstract
Background. Oligonucleotide microarrays allow for high-throughput gene expression profiling assays. The technology relies on the fundamental assumption that observed hybridization signal intensities (HSIs) for each intended target, on average, correlate with their target's true concentration in the sample. However, systematic, nonbiological variation from several sources undermines this hypothesis. Background hybridization signal has been previously identified as one such important source, one manifestation of which appears in the form of spatial autocorrelation. Results. We propose an algorithm, pyn, for the elimination of spatial autocorrelation in HSIs, exploiting the duality of desirable mutual information shared by probes in a common probe set and undesirable mutual information shared by spatially proximate probes. We show that this correction procedure reduces spatial autocorrelation in HSIs; increases HSI reproducibility across replicate arrays; increases differentially expressed gene detection power; and performs better than previously published methods. Conclusions. The proposed algorithm increases both precision and accuracy, while requiring virtually no changes to users' current analysis pipelines: the correction consists merely of a transformation of raw HSIs (e.g., CEL files for Affymetrix arrays). A free, open-source implementation is provided as an R package, compatible with standard Bioconductor tools. The approach may also be tailored to other platform types and other sources of bias.
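Illustrative note on entry 8: the abstract centers on spatial autocorrelation in hybridization signal intensities. The sketch below does not reproduce the pyn correction; it only shows one standard way to quantify spatial autocorrelation on a rectangular array, Moran's I with rook (up, down, left, right) neighbors and unit weights. The function name and the toy grids are assumptions for illustration.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using rook adjacency and unit weights.

    Values near +1 mean neighboring cells carry similar intensities, values
    near -1 mean neighbors alternate, and values near 0 mean no spatial pattern.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant grid: no variation to correlate
    num = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1
    return (n / weight_sum) * (num / denom)


# Toy grids: a blocky artifact-like pattern versus a strictly alternating one.
blocky = [[0, 0, 1, 1] for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(morans_i(blocky), morans_i(checker))
```

A statistic like this could be computed before and after a correction step to check whether spatial structure in the intensities was actually reduced.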
9. Quo CF, Kaddi C, Phan JH, Zollanvari A, Xu M, Wang MD, Alterovitz G. Reverse engineering biomolecular systems using -omic data: challenges, progress and opportunities. Brief Bioinform 2012; 13:430-45. [PMID: 22833495 DOI: 10.1093/bib/bbs026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Recent advances in high-throughput biotechnologies have led to rapidly growing research interest in reverse engineering of biomolecular systems (REBMS). 'Data-driven' approaches, i.e. data mining, can be used to extract patterns from large volumes of biochemical data at molecular-level resolution while 'design-driven' approaches, i.e. systems modeling, can be used to simulate emergent system properties. Consequently, both data- and design-driven approaches applied to -omic data may lead to novel insights in reverse engineering biological systems that could not be expected before using low-throughput platforms. However, there exist several challenges in this fast-growing field of reverse engineering biomolecular systems: (i) to integrate heterogeneous biochemical data for data mining, (ii) to combine top-down and bottom-up approaches for systems modeling and (iii) to validate system models experimentally. In addition to reviewing progress made by the community and opportunities encountered in addressing these challenges, we explore the emerging field of synthetic biology, which is an exciting approach to validate and analyze theoretical system models directly through experimental synthesis, i.e. analysis-by-synthesis. The ultimate goal is to address the present and future challenges in reverse engineering biomolecular systems (REBMS) using an integrated workflow of data mining, systems modeling and synthetic biology.
Affiliation(s)
- Chang F Quo
- Georgia Institute of Technology, Atlanta, GA 30332, USA
10. Moffitt RA, Yin-Goen Q, Stokes TH, Parry RM, Torrance JH, Phan JH, Young AN, Wang MD. caCORRECT2: Improving the accuracy and reliability of microarray data in the presence of artifacts. BMC Bioinformatics 2011; 12:383. [PMID: 21957981 PMCID: PMC3230913 DOI: 10.1186/1471-2105-12-383] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/01/2011] [Accepted: 09/29/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND In previous work, we reported the development of caCORRECT, a novel microarray quality control system built to identify and correct spatial artifacts commonly found on Affymetrix arrays. We have made recent improvements to caCORRECT, including the development of a model-based data-replacement strategy and integration with typical microarray workflows via caCORRECT's web portal and caBIG grid services. In this report, we demonstrate that caCORRECT improves the reproducibility and reliability of experimental results across several common Affymetrix microarray platforms. caCORRECT represents an advance over state-of-art quality control methods such as Harshlighting, and acts to improve gene expression calculation techniques such as PLIER, RMA and MAS5.0, because it incorporates spatial information into outlier detection as well as outlier information into probe normalization. The ability of caCORRECT to recover accurate gene expressions from low quality probe intensity data is assessed using a combination of real and synthetic artifacts with PCR follow-up confirmation and the affycomp spike in data. The caCORRECT tool can be accessed at the website: http://cacorrect.bme.gatech.edu. RESULTS We demonstrate that (1) caCORRECT's artifact-aware normalization avoids the undesirable global data warping that happens when any damaged chips are processed without caCORRECT; (2) When used upstream of RMA, PLIER, or MAS5.0, the data imputation of caCORRECT generally improves the accuracy of microarray gene expression in the presence of artifacts more than using Harshlighting or not using any quality control; (3) Biomarkers selected from artifactual microarray data which have undergone the quality control procedures of caCORRECT are more likely to be reliable, as shown by both spike in and PCR validation experiments. 
Finally, we present a case study of the use of caCORRECT to reliably identify biomarkers for renal cell carcinoma, yielding two diagnostic biomarkers with potential clinical utility, PRKAB1 and NNMT. CONCLUSIONS caCORRECT is shown to improve the accuracy of gene expression, and the reproducibility of experimental results in clinical application. This study suggests that caCORRECT will be useful to clean up possible artifacts in new as well as archived microarray data.
Affiliation(s)
- Richard A Moffitt
- The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA
11. Wu PY, Phan JH, Wang MD. Exploring the feasibility of next-generation sequencing and microarray data meta-analysis. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2011; 2011:7618-7621. [PMID: 22256102 PMCID: PMC5003043 DOI: 10.1109/iembs.2011.6091877] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
Emerging next-generation sequencing (NGS) technology potentially resolves many issues that prevent widespread clinical use of gene expression microarrays. However, the number of publicly available NGS datasets is still smaller than that of microarrays. This paper explores the possibilities for combining information from both microarray and NGS gene expression datasets for the discovery of differentially expressed genes (DEGs). We evaluate several existing methods in detecting DEGs using individual datasets as well as combined NGS and microarray datasets. Results indicate that analysis of combined NGS and microarray data is feasible, but successful detection of DEGs may depend on careful selection of algorithms as well as on data normalization and pre-processing.
Affiliation(s)
- Po-Yen Wu
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. pwu33@gatech.edu
12. Stokes TH, Wang MD. SimplevisGrid: grid services for visualization of diverse biomedical knowledge and molecular systems data. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2009; 2009:4178-81. [PMID: 19964624 DOI: 10.1109/iembs.2009.5333932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Biomedical data visualization is a great challenge due to the scale, complexity, and diversity of systems, system component interactions and experimental data. Standards for interoperable data are a good start to addressing these problems, but standardization of visualization technologies is an emerging topic. SimpleVisGrid builds on Cancer Biomedical Informatics Grid (caBIG) common infrastructure for cancer research, and clearly specifies and extends three standard data formats for inputs and outputs to grid services: comma-separated values (CSV), Portable Network Graphics (PNG), and Scalable Vector Graphics (SVG). Four prototype visualizations are available: 2D array data quality visualization, correlation heatmaps between high-dimensional data and associated meta-data, feature landscapes, and biochemical or semantic network graphs. The services and data model are prepared for submission for caBIG Silver-level compatibility review and for integration into automated research workflows. Making these tools available to caBIG developers and ultimately to biomedical researchers can (1) help with biomedical communication, discovery, and decision-making, (2) encourage more research on standardization of visualization formats, and (3) improve the efficiency of large data transfers across the grid.
Affiliation(s)
- Todd H Stokes
- Electrical and Computer Engineering Department, Georgia Institute of Technology, Atlanta, GA 30332, USA.
13. Osunkoya AO, Yin-Goen Q, Phan JH, Moffitt RA, Stokes TH, Wang MD, Young AN. Diagnostic biomarkers for renal cell carcinoma: selection using novel bioinformatics systems for microarray data analysis. Hum Pathol 2009; 40:1671-8. [PMID: 19695674 DOI: 10.1016/j.humpath.2009.05.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 02/23/2009] [Revised: 05/04/2009] [Accepted: 05/07/2009] [Indexed: 11/15/2022]
Abstract
The differential diagnosis of clear cell, papillary, and chromophobe renal cell carcinoma is clinically important, because these tumor subtypes are associated with different pathobiology and clinical behavior. For cases in which histopathology is equivocal, immunohistochemistry and quantitative reverse transcriptase-polymerase chain reaction can assist in the differential diagnosis by measuring expression of subtype-specific biomarkers. Several renal tumor biomarkers have been discovered in expression microarray studies. However, due to heterogeneity of gene and protein expression, additional biomarkers are needed for reliable diagnostic classification. We developed novel bioinformatics systems to identify candidate renal tumor biomarkers from the microarray profiles of 45 clear cell, 16 papillary, and 10 chromophobe renal cell carcinomas; the microarray data was derived from 2 independent published studies. The ArrayWiki biocomputing system merged the microarray data sets into a single file, so gene expression could be analyzed from a larger number of tumors. The caCORRECT system removed non-random sources of error from the microarray data, and the omniBioMarker system analyzed data with several gene-ranking algorithms to identify algorithms effective at recognizing previously described renal tumor biomarkers. We predicted these algorithms would also be effective at identifying unknown biomarkers that could be verified by independent methods. We selected 6 novel candidate biomarkers from the omniBioMarker analysis and verified their differential expression in formalin-fixed paraffin-embedded tissues by quantitative reverse transcriptase-polymerase chain reaction and immunohistochemistry. The candidate biomarkers were carbonic anhydrase IX, ceruloplasmin, schwannomin-interacting protein 1, E74-like factor 3, cytochrome c oxidase subunit 5a, and acetyl-CoA acetyltransferase 1. 
Quantitative reverse transcriptase-polymerase chain reaction was performed on 17 clear cell, 13 papillary and 7 chromophobe renal cell carcinoma. Carbonic anhydrase IX and ceruloplasmin were overexpressed in clear cell renal cell carcinoma; schwannomin-interacting protein 1 and E74-like factor 3 were overexpressed in papillary renal cell carcinoma; and cytochrome c oxidase subunit 5a and acetyl-CoA acetyltransferase 1 were overexpressed in chromophobe renal cell carcinoma. Immunohistochemistry was performed on tissue microarrays containing 66 clear cell, 16 papillary, and 12 chromophobe renal cell carcinomas. Cytoplasmic carbonic anhydrase IX staining was significantly associated with clear cell renal cell carcinoma. Strong cytoplasmic schwannomin-interacting protein 1 and cytochrome c oxidase subunit 5a staining were significantly more frequent in papillary and chromophobe renal cell carcinoma, respectively. In summary, we developed a novel process for identifying candidate renal tumor biomarkers from microarray data, and verifying differential expression in independent assays. The tumor biomarkers have potential utility as a multiplex expression panel for classifying renal cell carcinoma with equivocal histology. Biomarker expression assays are increasingly important for renal cell carcinoma diagnosis, as needle core biopsies become more common and different therapies for tumor subtypes continue to be developed.
Affiliation(s)
- Adeboye O Osunkoya
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA 30322, USA
14. Howard BE, Sick B, Heber S. Unsupervised assessment of microarray data quality using a Gaussian mixture model. BMC Bioinformatics 2009; 10:191. [PMID: 19545436 PMCID: PMC2717951 DOI: 10.1186/1471-2105-10-191] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/02/2008] [Accepted: 06/22/2009] [Indexed: 12/22/2022]
Abstract
BACKGROUND Quality assessment of microarray data is an important and often challenging aspect of gene expression analysis. This task frequently involves the examination of a variety of summary statistics and diagnostic plots. The interpretation of these diagnostics is often subjective, and generally requires careful expert scrutiny. RESULTS We show how an unsupervised classification technique based on the Expectation-Maximization (EM) algorithm and the naïve Bayes model can be used to automate microarray quality assessment. The method is flexible and can be easily adapted to accommodate alternate quality statistics and platforms. We evaluate our approach using Affymetrix 3' gene expression and exon arrays and compare the performance of this method to a similar supervised approach. CONCLUSION This research illustrates the efficacy of an unsupervised classification approach for the purpose of automated microarray data quality assessment. Since our approach requires only unannotated training data, it is easy to customize and to keep up-to-date as technology evolves. In contrast to other "black box" classification systems, this method also allows for intuitive explanations.
Affiliation(s)
- Brian E Howard
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA.
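Illustrative note on entry 14: the abstract describes unsupervised quality classification with the Expectation-Maximization (EM) algorithm. The sketch below is a simplified stand-in, not the authors' method: it fits a two-component Gaussian mixture to a single hypothetical quality statistic, whereas the paper combines EM with a naïve Bayes model over several statistics. The function names, the initialization, the variance floor, and the scores are assumptions for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(xs, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns component means, standard deviations, mixing weights, and the
    per-sample responsibilities from the final E-step.
    """
    lo, hi = min(xs), max(xs)
    mu = [lo, hi]                      # deterministic start: one mean at each extreme
    spread = (hi - lo) / 2.0 or 1.0
    sigma = [spread, spread]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate parameters from the soft assignments.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # empty component: keep its previous parameters
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
            weight[k] = nk / len(xs)
    return mu, sigma, weight, resp


# Hypothetical per-array quality statistic (not data from the paper).
scores = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.80, 0.85, 0.90]
mu, sigma, weight, resp = fit_two_gaussians(scores)
labels = [0 if r[0] >= r[1] else 1 for r in resp]
print(mu, labels)
```

No labels are used during fitting; which of the two components corresponds to low-quality arrays still has to be decided by inspecting the fitted means, which is an interpretation step outside this sketch.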
15. Phan JH, Moffitt RA, Stokes TH, Liu J, Young AN, Nie S, Wang MD. Convergence of biomarkers, bioinformatics and nanotechnology for individualized cancer treatment. Trends Biotechnol 2009; 27:350-8. [PMID: 19409634 PMCID: PMC3779321 DOI: 10.1016/j.tibtech.2009.02.010] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 11/29/2008] [Revised: 02/12/2009] [Accepted: 02/25/2009] [Indexed: 12/23/2022]
Abstract
Recent advances in biomarker discovery, biocomputing and nanotechnology have raised new opportunities in the emerging fields of personalized medicine (in which disease detection, diagnosis and therapy are tailored to each individual's molecular profile) and predictive medicine (in which genetic and molecular information is used to predict disease development, progression and clinical outcome). Here, we discuss advanced biocomputing tools for cancer biomarker discovery and multiplexed nanoparticle probes for cancer biomarker profiling, in addition to the prospects for and challenges involved in correlating biomolecular signatures with clinical outcome. This bio-nano-info convergence holds great promise for molecular diagnosis and individualized therapy of cancer and other human diseases.
Affiliation(s)
- John H. Phan
- Departments of Biomedical Engineering and Electrical and Computer Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, UA Whitaker Building 4106, Atlanta, GA 30332, USA
- Richard A. Moffitt
- Departments of Biomedical Engineering and Electrical and Computer Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, UA Whitaker Building 4106, Atlanta, GA 30332, USA
- Todd H. Stokes
- Departments of Biomedical Engineering and Electrical and Computer Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, UA Whitaker Building 4106, Atlanta, GA 30332, USA
- Jian Liu
- Departments of Biomedical Engineering and Chemistry, Emory University and Georgia Institute of Technology, 101 Woodruff Circle Suite 2001, Atlanta, GA 30322, USA
- Andrew N. Young
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine and the Grady Memorial Hospital, Atlanta, GA 30322, USA
- Shuming Nie
- Departments of Biomedical Engineering and Chemistry, Emory University and Georgia Institute of Technology, 101 Woodruff Circle Suite 2001, Atlanta, GA 30322, USA
- May D. Wang
- Departments of Biomedical Engineering and Electrical and Computer Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, UA Whitaker Building 4106, Atlanta, GA 30332, USA
16. Moffitt RA, Caldwell ML, Liu T, Liu J, Nie S, Wang MD. Quality control of highly multiplexed proteomic immunostaining with quantum dots: correcting for crosstalk. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2009; 2009:6739-6742. [PMID: 19963937 PMCID: PMC5859565 DOI: 10.1109/iembs.2009.5332857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
The process of developing molecular assays for disease diagnosis and prognosis requires cross-disciplinary research which monitors quality and reproducibility at all levels. This paper discusses challenges in the quality control of highly multiplexed Quantum Dot (QD) staining and provides a method for improving accuracy of QD quantification in two phases. Phase one is the estimation of unintended crosstalk between multiplexed QD-antibody reporters, and phase two is digital correction of this crosstalk. Results show that crosstalk varies among tissues and reagents, and in some cases it can be on the same order of magnitude as the original intended signal. In cases where target protein expression is assumed to be independent, crosstalk can be empirically estimated from imaging data and corrected for. This work is expected to improve the overall reproducibility and quantification of multiplexed QD staining.
Affiliation(s)
- Richard A Moffitt
- Department of Biomedical Engineering at the Georgia Institute of Technology, Atlanta, GA 30332, USA.
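Illustrative note on entry 16: the abstract describes a two-phase approach, first estimating crosstalk between multiplexed reporters and then correcting it digitally. The sketch below covers only the second phase for two channels, under a linear mixing model that is an assumption here rather than something the abstract specifies. The function name, the coefficient names, and the numeric values are hypothetical, and the crosstalk coefficients are taken as already estimated.

```python
def unmix_two_channels(obs_a, obs_b, a_into_b, b_into_a):
    """Recover true signals a and b from crosstalk-contaminated readings.

    Assumed linear model (an illustration, not the paper's stated model):
        obs_a = a + b_into_a * b
        obs_b = b + a_into_b * a
    where a_into_b is the fraction of signal a leaking into channel b,
    and b_into_a is the fraction of signal b leaking into channel a.
    """
    det = 1.0 - a_into_b * b_into_a
    if abs(det) < 1e-12:
        raise ValueError("crosstalk matrix is singular; channels cannot be separated")
    a = (obs_a - b_into_a * obs_b) / det
    b = (obs_b - a_into_b * obs_a) / det
    return a, b


# Made-up example: true signals 100 and 40, with 20% and 5% leakage.
true_a, true_b = 100.0, 40.0
leak_ab, leak_ba = 0.20, 0.05
observed_a = true_a + leak_ba * true_b   # 102.0
observed_b = true_b + leak_ab * true_a   # 60.0
a, b = unmix_two_channels(observed_a, observed_b, leak_ab, leak_ba)
print(a, b)
```

In this example the uncorrected reading in channel b is 50% higher than the true signal, which is consistent with the abstract's remark that crosstalk can reach the same order of magnitude as the intended signal.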
17. Cairns JM, Dunning MJ, Ritchie ME, Russell R, Lynch AG. BASH: a tool for managing BeadArray spatial artefacts. Bioinformatics 2008; 24:2921-2. [PMID: 18953044 PMCID: PMC2639304 DOI: 10.1093/bioinformatics/btn557] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
SUMMARY With their many replicates and their random layouts, Illumina BeadArrays provide greater scope for detecting spatial artefacts than do other microarray technologies. They are also robust to artefact exclusion, yet there is a lack of tools that can perform these tasks for Illumina. We present BASH, a tool for this purpose. BASH adopts the concepts of Harshlight, but implements them in a manner that utilizes the unique characteristics of the Illumina technology. Using bead-level data, spatial artefacts of various kinds can thus be identified and excluded from further analyses. AVAILABILITY The beadarray Bioconductor package (version 1.10 onwards), www.bioconductor.org
Affiliation(s)
- J M Cairns
- Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB20RE, UK
|
18
|
Stokes TH, Torrance JT, Li H, Wang MD. ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses. BMC Bioinformatics 2008; 9 Suppl 6:S18. [PMID: 18541053 PMCID: PMC2423441 DOI: 10.1186/1471-2105-9-s6-s18] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
Abstract
Background: A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis-parameter information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Because the data are heterogeneous, it is essential to know chip data parameters such as pre-processing steps (e.g., normalization, artefact removal) and any previous biological validation of the dataset. However, most microarray repositories do not hold meta-data in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources. Results: To address the problems discussed, we have developed a community-maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface. Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers (Semantic Agents) such as Google to further enhance data discovery. Conclusions: Microarray data and meta information in ArrayWiki are distributed and visualized using a novel and compact data storage format, BioPNG. They are also open to the research community for curation, modification, and contribution. By making a small investment of time to learn the syntax and structure common to all sites running MediaWiki software, domain scientists and practitioners can all contribute to making better use of microarray technologies in research and medical practice. ArrayWiki is available at .
Affiliation(s)
- Todd H Stokes
- Department of Electrical and Computer Engineering, Georgia Institute of Technology, Van Leer Building, 777 Atlantic Drive NW, Atlanta, GA 30332, USA.
|
19
|
Stokes TH, Han X, Moffitt RA, Wang MD. Extending microarray quality control and analysis algorithms to Illumina chip platform. ACTA ACUST UNITED AC 2008; 2007:4637-40. [PMID: 18003039 DOI: 10.1109/iembs.2007.4353373] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
This paper presents a novel data quality control technique for a challenging new microarray platform supplied by Illumina, Inc. Microarray technology is a revolutionary biotechnology that enables the study of thousands of genes and proteins simultaneously. While the number of microarray chip platform types keeps increasing and manufacturing quality keeps improving, array data quality control and analysis tools still lag behind. In this research, we design an adaptable microarray data quality control and analysis system capable of handling multiple microarray platforms. We demonstrate that the Illumina chips, even though their layouts are randomly assembled, still contain artifacts. We conclude that it is necessary for chip manufacturers to provide low-level bead location output as a standard feature for better data quality assurance.
Affiliation(s)
- Todd H Stokes
- Electrical and Computer Engineering Department, Georgia Institute of Technology, Atlanta, GA 30332, USA
|