1
|
McCarthy MS, Colburn ZT, Yeung KY, Gillette LH, Hung LH, Elshaw E. A Randomized Controlled Trial of Precision Nutrition Counseling for Service Members at Risk for Metabolic Syndrome. Mil Med 2023; 188:606-613. [PMID: 37948286 DOI: 10.1093/milmed/usad276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 04/13/2023] [Accepted: 07/29/2023] [Indexed: 11/12/2023]
Abstract
INTRODUCTION Metabolic syndrome (MetS) is a threat to the active component military as it impacts health, readiness, retention, and cost to the Military Health System. The most prevalent risk factors documented in service members' health records are high blood pressure (BP), low high-density lipoprotein cholesterol, and elevated triglycerides. Other risk factors include abdominal obesity and elevated fasting blood glucose. Precision nutrition counseling and wellness software applications have demonstrated positive results for weight management when coupled with high levels of participant engagement and motivation. MATERIALS AND METHODS In this prospective randomized controlled trial, trained registered dietitians conducted nutrition counseling using results of targeted sequencing, biomarkers, and expert recommendations to reduce the risk for MetS. Upon randomization, the treatment arm initiated six weekly sessions and the control arm received educational pamphlets. An eHealth application captured diet and physical activity. Anthropometrics and BP were measured at baseline, 6 weeks, and 12 weeks, and biomarkers were measured at baseline and 12 weeks. The primary outcome was a change in weight at 12 weeks. Statistical analysis included descriptive statistics and t-tests or analysis of variance with significance set at P < .05. RESULTS Overall, 138 subjects enrolled from November 2019 to February 2021 between two military bases; 107 completed the study. Demographics were as follows: 66% male, mean age 31 years, 66% married, and 49% Caucasian and non-Hispanic. Weight loss was not significant between groups or sites at 12 weeks. Overall, 27% of subjects met the diagnostic criteria for MetS on enrollment and 17.8% upon study completion. High deleterious variant prevalence was identified for genes with single-nucleotide polymorphisms linked to obesity (40%), cholesterol (38%), and BP (58%). 
Overall, 65% of subjects had low 25(OH)D upon enrollment; 45% remained insufficient at study completion. eHealth app had low adherence yet sufficient correlation with a valid reference. CONCLUSIONS Early signs of progress with weight loss at 6 weeks were not sustained at 12 weeks. DNA-based nutrition counseling was not efficacious for weight loss.
Affiliation(s)
- Mary S McCarthy
- Center for Nursing Science & Clinical Inquiry, Madigan Army Medical Center, Tacoma, WA 98431, USA
| | - Zachary T Colburn
- Center for Nursing Science & Clinical Inquiry, Madigan Army Medical Center, Tacoma, WA 98431, USA
| | - Ka Yee Yeung
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - Laurel H Gillette
- Center for Nursing Science & Clinical Inquiry, Madigan Army Medical Center, Tacoma, WA 98431, USA
| | - Ling-Hong Hung
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
|
2
|
Sala-Torra O, Reddy S, Hung LH, Beppu L, Wu D, Radich J, Yeung KY, Yeung CCS. Rapid detection of myeloid neoplasm fusions using single-molecule long-read sequencing. PLOS Glob Public Health 2023; 3:e0002267. [PMID: 37699001 PMCID: PMC10497132 DOI: 10.1371/journal.pgph.0002267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 07/17/2023] [Indexed: 09/14/2023]
Abstract
Recurrent gene fusions are common drivers of disease pathophysiology in leukemias. Identifying these structural variants helps stratify disease by risk and assists with therapy choice. Precise molecular diagnosis in low-and-middle-income countries (LMIC) is challenging given the complexity of assays, trained technical support, and the availability of reliable electricity. Current fusion detection methods require a long turnaround time (7-10 days) or advance knowledge of the genes involved in the fusions. Recent technology developments have made sequencing possible without a sophisticated molecular laboratory, potentially making molecular diagnosis accessible to remote areas and low-income settings. We describe a long-read sequencing DNA assay designed with CRISPR guides to select and enrich for recurrent leukemia fusion genes, that does not need a priori knowledge of the abnormality present. By applying rapid sequencing technology based on nanopores, we sequenced long pieces of genomic DNA and successfully detected fusion genes in cell lines and primary specimens (e.g., BCR::ABL1, PML::RARA, CBFB::MYH11, KMT2A::AFF1) using cloud-based bioinformatics workflows with novel custom fusion finder software. We detected fusion genes in 100% of cell lines with the expected breakpoints and confirmed the presence or absence of a recurrent fusion gene in 12 of 14 patient cases. With our optimized assay and cloud-based bioinformatics workflow, these assays and analyses could be performed in under 8 hours. The platform's portability, potential for adaptation to lower-cost devices, and integrated cloud analysis make this assay a candidate to be placed in settings like LMIC to bridge the need of bedside rapid molecular diagnostics.
Affiliation(s)
- Olga Sala-Torra
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- University of Washington, Seattle, Washington, United States of America
| | - Shishir Reddy
- University of Washington, Seattle, Washington, United States of America
| | - Ling-Hong Hung
- University of Washington, Seattle, Washington, United States of America
| | - Lan Beppu
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
| | - David Wu
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
| | - Jerald Radich
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
| | - Ka Yee Yeung
- University of Washington, Seattle, Washington, United States of America
| | - Cecilia C. S. Yeung
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, Washington, United States of America
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, Washington, United States of America
|
3
|
Hoang V, Hung LH, Perez D, Deng H, Schooley R, Arumilli N, Yeung KY, Lloyd W. Container Profiler: Profiling resource utilization of containerized big data pipelines. Gigascience 2022; 12:giad069. [PMID: 37624874 PMCID: PMC10452954 DOI: 10.1093/gigascience/giad069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 08/02/2023] [Accepted: 08/15/2023] [Indexed: 08/27/2023]
Abstract
BACKGROUND This article presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over 60 Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler supports performing time-series profiling at a configurable sampling interval to enable continuous monitoring of the resources consumed by containerized tasks and pipelines. RESULTS To investigate the utility of the Container Profiler, we profile the resource utilization requirements of a multistage bioinformatics analytical pipeline (RNA sequencing using unique molecular identifiers). We examine profiling metrics to assess patterns of CPU, disk, and network resource utilization across the different stages of the pipeline. We also quantify the profiling overhead of our Container Profiler tool to assess the impact of profiling a running pipeline with different levels of profiling granularity, verifying that impacts are negligible. CONCLUSIONS The Container Profiler provides a useful tool that can be used to continuously monitor the resource consumption of long and complex containerized applications that run locally or on the cloud. This can help identify bottlenecks where more resources are needed to improve performance.
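The time-series profiling described in this abstract can be illustrated with a minimal sketch. It is not the Container Profiler's own code: the function names (`parse_cpu_times`, `read_proc_stat`, `profile`) are hypothetical, and only the aggregate CPU counters from the standard Linux `/proc/stat` file are sampled, whereas the tool itself reports over 60 metrics at the virtual machine, container, and process levels.

```python
import time
from typing import Callable, Dict, List

# Names of the first seven counters on the aggregate "cpu" line of /proc/stat.
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]


def parse_cpu_times(stat_text: str) -> Dict[str, float]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [float(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def read_proc_stat() -> str:
    """Read the live counters (Linux only)."""
    with open("/proc/stat") as fh:
        return fh.read()


def profile(read_fn: Callable[[], str], interval_s: float,
            samples: int) -> List[Dict[str, float]]:
    """Collect `samples` timestamped records, one every `interval_s` seconds."""
    records = []
    for i in range(samples):
        record = parse_cpu_times(read_fn())
        record["timestamp"] = time.time()
        records.append(record)
        if i < samples - 1:
            time.sleep(interval_s)
    return records
```

On a Linux host, `profile(read_proc_stat, 1.0, 10)` would collect ten one-second samples. Passing the reader as a parameter keeps the sampling loop independent of where the counters come from, so the same loop could in principle be pointed at container-level or process-level sources.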
Affiliation(s)
- Varik Hoang
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - Ling-Hong Hung
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - David Perez
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - Huazeng Deng
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - Raymond Schooley
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - Niharika Arumilli
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - Ka Yee Yeung
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
| | - Wes Lloyd
- School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA
|
4
|
Hung LH, Straw E, Reddy S, Schmitz R, Colburn Z, Yeung KY. Cloud-enabled Biodepot workflow builder integrates image processing using Fiji with reproducible data analysis using Jupyter notebooks. Sci Rep 2022; 12:14920. [PMID: 36056115 PMCID: PMC9440253 DOI: 10.1038/s41598-022-19173-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 08/25/2022] [Indexed: 11/16/2022]
Abstract
Modern biomedical image analysis workflows contain multiple computational processing tasks, giving rise to problems in reproducibility. In addition, image datasets can span both spatial and temporal dimensions, with additional channels for fluorescence and other data, resulting in datasets that are too large to be processed locally on a laptop. For omics analyses, software containers have been shown to enhance reproducibility, facilitate installation and provide access to scalable computational resources on the cloud. However, most image analyses contain steps that are graphical and interactive, features that are not supported by most omics execution engines. We present the containerized and cloud-enabled Biodepot-workflow-builder platform that supports graphics from software containers and has been extended for image analyses. We demonstrate the potential of our modular approach with multi-step workflows that incorporate the popular and open-source Fiji suite for image processing. One of our examples integrates fully interactive ImageJ macros with Jupyter notebooks. Our second example illustrates how the complicated cloud setup of a computationally intensive process such as stitching 3D digital pathology datasets using BigStitcher can be automated and simplified. In both examples, users can leverage a form-based graphical interface to execute multi-step workflows with a single click, using the provided sample data and preset input parameters. Alternatively, users can interactively modify the image processing steps in the workflow, apply the workflows to their own data, and change the input parameters and macros. By providing interactive graphics support to software containers, our modular platform supports reproducible image analysis workflows, simplified access to cloud resources for analysis of large datasets, and integration across different applications such as Jupyter.
Affiliation(s)
- Ling-Hong Hung
- School of Engineering and Technology, University of Washington Tacoma, Box 358426, Tacoma, 98402, WA, USA
| | - Evan Straw
- Biodepot LLC, Seattle, 98195, WA, USA
- University of Washington, Seattle, 98195, WA, USA
| | - Shishir Reddy
- School of Engineering and Technology, University of Washington Tacoma, Box 358426, Tacoma, 98402, WA, USA
| | - Robert Schmitz
- School of Engineering and Technology, University of Washington Tacoma, Box 358426, Tacoma, 98402, WA, USA
- Biodepot LLC, Seattle, 98195, WA, USA
| | | | - Ka Yee Yeung
- School of Engineering and Technology, University of Washington Tacoma, Box 358426, Tacoma, 98402, WA, USA.
- Biodepot LLC, Seattle, 98195, WA, USA.
|
5
|
Chan CY, Tang MHY, Wong KC, Chong YK, Yeung KY, Mak TWL. Acute poisoning by dexmedetomidine-containing chewing gum in a child. Pathology 2021; 54:666-667. [PMID: 34801281 DOI: 10.1016/j.pathol.2021.08.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2021] [Accepted: 08/20/2021] [Indexed: 10/19/2022]
Affiliation(s)
- Candace Y Chan
- Hospital Authority Toxicology Reference Laboratory, Princess Margaret Hospital, Hong Kong; Chemical Pathology Department, Princess Margaret Hospital, Hong Kong
| | - Magdalene H Y Tang
- Hospital Authority Toxicology Reference Laboratory, Princess Margaret Hospital, Hong Kong
| | - K C Wong
- Chemical Pathology Department, Queen Mary Hospital, Hong Kong
| | - Y K Chong
- Hospital Authority Toxicology Reference Laboratory, Princess Margaret Hospital, Hong Kong; Chemical Pathology Department, Princess Margaret Hospital, Hong Kong
| | - K Y Yeung
- Pediatric and Adolescence Medicine Department, Tuen Mun Hospital, Hong Kong
| | - Tony W L Mak
- Hospital Authority Toxicology Reference Laboratory, Princess Margaret Hospital, Hong Kong; Chemical Pathology Department, Princess Margaret Hospital, Hong Kong.
|
6
|
Reddy S, Hung LH, Sala-Torra O, Radich JP, Yeung CC, Yeung KY. A graphical, interactive and GPU-enabled workflow to process long-read sequencing data. BMC Genomics 2021; 22:626. [PMID: 34425749 PMCID: PMC8381503 DOI: 10.1186/s12864-021-07927-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/27/2021] [Accepted: 08/10/2021] [Indexed: 12/18/2022]
Abstract
Background Long-read sequencing has great promise in enabling portable, rapid molecular-assisted cancer diagnoses. A key challenge in democratizing long-read sequencing technology in the biomedical and clinical community is the lack of graphical bioinformatics software tools that can efficiently process the raw nanopore reads and support graphical output and interactive visualizations for interpretation of results. Another obstacle is that high-performance software tools for long-read sequencing data analyses often leverage graphics processing units (GPUs), which are challenging and time-consuming to configure, especially on the cloud. Results We present a graphical cloud-enabled workflow for fast, interactive analysis of nanopore sequencing data using GPUs. Users customize parameters, monitor execution and visualize results through an accessible graphical interface. The workflow and its components are completely containerized to ensure reproducibility and facilitate installation of the GPU-enabled software. We also provide an Amazon Machine Image (AMI) with all software and drivers pre-installed for GPU computing on the cloud. Most importantly, we demonstrate the potential of applying our software tools to reduce the turnaround time of cancer diagnostics by generating blood cancer (NB4, K562, ME1, MV4;11) cell line Nanopore data using the Flongle adapter. We observe a 29x speedup and a 93x reduction in costs for the rate-limiting basecalling step in the analysis of blood cancer cell line data. Conclusions Our interactive and efficient software tools will make analyses of Nanopore data using GPU and cloud computing accessible to biomedical and clinical scientists, thus facilitating the adoption of cost-effective, fast, portable and real-time long-read sequencing. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07927-1.
Affiliation(s)
| | - Ling-Hong Hung
- School of Engineering and Technology, University of Washington, 98402, Tacoma, WA, USA
| | - Olga Sala-Torra
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
| | - Jerald P Radich
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
- Clinical Research Division, Kurt Enslein Endowed Chair, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
- Department of Medicine, University of Washington, 98109, Seattle, WA, USA
| | - Cecilia Cs Yeung
- Clinical Research Division, Fred Hutchinson Cancer Research Center, 98109, Seattle, WA, USA
- Department of Laboratory Medicine and Pathology, University of Washington, 98109, Seattle, WA, USA
| | - Ka Yee Yeung
- School of Engineering and Technology, University of Washington, 98402, Tacoma, WA, USA.
|
7
|
Hung LH, Lloyd W, Agumbe Sridhar R, Athmalingam Ravishankar SD, Xiong Y, Sobie E, Yeung KY. Holistic optimization of an RNA-seq workflow for multi-threaded environments. Bioinformatics 2019; 35:4173-4175. [PMID: 30859176 DOI: 10.1093/bioinformatics/btz169] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/19/2018] [Revised: 02/01/2019] [Accepted: 03/09/2019] [Indexed: 11/12/2022]
Abstract
SUMMARY For many next-generation sequencing pipelines, the most computationally intensive step is the alignment of reads to a reference sequence. As a result, alignment software such as the Burrows-Wheeler Aligner is optimized for speed and is often executed in parallel on the cloud. However, there are other less demanding steps that can also be optimized to significantly increase the speed, especially when using many threads. We demonstrate this using a unique molecular identifier RNA-sequencing pipeline consisting of three steps: split, align, and merge. Optimization of all three steps yields a 40% increase in speed when executed using a single thread. However, when executed using 16 threads, we observe a 4-fold improvement over the original parallel implementation and more than an 8-fold improvement over the original single-threaded implementation. In contrast, optimizing only the alignment step results in just a 13% improvement over the original parallel workflow using 16 threads. AVAILABILITY AND IMPLEMENTATION Code (M.I.T. license), supporting scripts and Dockerfiles are available at https://github.com/BioDepot/LINCS_RNAseq_cpp and Docker images at https://hub.docker.com/r/biodepot/rnaseq-umi-cpp/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
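The split, align, and merge structure named in this abstract can be sketched as below. This is an illustrative toy, not the authors' pipeline: all function names are hypothetical, and `align` is a placeholder that groups reads by their first four bases instead of calling a real aligner. Pure-Python threads are also not expected to reproduce the reported speedups; the sketch only shows how the three stages fit together.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def split(reads: List[str], n_chunks: int) -> List[List[str]]:
    """Distribute reads round-robin into n_chunks lists."""
    return [reads[i::n_chunks] for i in range(n_chunks)]


def align(chunk: List[str]) -> Dict[str, int]:
    """Placeholder for alignment: count reads per 4-base prefix."""
    counts: Dict[str, int] = {}
    for read in chunk:
        key = read[:4]
        counts[key] = counts.get(key, 0) + 1
    return counts


def merge(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Sum the per-chunk counts into one table."""
    total: Dict[str, int] = {}
    for partial in partials:
        for key, value in partial.items():
            total[key] = total.get(key, 0) + value
    return total


def run_pipeline(reads: List[str], threads: int) -> Dict[str, int]:
    """Run split, then align each chunk on its own worker, then merge."""
    chunks = split(reads, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(align, chunks))
    return merge(partials)
```

Only the align stage runs on the worker pool here; `split` and `merge` stay serial. That layout makes the abstract's point visible: as the thread count grows, the serial stages take up a larger share of the total run time, so optimizing them matters more.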
Affiliation(s)
| | - Wes Lloyd
- School of Engineering and Technology, Tacoma, WA, USA
| | | | | | - Yuguang Xiong
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Eric Sobie
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Ka Yee Yeung
- School of Engineering and Technology, Tacoma, WA, USA
|
8
|
Liang X, Young WC, Hung LH, Raftery AE, Yeung KY. Integration of Multiple Data Sources for Gene Network Inference Using Genetic Perturbation Data. J Comput Biol 2019; 26:1113-1129. [PMID: 31009236 PMCID: PMC6786343 DOI: 10.1089/cmb.2019.0036] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/26/2022]
Abstract
The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks as well as extends some previous Bayesian frameworks both in theory and applications. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines.
Affiliation(s)
- Xiao Liang
- Department of Computer Science, Virginia Tech, Blacksburg, Virginia
| | - William Chad Young
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
| | - Ling-Hong Hung
- School of Engineering and Technology, University of Washington, Tacoma, Washington
| | - Adrian E. Raftery
- Department of Statistics, University of Washington, Seattle, Washington
| | - Ka Yee Yeung
- School of Engineering and Technology, University of Washington, Tacoma, Washington
|
9
|
Young WC, Yeung KY, Raftery AE. Identifying Dynamical Time Series Model Parameters from Equilibrium Samples, with Application to Gene Regulatory Networks. STAT MODEL 2019; 19:444-465. [PMID: 33824624 DOI: 10.1177/1471082x18776577] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.
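To make "equilibrium distribution of a VAR model" concrete, the standard first-order case is written out below. The abstract does not state the model order or noise assumptions, so the first-order, zero-mean, Gaussian form and the symbols $A$, $\Sigma_\varepsilon$, and $\Sigma$ are assumptions made here for illustration only.

```latex
% First-order VAR with Gaussian noise (assumed form)
x_{t+1} = A\,x_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \Sigma_\varepsilon)

% If the spectral radius of A is below 1, the process has a stationary
% distribution N(0, \Sigma), where \Sigma solves the discrete Lyapunov equation
\Sigma = A\,\Sigma\,A^{\top} + \Sigma_\varepsilon
```

Under these assumptions, steady-state samples carry information about $\Sigma$ only. The same $\Sigma$ can arise from many pairs $(A, \Sigma_\varepsilon)$, so recovering the dynamics $A$ from such samples requires additional structure.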
Affiliation(s)
- William Chad Young
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA, USA
| | - Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, WA, USA
|
10
|
Fourati S, Talla A, Mahmoudian M, Burkhart JG, Klén R, Henao R, Yu T, Aydın Z, Yeung KY, Ahsen ME, Almugbel R, Jahandideh S, Liang X, Nordling TEM, Shiga M, Stanescu A, Vogel R, Pandey G, Chiu C, McClain MT, Woods CW, Ginsburg GS, Elo LL, Tsalik EL, Mangravite LM, Sieberts SK. A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection. Nat Commun 2018; 9:4418. [PMID: 30356117 PMCID: PMC6200745 DOI: 10.1038/s41467-018-06735-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/30/2018] [Accepted: 09/12/2018] [Indexed: 01/17/2023]
Abstract
The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveals little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.
Affiliation(s)
- Slim Fourati
- Department of Pathology, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Aarthi Talla
- Department of Pathology, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Mehrad Mahmoudian
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
- Department of Future Technologies, University of Turku, FI-20014 Turku, Finland
| | - Joshua G Burkhart
- Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, Portland, OR, 97239, USA
- Laboratory of Evolutionary Genetics, Institute of Ecology and Evolution, University of Oregon, Eugene, OR, 97403, USA
| | - Riku Klén
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
| | - Ricardo Henao
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA
| | - Thomas Yu
- Sage Bionetworks, Seattle, WA, 98121, USA
| | - Zafer Aydın
- Department of Computer Engineering, Abdullah Gul University, Kayseri, 38080, Turkey
| | - Ka Yee Yeung
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
| | - Mehmet Eren Ahsen
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Reem Almugbel
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
| | | | - Xiao Liang
- School of Engineering and Technology, University of Washington Tacoma, Tacoma, WA, 98402, USA
| | - Torbjörn E M Nordling
- Department of Mechanical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
| | - Motoki Shiga
- Department of Electrical, Electronic and Computer Engineering, Faculty of Engineering, Gifu University, Gifu, 501-1193, Japan
| | - Ana Stanescu
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Computer Science, University of West Georgia, Carrollton, GA, 30116, USA
| | - Robert Vogel
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598, USA
| | - Gaurav Pandey
- Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Christopher Chiu
- Section of Infectious Diseases and Immunity, Imperial College London, London, W12 0NN, UK
| | - Micah T McClain
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Medical Service, Durham VA Health Care System, Durham, NC, 27705, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Christopher W Woods
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Medical Service, Durham VA Health Care System, Durham, NC, 27705, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Geoffrey S Ginsburg
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
| | - Laura L Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
| | - Ephraim L Tsalik
- Duke Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Department of Medicine, Duke University School of Medicine, Durham, NC, 27710, USA
- Emergency Medicine Service, Durham VA Health Care System, Durham, NC, 27705, USA
|
11
|
Abstract
Background Using software containers has become standard practice to reproducibly deploy and execute biomedical workflows on the cloud. However, some applications that contain time-consuming initialization steps will produce unnecessary costs for repeated executions. Findings We demonstrate that hot-starting from containers that have been frozen after the application has already begun execution can speed up bioinformatics workflows by avoiding repetitive initialization steps. We use an open-source tool called Checkpoint and Restore in Userspace (CRIU) to save the state of the containers as a collection of checkpoint files on disk after it has read in the indices. The resulting checkpoint files are migrated to the host, and CRIU is used to regenerate the containers in that ready-to-run hot-start state. As a proof-of-concept example, we create a hot-start container for the spliced transcripts alignment to a reference (STAR) aligner and deploy this container to align RNA sequencing data. We compare the performance of the alignment step with and without checkpoints on cloud platforms using local and network disks. Conclusions We demonstrate that hot-starting Docker containers from snapshots taken after repetitive initialization steps are completed significantly speeds up the execution of the STAR aligner on all experimental platforms, including Amazon Web Services, Microsoft Azure, and local virtual machines. Our method can be potentially employed in other bioinformatics applications in which a checkpoint can be inserted after a repetitive initialization phase.
Affiliation(s)
- Pai Zhang
- School of Engineering and Technology, Campus Box 358426, 1900 Commerce Street, University of Washington, Tacoma, Washington 98402-3100, USA
| | - Ling-Hong Hung
- School of Engineering and Technology, Campus Box 358426, 1900 Commerce Street, University of Washington, Tacoma, Washington 98402-3100, USA
| | - Wes Lloyd
- School of Engineering and Technology, Campus Box 358426, 1900 Commerce Street, University of Washington, Tacoma, Washington 98402-3100, USA
| | - Ka Yee Yeung
- School of Engineering and Technology, Campus Box 358426, 1900 Commerce Street, University of Washington, Tacoma, Washington 98402-3100, USA
|
12
|
Abstract
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (M.I.T. license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).
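The idea of casting transitive reduction as a shortest-path problem can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: each edge carries a posterior probability p with 0 < p <= 1, the cost of an edge is -log(p), and a direct edge is dropped when some indirect path has a lower total cost (that is, a higher product of probabilities). The function names are hypothetical, and this is not the fastBMA implementation.

```python
import heapq
import math


def shortest_indirect_cost(graph, source, target):
    """Dijkstra from source to target, ignoring the direct edge source->target.

    graph maps node -> {neighbour: edge probability}; cost of an edge is -log(p).
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, prob in graph.get(node, {}).items():
            if node == source and nxt == target:
                continue  # skip the direct edge that is being tested
            new_cost = cost - math.log(prob)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return math.inf  # no indirect path exists


def prune_indirect_edges(graph):
    """Return a copy of graph without edges that an indirect path explains better."""
    pruned = {}
    for u, neighbours in graph.items():
        for v, prob in neighbours.items():
            direct_cost = -math.log(prob)
            if shortest_indirect_cost(graph, u, v) < direct_cost:
                continue  # a chain of stronger edges already links u to v
            pruned.setdefault(u, {})[v] = prob
    return pruned


# Hypothetical three-gene network: A->C is weak compared with A->B->C.
network = {"A": {"B": 0.9, "C": 0.5}, "B": {"C": 0.9}}
print(prune_indirect_edges(network))
```

In this toy network the path A->B->C has probability 0.81, which exceeds the 0.5 of the direct edge A->C, so only the direct edge is removed. Every edge is tested against the original graph, so the result does not depend on the order in which edges are visited.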
Affiliation(s)
- Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
| | - Kaiyuan Shi
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
| | - Migao Wu
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
| | - William Chad Young
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
| | - Adrian E. Raftery
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A
| | - Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A
- Correspondence address. Ka Yee Yeung, Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.; Tel: 253-692-4924; Fax: 253-692-5862; E-mail:
|
13
|
Mittal V, Hung LH, Keswani J, Kristiyanto D, Lee SB, Yeung KY. GUIdock-VNC: using a graphical desktop sharing system to provide a browser-based interface for containerized software. Gigascience 2018; 6:1-6. [PMID: 28327936 PMCID: PMC5530313 DOI: 10.1093/gigascience/giw013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/09/2016] [Accepted: 12/16/2016] [Indexed: 11/30/2022] Open
Abstract
Background: Software container technology such as Docker can be used to package and distribute bioinformatics workflows consisting of multiple software implementations and dependencies. However, Docker is a command line–based tool, and many bioinformatics pipelines consist of components that require a graphical user interface. Results: We present a container tool called GUIdock-VNC that uses a graphical desktop sharing system to provide a browser-based interface for containerized software. GUIdock-VNC uses the Virtual Network Computing protocol to render the graphics within most commonly used browsers. We also present a minimal image builder that can add our proposed graphical desktop sharing system to any Docker package, with the end result that any Docker package can be run using a graphical desktop within a browser. In addition, GUIdock-VNC uses the OAuth2 authentication protocol when deployed on the cloud. Conclusions: As a proof-of-concept, we demonstrated the utility of GUIdock-noVNC in gene network inference. We benchmarked our container implementation on various operating systems and showed that our solution creates minimal overhead.
|
14
|
Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers. J Am Med Inform Assoc 2018; 25:4-12. [PMID: 29092073 PMCID: PMC6381817 DOI: 10.1093/jamia/ocx120] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 05/30/2017] [Revised: 08/31/2017] [Accepted: 09/28/2017] [Indexed: 11/14/2022] Open
Abstract
Objective Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.
Affiliation(s)
- Reem Almugbel
- Institute of Technology, University of Washington, Tacoma, WA, USA
| | - Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma, WA, USA
| | - Jiaming Hu
- Institute of Technology, University of Washington, Tacoma, WA, USA
| | - Abeer Almutairy
- Institute of Technology, University of Washington, Tacoma, WA, USA
| | - Nicole Ortogero
- Department of Clinical Investigation, Madigan Army Medical Center, Tacoma, WA, USA
| | - Yashaswi Tamta
- Institute of Technology, University of Washington, Tacoma, WA, USA
| | - Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA, USA
|
15
|
Abstract
The NIH Library of Integrated Network-based Cellular Signatures (LINCS) contains gene expression data from over a million experiments, using Luminex Bead technology. Only 500 colors are used to measure the expression levels of the 1,000 landmark genes measured, and the data for the resulting pairs of genes are deconvolved. The raw data are sometimes inadequate for reliable deconvolution, leading to artifacts in the final processed data. These include the expression levels of paired genes being flipped or given the same value, and clusters of values that are not at the true expression level. We propose a new method called model-based clustering with data correction (MCDC) that is able to identify and correct these three kinds of artifacts simultaneously. We show that MCDC improves the resulting gene expression data in terms of agreement with external baselines, as well as improving results from subsequent analysis.
Affiliation(s)
- William Chad Young
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195
| | - Adrian E Raftery
- Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195
| | - Ka Yee Yeung
- Institute of Technology, University of Washington Tacoma, Campus Box 358426, 1900 Commerce Street, Tacoma, WA 98402
|
16
|
Chung PH, Wong CW, Lai CK, Siu HK, Tsang DN, Yeung KY, Ip DK, Tam PK. A prospective interventional study to examine the effect of a silver alloy and hydrogel-coated catheter on the incidence of catheter-associated urinary tract infection. Hong Kong Med J 2017; 23:239-45. [PMID: 28211358 DOI: 10.12809/hkmj164906] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/05/2022] Open
Abstract
INTRODUCTION Catheter-associated urinary tract infection is a major hospital-acquired infection. This study aimed to analyse the effect of a silver alloy and hydrogel-coated catheter on the occurrence of catheter-associated urinary tract infection. METHODS This was a 1-year prospective study conducted at a single centre in Hong Kong. Adult patients with an indwelling urinary catheter for longer than 24 hours were recruited. The incidence of catheter-associated urinary tract infection in patients with a conventional latex Foley catheter without hydrogel was compared with that in patients with a silver alloy and hydrogel-coated catheter. The most recent definition of urinary tract infection was based on the latest surveillance definition of the National Healthcare Safety Network managed by Centers for Disease Control and Prevention. RESULTS A total of 306 patients were recruited with a similar ratio between males and females. The mean (standard deviation) age was 81.1 (10.5) years. The total numbers of catheter-days were 4352 and 7474 in the silver-coated and conventional groups, respectively. The incidences of catheter-associated urinary tract infection per 1000 catheter-days were 6.4 and 9.4, respectively (P=0.095). There was a 31% reduction in the incidence of catheter-associated urinary tract infection per 1000 catheter-days in the silver-coated group. Escherichia coli was the most commonly involved pathogen (36.7%) of all cases. Subgroup analysis revealed that the protective effect of silver-coated catheter was more pronounced in long-term users as well as female patients with a respective 48% (P=0.027) and 42% (P=0.108) reduction in incidence of catheter-associated urinary tract infection. The mean catheterisation time per person was the longest in patients using a silver-coated catheter (17.0 days) compared with those using a conventional (10.8 days) or both types of catheter (13.6 days) [P=0.01]. 
CONCLUSIONS Silver alloy and hydrogel-coated catheters appear to be effective in preventing catheter-associated urinary tract infection based on the latest surveillance definition. The effect is perhaps more prominent in long-term users and female patients.
Affiliation(s)
- P Hy Chung
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
| | - C Wy Wong
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
| | - C Kc Lai
- Department of Pathology, Queen Elizabeth Hospital, Jordan, Hong Kong
| | - H K Siu
- Chief Infection Control Officer's Office, Hospital Authority, Hong Kong
| | - D Nc Tsang
- Department of Pathology, Queen Elizabeth Hospital, Jordan, Hong Kong; Chief Infection Control Officer's Office, Hospital Authority, Hong Kong
| | - K Y Yeung
- Infection Control Team, Central Nursing Department, Kowloon Hospital, Argyle Street, Hong Kong
| | - D Km Ip
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
| | - P Kh Tam
- Department of Surgery, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
|
17
|
Young WC, Raftery AE, Yeung KY. A posterior probability approach for gene regulatory network inference in genetic perturbation data. Math Biosci Eng 2016; 13:1241-1251. [PMID: 27775378 DOI: 10.3934/mbe.2016041] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 06/06/2023]
Abstract
Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach.
Affiliation(s)
- William Chad Young
- University of Washington, Department of Statistics, Box 354322, Seattle, WA 98195-4322, United States.
|
18
|
Hung LH, Kristiyanto D, Lee SB, Yeung KY. GUIdock: Using Docker Containers with a Common Graphics User Interface to Address the Reproducibility of Research. PLoS One 2016; 11:e0152686. [PMID: 27045593 PMCID: PMC4821530 DOI: 10.1371/journal.pone.0152686] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 09/22/2015] [Accepted: 03/17/2016] [Indexed: 12/03/2022] Open
Abstract
Reproducibility is vital in science. For complex computational methods, it is often necessary, not just to recreate the code, but also the software and hardware environment to reproduce results. Virtual machines, and container software such as Docker, make it possible to reproduce the exact environment regardless of the underlying hardware and operating system. However, workflows that use Graphical User Interfaces (GUIs) remain difficult to replicate on different host systems as there is no high level graphical software layer common to all platforms. GUIdock allows for the facile distribution of a systems biology application along with its graphics environment. Complex graphics based workflows, ubiquitous in systems biology, can now be easily exported and reproduced on many different platforms. GUIdock uses Docker, an open source project that provides a container with only the absolutely necessary software dependencies and configures a common X Windows (X11) graphic interface on Linux, Macintosh and Windows platforms. As proof of concept, we present a Docker package that contains a Bioconductor application written in R and C++ called networkBMA for gene network inference. Our package also includes Cytoscape, a java-based platform with a graphical user interface for visualizing and analyzing gene networks, and the CyNetworkBMA app, a Cytoscape app that allows the use of networkBMA via the user-friendly Cytoscape interface.
Affiliation(s)
- Ling-Hong Hung
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
| | - Daniel Kristiyanto
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
| | - Sung Bong Lee
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
| | - Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, WA 98402, United States of America
- * E-mail:
|
19
|
Fronczuk M, Raftery AE, Yeung KY. CyNetworkBMA: a Cytoscape app for inferring gene regulatory networks. Source Code Biol Med 2015; 10:11. [PMID: 26566394 PMCID: PMC4642660 DOI: 10.1186/s13029-015-0043-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/22/2014] [Accepted: 10/31/2015] [Indexed: 12/31/2022]
Abstract
Background Inference of gene networks from expression data is an important problem in computational biology. Many algorithms have been proposed for solving the problem efficiently. However, many of the available implementations are programming libraries that require users to write code, which limits their accessibility. Results We have developed a tool called CyNetworkBMA for inferring gene networks from expression data that integrates with Cytoscape. Our application offers a graphical user interface for networkBMA, an efficient implementation of Bayesian Model Averaging methods for network construction. The client-server architecture of CyNetworkBMA makes it possible to distribute or centralize computation depending on user needs. Conclusions CyNetworkBMA is an easy-to-use tool that makes network inference accessible to non-programmers through seamless integration with Cytoscape. CyNetworkBMA is available on the Cytoscape App Store at http://apps.cytoscape.org/apps/cynetworkbma. Electronic supplementary material The online version of this article (doi:10.1186/s13029-015-0043-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Maciej Fronczuk
- Institute of Technology, University of Washington, Tacoma, 98402 WA USA
| | - Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, 98195 WA USA
| | - Ka Yee Yeung
- Institute of Technology, University of Washington, Tacoma, 98402 WA USA
|
20
|
Becker PS, Schmitt MW, Loeb LA, Gu W, Wei Q, Xie Z, Carson AR, Martins T, Blau CA, Oehler V, Yeung KY. Correlation of genomic analysis by MyAML with chemotherapy drug sensitivity. J Clin Oncol 2015. [DOI: 10.1200/jco.2015.33.15_suppl.7080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Affiliation(s)
| | | | | | - Weiyi Gu
- University of Washington, Tacoma, WA
| | - Qi Wei
- University of Washington, Tacoma, WA
|
21
|
Young WC, Raftery AE, Yeung KY. Fast Bayesian inference for gene regulatory networks using ScanBMA. BMC Syst Biol 2014; 8:47. [PMID: 24742092 PMCID: PMC4006459 DOI: 10.1186/1752-0509-8-47] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 11/27/2013] [Accepted: 04/04/2014] [Indexed: 11/22/2022]
Abstract
Background Genome-wide time-series data provide a rich set of information for discovering gene regulatory relationships. As genome-wide data for mammalian systems are being generated, it is critical to develop network inference methods that can handle tens of thousands of genes efficiently, provide a systematic framework for the integration of multiple data sources, and yield robust, accurate and compact gene-to-gene relationships. Results We developed and applied ScanBMA, a Bayesian inference method that incorporates external information to improve the accuracy of the inferred network. In particular, we developed a new strategy to efficiently search the model space, applied data transformations to reduce the effect of spurious relationships, and adopted the g-prior to guide the search for candidate regulators. Our method is highly computationally efficient, thus addressing the scalability issue with network inference. The method is implemented as the ScanBMA function in the networkBMA Bioconductor software package. Conclusions We compared ScanBMA to other popular methods using time series yeast data as well as time-series simulated data from the DREAM competition. We found that ScanBMA produced more compact networks with a greater proportion of true positives than the competing methods. Specifically, ScanBMA generally produced more favorable areas under the Receiver-Operating Characteristic and Precision-Recall curves than other regression-based methods and mutual-information based methods. In addition, ScanBMA is competitive with other network inference methods in terms of running time.
Affiliation(s)
| | | | - Ka Yee Yeung
- Department of Microbiology, University of Washington, Box 357735, 98195-7735, Seattle WA, USA.
|
22
|
Dickinson A, Yeung KY, Donoghue J, Baker MJ, Kelly RD, McKenzie M, Johns TG, St John JC. The regulation of mitochondrial DNA copy number in glioblastoma cells. Cell Death Differ 2013; 20:1644-53. [PMID: 23995230 PMCID: PMC3824586 DOI: 10.1038/cdd.2013.115] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 01/21/2013] [Revised: 07/10/2013] [Accepted: 07/22/2013] [Indexed: 01/07/2023] Open
Abstract
As stem cells undergo differentiation, mitochondrial DNA (mtDNA) copy number is strictly regulated in order that specialized cells can generate appropriate levels of adenosine triphosphate (ATP) through oxidative phosphorylation (OXPHOS) to undertake their specific functions. It is not understood whether tumor-initiating cells regulate their mtDNA in a similar manner or whether mtDNA is essential for tumorigenesis. We show that human neural stem cells (hNSCs) increased their mtDNA content during differentiation in a process that was mediated by a synergistic relationship between the nuclear and mitochondrial genomes and results in increased respiratory capacity. Differentiating multipotent glioblastoma cells failed to match the expansion in mtDNA copy number, patterns of gene expression and increased respiratory capacity observed in hNSCs. Partial depletion of glioblastoma cell mtDNA rescued mtDNA replication events and enhanced cell differentiation. However, prolonged depletion resulted in impaired mtDNA replication, reduced proliferation and induced the expression of early developmental and pro-survival markers including POU class 5 homeobox 1 (OCT4) and sonic hedgehog (SHH). The transfer of glioblastoma cells depleted to varying degrees of their mtDNA content into immunocompromised mice resulted in tumors requiring significantly longer to form compared with non-depleted cells. The number of tumors formed and the time to tumor formation was relative to the degree of mtDNA depletion. The tumors derived from mtDNA depleted glioblastoma cells recovered their mtDNA copy number as part of the tumor formation process. These outcomes demonstrate the importance of mtDNA to the initiation and maintenance of tumorigenesis in glioblastoma multiforme.
Affiliation(s)
- A Dickinson
- [1] The Mitochondrial Genetics Group, Centre for Genetic Diseases, Monash Institute of Medical Research, Monash University, 27-31 Wright Street, Clayton, Victoria 3168, Australia [2] Molecular Basis of Metabolic Disease, Division of Metabolic and Vascular Health, Warwick Medical School, The University of Warwick, Clifford Bridge Road, Coventry, CV2 2DX, UK
|
23
|
Lo K, Raftery AE, Dombek KM, Zhu J, Schadt EE, Bumgarner RE, Yeung KY. Integrating external biological knowledge in the construction of regulatory networks from time-series expression data. BMC Syst Biol 2012; 6:101. [PMID: 22898396 PMCID: PMC3465231 DOI: 10.1186/1752-0509-6-101] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 01/25/2012] [Accepted: 07/24/2012] [Indexed: 01/27/2023]
Abstract
BACKGROUND Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge. RESULTS We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes additional data sources with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. We summarize the external biological knowledge by an informative prior probability distribution over the candidate regression models. CONCLUSIONS We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.
Affiliation(s)
- Kenneth Lo
- Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA
| | - Adrian E Raftery
- Department of Statistics, University of Washington, Box 354320, Seattle, WA, 98195, USA
| | - Kenneth M Dombek
- Department of Biochemistry, University of Washington, Box 357350, Seattle, WA, 98195, USA
| | - Jun Zhu
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
| | - Eric E Schadt
- Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York, NY, 10029, USA
| | - Roger E Bumgarner
- Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA
| | - Ka Yee Yeung
- Department of Microbiology, University of Washington, Box 358070, Seattle, WA, 98195, USA
|
24
|
Abstract
Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this paper, we propose an approximation of the log likelihood function. We adopt the case-control idea from epidemiology and construct a case-control likelihood which is an unbiased estimator of the full likelihood. Replacing the full likelihood by the case-control likelihood in the MCMC estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links.
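The case-control idea can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes the basic latent space form logit P(y_ij = 1) = alpha - |z_i - z_j|, keeps every observed link (the "cases"), draws a simple random sample of non-links (the "controls") for each node, and reweights the sampled controls so that the expectation matches the full sum. All function and parameter names are hypothetical.

```python
import math
import random


def pair_loglik(y, alpha, zi, zj):
    """Bernoulli log-likelihood of one dyad with logit P(y=1) = alpha - |zi - zj|."""
    eta = alpha - math.dist(zi, zj)
    return y * eta - math.log1p(math.exp(eta))


def full_loglik(adj, alpha, z):
    """Exact log-likelihood: visits every ordered pair of nodes, O(N^2) terms."""
    n = len(adj)
    return sum(pair_loglik(adj[i][j], alpha, z[i], z[j])
               for i in range(n) for j in range(n) if i != j)


def case_control_loglik(adj, alpha, z, n_controls, rng):
    """Approximate log-likelihood: all links plus a reweighted sample of non-links.

    For clarity this sketch scans each row of adj to find links and non-links;
    an implementation aiming at O(N) cost would store links in adjacency lists.
    """
    n = len(adj)
    total = 0.0
    for i in range(n):
        cases = [j for j in range(n) if j != i and adj[i][j] == 1]
        controls = [j for j in range(n) if j != i and adj[i][j] == 0]
        total += sum(pair_loglik(1, alpha, z[i], z[j]) for j in cases)
        if controls:
            k = min(n_controls, len(controls))
            sampled = rng.sample(controls, k)
            weight = len(controls) / k  # each sampled control stands for `weight` non-links
            total += weight * sum(pair_loglik(0, alpha, z[i], z[j]) for j in sampled)
    return total


# Hypothetical four-node network with two-dimensional latent positions.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
z = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
print(full_loglik(adj, 1.0, z))
print(case_control_loglik(adj, 1.0, z, 1, random.Random(0)))
```

Because links are sparse in most large networks, the number of terms evaluated per node is roughly the node's degree plus `n_controls`, rather than N - 1. When `n_controls` is at least the number of non-links for every node, the weights equal 1 and the approximation reduces to the full log-likelihood.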
Affiliation(s)
- Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, Wash., USA
| | - Xiaoyue Niu
- Department of Statistics, University of Washington, Seattle, Wash., USA
| | - Peter D Hoff
- Department of Statistics, University of Washington, Seattle, Wash., USA
| | - Ka Yee Yeung
- Department of Statistics, University of Washington, Seattle, Wash., USA
|
25
|
Yeung KY, Gooley TA, Zhang A, Raftery AE, Radich JP, Oehler VG. Predicting relapse prior to transplantation in chronic myeloid leukemia by integrating expert knowledge and expression data. Bioinformatics 2012; 28:823-30. [PMID: 22296787 DOI: 10.1093/bioinformatics/bts059] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
MOTIVATION Selecting a small number of signature genes for accurate classification of samples is essential for the development of diagnostic tests. However, many genes are highly correlated in gene expression data, and hence, many possible sets of genes are potential classifiers. Because treatment outcomes are poor in advanced chronic myeloid leukemia (CML), we hypothesized that expression of classifiers of advanced phase CML when detected in early CML [chronic phase (CP) CML], correlates with subsequent poorer therapeutic outcome. RESULTS We developed a method that integrates gene expression data with expert knowledge and predicted functional relationships using iterative Bayesian model averaging. Applying our integrated method to CML, we identified small sets of signature genes that are highly predictive of disease phases and that are more robust and stable than using expression data alone. The accuracy of our algorithm was evaluated using cross-validation on the gene expression data. We then tested the hypothesis that gene sets associated with advanced phase CML would predict relapse after allogeneic transplantation in 176 independent CP CML cases. Our gene signatures of advanced phase CML are predictive of relapse even after adjustment for known risk factors associated with transplant outcomes.
Affiliation(s)
- K Y Yeung
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA.
|
26
|
Zarbl H, Gallo MA, Glick J, Yeung KY, Vouros P. The vanishing zero revisited: thresholds in the age of genomics. Chem Biol Interact 2010; 184:273-8. [PMID: 20109442 DOI: 10.1016/j.cbi.2010.01.031] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 08/25/2009] [Revised: 01/11/2010] [Accepted: 01/18/2010] [Indexed: 10/19/2022]
Abstract
The concept of the vanishing zero, which was first discussed 50 years ago in relation to pesticide residues in foods and food crops, focused on the unintended regulatory consequences created by ever-increasing sensitivity and selectivity of analytical methods, in conjunction with the ambiguous wording of legislation meant to protect public health. In the interim, the ability to detect xenobiotics in most substrates has increased from tens of parts per million to parts per trillion or less, challenging our ability to interpret the biological significance of exposures at the lowest detectable levels. As a result the focus of risk assessment, especially for potential carcinogens, has shifted from defining an acceptable level, to extrapolating from the best available analytical results. Analysis of gene expression profiles in exposed target cells using genomic technologies can identify biological pathways induced or repressed by the exposure as a function of dose and time. This treatise explores how toxicogenomic responses at low doses may inform risk assessment and risk management by defining thresholds for cellular responses linked to modes or mechanisms of toxicity at the molecular level.
Affiliation(s)
- Helmut Zarbl
- Environmental and Occupational Health Sciences Institute, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, Piscataway, NJ 08854, USA.
|
27
|
Abstract
In this chapter, we discuss a number of approaches to network inference from large-scale functional genomics data. Our goal is to describe current methods that can be used to infer predictive networks. At present, one of the most effective methods to produce networks with predictive value is the Bayesian network approach. This approach was initially instantiated by Friedman et al. and further refined by Eric Schadt and his research group. The Bayesian network approach has the virtue of identifying predictive relationships between genes from a combination of expression and eQTL data. However, the approach does not provide a mechanistic basis for predictive relationships and is ultimately hampered by an inability to model feedback. A challenge for the future is to produce networks that are both predictive and provide mechanistic understanding. To do so, the methods described in several chapters of this book will need to be integrated. Other chapters of this book describe a number of methods to identify or predict network components such as physical interactions. At the end of this chapter, we speculate that some of the approaches from other chapters could be integrated and used to "annotate" the edges of the Bayesian networks. This would take the Bayesian networks one step closer to providing mechanistic "explanations" for the relationships between the network nodes.
Affiliation(s)
- Roger E Bumgarner
- Department of Microbiology, University of Washington, Seattle, WA, USA
|
28
|
Annest A, Bumgarner RE, Raftery AE, Yeung KY. Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data. BMC Bioinformatics 2009; 10:72. [PMID: 19245714 PMCID: PMC2657791 DOI: 10.1186/1471-2105-10-72] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2008] [Accepted: 02/26/2009] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes. RESULTS We applied the iterative BMA algorithm to two cancer datasets: breast cancer and diffuse large B-cell lymphoma (DLBCL) data. On the breast cancer data, the algorithm selected a total of 15 predictor genes across 84 contending models from the training data. The maximum likelihood estimates of the selected genes and the posterior probabilities of the selected models from the training data were used to divide patients in the test (or validation) dataset into high- and low-risk categories. Using the genes and models determined from the training data, we assigned patients from the test data into highly distinct risk groups (as indicated by a p-value of 7.26e-05 from the log-rank test). Moreover, we achieved comparable results using only the 5 top selected genes with 100% posterior probabilities. On the DLBCL data, our iterative BMA procedure selected a total of 25 genes across 3 contending models from the training data. Once again, we assigned the patients in the validation set to significantly distinct risk groups (p-value = 0.00139). 
CONCLUSION The strength of the iterative BMA algorithm for survival analysis lies in its ability to account for model uncertainty. The results from this study demonstrate that our procedure selects a small number of genes while eclipsing other methods in predictive performance, making it a highly accurate and cost-effective prognostic tool in the clinical setting.
Affiliation(s)
- Amalia Annest
- Institute of Technology/Computing and Software Systems, Box 358426, University of Washington, Tacoma, WA 98402, USA
- Roger E Bumgarner
- Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA
- Adrian E Raftery
- Department of Statistics, Box 354320, University of Washington, Seattle, WA 98195, USA
- Ka Yee Yeung
- Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA
|
29
|
Chu VT, Gottardo R, Raftery AE, Bumgarner RE, Yeung KY. MeV+R: using MeV as a graphical user interface for Bioconductor applications in microarray analysis. Genome Biol 2008; 9:R118. [PMID: 18652698 PMCID: PMC2530872 DOI: 10.1186/gb-2008-9-7-r118] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2008] [Revised: 06/01/2008] [Accepted: 07/24/2008] [Indexed: 11/10/2022] Open
Abstract
We present MeV+R, an integration of the JAVA MultiExperiment Viewer program with Bioconductor packages. This integration of MultiExperiment Viewer and R is easily extensible to other R packages and provides users with point and click access to traditionally command line driven tools written in R. We demonstrate the ability to use MultiExperiment Viewer as a graphical user interface for Bioconductor applications in microarray data analysis by incorporating three Bioconductor packages, RAMA, BRIDGE and iterativeBMA.
Affiliation(s)
- Vu T Chu
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Raphael Gottardo
- Department of Statistics, University of British Columbia, Vancouver, BC, V6T 1Z2, Canada
- Adrian E Raftery
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Roger E Bumgarner
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Ka Yee Yeung
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
|
30
|
Abstract
We consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available gene expression data sets. We compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, significance analysis of microarrays (SAM), Efron's empirical Bayes, and EBarrays in both its lognormal-normal and gamma-gamma forms. In an experiment with HIV data, our method performed better than these alternatives, on the basis of between-replicate agreement and disagreement.
Affiliation(s)
- Raphael Gottardo
- Department of Statistics, University of Washington, Box 354322, Seattle, Washington 98195, USA.
|
31
|
Liu X, Sivaganesan S, Yeung KY, Guo J, Bumgarner RE, Medvedovic M. Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset. Bioinformatics 2006; 22:1737-44. [PMID: 16709591 PMCID: PMC1617036 DOI: 10.1093/bioinformatics/btl184] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional 'noise' introduced by non-informative measurements. RESULTS We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters. AVAILABILITY The open-source package gimm is available at http://eh3.uc.edu/gimm.
Affiliation(s)
- X Liu
- Department of Environmental Health, University of Cincinnati, 3223 Eden Avenue ML 56, Cincinnati, OH 45267, USA
|
32
|
|
33
|
Abstract
MOTIVATION Inner holes, artifacts and blank spots are common in microarray images, but current image analysis methods do not pay them enough attention. We propose a new robust model-based method for processing microarray images so as to estimate foreground and background intensities. The method starts with a very simple but effective automatic gridding method, and then proceeds in two steps. The first step applies model-based clustering to the distribution of pixel intensities, using the Bayesian Information Criterion (BIC) to choose the number of groups up to a maximum of three. The second step is spatial, finding the large spatially connected components in each cluster of pixels. The method thus combines the strengths of the histogram-based and spatial approaches. It deals effectively with inner holes in spots and with artifacts. It also provides a formal inferential basis for deciding when the spot is blank, namely when the BIC favors one group over two or three. RESULTS We apply our methods for gridding and segmentation to cDNA microarray images from an HIV infection experiment. In these experiments, our method had better stability across replicates than a fixed-circle segmentation method or the seeded region growing method in the SPOT software, without introducing noticeable bias when estimating the intensities of differentially expressed genes. AVAILABILITY spotSegmentation, an R language package implementing both the gridding and segmentation methods is available through the Bioconductor project (http://www.bioconductor.org). The segmentation method requires the contributed R package MCLUST for model-based clustering (http://cran.us.r-project.org). CONTACT fraley@stat.washington.edu.
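The first step described above (model-based clustering of pixel intensities, with BIC choosing between one, two, or three groups) can be illustrated with a minimal sketch. This is not the spotSegmentation or MCLUST implementation: the function names, the unequal-variance one-dimensional Gaussian mixture, the chunk-based initialization, and the variance floor are all assumptions made for illustration. BIC is written in the "larger is better" form, 2 log L minus k log n.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_bic(values, n_groups, n_iter=200, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture by EM and return its BIC (larger is better).

    Hypothetical helper; needs at least n_groups values.
    """
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: split the sorted intensities into contiguous chunks.
    chunks = [xs[g * n // n_groups:(g + 1) * n // n_groups] for g in range(n_groups)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]
    for _ in range(n_iter):
        # E-step: responsibility of each group for each pixel.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and (floored) variances.
        for g in range(n_groups):
            mass = sum(r[g] for r in resp)
            if mass < 1e-12:
                continue  # leave an emptied group unchanged
            weights[g] = mass / n
            means[g] = sum(r[g] * x for r, x in zip(resp, xs)) / mass
            spread = sum(r[g] * (x - means[g]) ** 2 for r, x in zip(resp, xs)) / mass
            variances[g] = max(spread, var_floor)
    loglik = 0.0
    for x in xs:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        loglik += math.log(dens or 1e-300)
    n_params = 3 * n_groups - 1  # means + variances + free mixing weights
    return 2.0 * loglik - n_params * math.log(n)

def choose_groups(values, max_groups=3):
    """Return the group count with the highest BIC, plus all scores."""
    scores = {g: mixture_bic(values, g) for g in range(1, max_groups + 1)}
    return max(scores, key=scores.get), scores

# Made-up intensities: a dark background and a bright foreground.
spot_pixels = [10, 11, 12, 11, 10, 12, 11, 10, 200, 205, 198, 202, 201, 199, 203, 204]
best_groups, bic_scores = choose_groups(spot_pixels)
```

Under the rule stated in the abstract, a spot would be called blank when the one-group model has the highest BIC; the spatial second step (connected components) is not sketched here.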
Affiliation(s)
- Qunhua Li
- Department of Statistics, Box 354322 University of Washington, Seattle, WA 98195, USA
|
34
|
Yeung KY, Bumgarner RE, Raftery AE. Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 2005; 21:2394-402. [PMID: 15713736 DOI: 10.1093/bioinformatics/bti319] [Citation(s) in RCA: 200] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present the Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models (sets of potentially overlapping relevant genes). RESULTS We have shown that BMA selects smaller numbers of relevant genes (compared with other methods) and achieves a high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes, and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data. AVAILABILITY The source codes and datasets used are available from our Supplementary website.
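The averaging idea in this abstract can be sketched in a few lines: each candidate model (a small gene set) makes its own class prediction, and the predictions are combined using the models' posterior probabilities as weights. The sketch below is an illustration only, not the authors' code; the logistic form of each model, the gene names, the coefficients, and the posterior values are all hypothetical.

```python
import math

def model_probability(model, expression):
    """Class-1 probability from one logistic model over its own genes."""
    score = model["intercept"] + sum(
        coef * expression[gene] for gene, coef in model["coefs"].items()
    )
    return 1.0 / (1.0 + math.exp(-score))

def bma_probability(models, expression):
    """Posterior-weighted average of the per-model predictions."""
    total = sum(m["posterior"] for m in models)
    return sum(
        m["posterior"] / total * model_probability(m, expression) for m in models
    )

def gene_posteriors(models):
    """Posterior probability of each gene: summed weight of models containing it."""
    total = sum(m["posterior"] for m in models)
    result = {}
    for m in models:
        for gene in m["coefs"]:
            result[gene] = result.get(gene, 0.0) + m["posterior"] / total
    return result

# Hypothetical models and sample, for illustration only.
models = [
    {"posterior": 0.6, "intercept": -1.0, "coefs": {"geneA": 2.0}},
    {"posterior": 0.4, "intercept": 0.5, "coefs": {"geneA": 1.0, "geneB": -1.5}},
]
sample = {"geneA": 1.2, "geneB": 0.3}
averaged = bma_probability(models, sample)
inclusion = gene_posteriors(models)
```

A gene that appears in every selected model receives a posterior of 1, which is one way to read the "posterior probabilities for the selected genes" mentioned in the abstract.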
Affiliation(s)
- Ka Yee Yeung
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA.
|
35
|
Vanasse GJ, Winn RK, Rodov S, Zieske AW, Li JT, Tupper JC, Tang J, Raines EW, Peters MA, Yeung KY, Harlan JM. Bcl-2 Overexpression Leads to Increases in Suppressor of Cytokine Signaling-3 Expression in B Cells and De novo Follicular Lymphoma. Mol Cancer Res 2004. [DOI: 10.1158/1541-7786.620.2.11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
The t(14;18)(q32;q21), resulting in deregulated expression of B-cell-leukemia/lymphoma-2 (Bcl-2), represents the genetic hallmark in human follicular lymphomas. Substantial evidence supports the hypothesis that the t(14;18) and Bcl-2 overexpression are necessary but not solely responsible for neoplastic transformation and require cooperating genetic derangements for neoplastic transformation to occur. To investigate genes that cooperate with Bcl-2 to influence cellular signaling pathways important for neoplastic transformation, we used oligonucleotide microarrays to determine differential gene expression patterns in CD19+ B cells isolated from Eμ-Bcl-2 transgenic mice and wild-type littermate control mice. Fifty-seven genes were induced and 94 genes were repressed by ≥2-fold in Eμ-Bcl-2 transgenic mice (P < 0.05). The suppressor of cytokine signaling-3 (SOCS3) gene was found to be overexpressed 5-fold in B cells from Eμ-Bcl-2 transgenic mice. Overexpression of Bcl-2 in both mouse embryo fibroblast-1 and hematopoietic cell lines resulted in induction of SOCS3 protein, suggesting a Bcl-2-associated mechanism underlying SOCS3 induction. Immunohistochemistry with SOCS3 antisera on tissue from a cohort of patients with de novo follicular lymphoma revealed marked overexpression of SOCS3 protein that, within the follicular center cell region, was limited to neoplastic follicular lymphoma cells and colocalized with Bcl-2 expression in 9 of 12 de novo follicular lymphoma cases examined. In contrast, SOCS3 protein expression was not detected in the follicular center cell region of benign hyperplastic tonsil tissue. These data suggest that Bcl-2 overexpression leads to the induction of activated signal transducer and activator of transcription 3 (STAT3) and to the induction of SOCS3, which may contribute to the pathogenesis of follicular lymphoma.
Affiliation(s)
- Arthur W. Zieske
- Laboratory Medicine, Yale University School of Medicine, New Haven, Connecticut
- Mette A. Peters
- Center for Expression Arrays, University of Washington, Seattle, Washington
- Ka Yee Yeung
- Center for Expression Arrays, University of Washington, Seattle, Washington
|
36
|
Vanasse GJ, Winn RK, Rodov S, Zieske AW, Li JT, Tupper JC, Tang J, Raines EW, Peters MA, Yeung KY, Harlan JM. Bcl-2 overexpression leads to increases in suppressor of cytokine signaling-3 expression in B cells and de novo follicular lymphoma. Mol Cancer Res 2004; 2:620-31. [PMID: 15561778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/01/2023]
Abstract
The t(14;18)(q32;q21), resulting in deregulated expression of B-cell-leukemia/lymphoma-2 (Bcl-2), represents the genetic hallmark in human follicular lymphomas. Substantial evidence supports the hypothesis that the t(14;18) and Bcl-2 overexpression are necessary but not solely responsible for neoplastic transformation and require cooperating genetic derangements for neoplastic transformation to occur. To investigate genes that cooperate with Bcl-2 to influence cellular signaling pathways important for neoplastic transformation, we used oligonucleotide microarrays to determine differential gene expression patterns in CD19+ B cells isolated from Eμ-Bcl-2 transgenic mice and wild-type littermate control mice. Fifty-seven genes were induced and 94 genes were repressed by ≥2-fold in Eμ-Bcl-2 transgenic mice (P < 0.05). The suppressor of cytokine signaling-3 (SOCS3) gene was found to be overexpressed 5-fold in B cells from Eμ-Bcl-2 transgenic mice. Overexpression of Bcl-2 in both mouse embryo fibroblast-1 and hematopoietic cell lines resulted in induction of SOCS3 protein, suggesting a Bcl-2-associated mechanism underlying SOCS3 induction. Immunohistochemistry with SOCS3 antisera on tissue from a cohort of patients with de novo follicular lymphoma revealed marked overexpression of SOCS3 protein that, within the follicular center cell region, was limited to neoplastic follicular lymphoma cells and colocalized with Bcl-2 expression in 9 of 12 de novo follicular lymphoma cases examined. In contrast, SOCS3 protein expression was not detected in the follicular center cell region of benign hyperplastic tonsil tissue. These data suggest that Bcl-2 overexpression leads to the induction of activated signal transducer and activator of transcription 3 (STAT3) and to the induction of SOCS3, which may contribute to the pathogenesis of follicular lymphoma.
Affiliation(s)
- Gary J Vanasse
- Department of Internal Medicine, Yale University School of Medicine, 333 Cedar Street, WWW-403, Box 208021, New Haven, CT 06520, USA.
|
37
|
Yeung KY, Medvedovic M, Bumgarner RE. From co-expression to co-regulation: how many microarray experiments do we need? Genome Biol 2004; 5:R48. [PMID: 15239833 PMCID: PMC463312 DOI: 10.1186/gb-2004-5-7-r48] [Citation(s) in RCA: 76] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2004] [Revised: 04/19/2004] [Accepted: 05/28/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Cluster analysis is often used to infer regulatory modules or biological function by associating unknown genes with other genes that have similar expression patterns and known regulatory elements or functions. However, clustering results may not have any biological relevance. RESULTS We applied various clustering algorithms to microarray datasets with different sizes, and we evaluated the clustering results by determining the fraction of gene pairs from the same clusters that share at least one known common transcription factor. We used both yeast transcription factor databases (SCPD, YPD) and chromatin immunoprecipitation (ChIP) data to evaluate our clustering results. We showed that the ability to identify co-regulated genes from clustering results is strongly dependent on the number of microarray experiments used in cluster analysis and the accuracy of these associations plateaus at between 50 and 100 experiments on yeast data. Moreover, the model-based clustering algorithm MCLUST consistently outperforms more traditional methods in accurately assigning co-regulated genes to the same clusters on standardized data. CONCLUSIONS Our results are consistent with respect to independent evaluation criteria that strengthen our confidence in our results. However, when one compares ChIP data to YPD, the false-negative rate is approximately 80% using the recommended p-value of 0.001. In addition, we showed that even with large numbers of experiments, the false-positive rate may exceed the true-positive rate. In particular, even when all experiments are included, the best results produce clusters with only a 28% true-positive rate using known gene transcription factor interactions.
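The evaluation criterion in this abstract, the fraction of same-cluster gene pairs that share at least one known transcription factor, is simple to state in code. The sketch below is an illustration under assumptions: the function name and data layout are hypothetical, and pairs in which either gene lacks annotation are skipped, which the abstract does not specify.

```python
from itertools import combinations

def shared_tf_fraction(clusters, tf_map):
    """Fraction of annotated same-cluster gene pairs sharing at least one TF.

    clusters: iterable of gene collections.
    tf_map: dict mapping gene -> set of known transcription factors.
    Pairs with an unannotated gene are ignored (an assumption).
    """
    n_pairs = 0
    n_shared = 0
    for genes in clusters:
        for a, b in combinations(sorted(genes), 2):
            if a not in tf_map or b not in tf_map:
                continue
            n_pairs += 1
            if tf_map[a] & tf_map[b]:
                n_shared += 1
    return n_shared / n_pairs if n_pairs else 0.0

# Made-up genes and regulators, for illustration only.
clusters = [{"g1", "g2", "g3"}, {"g4", "g5"}]
tf_map = {
    "g1": {"TF_A"},
    "g2": {"TF_A", "TF_B"},
    "g3": {"TF_C"},
    "g4": {"TF_D"},
    "g5": {"TF_D"},
}
fraction = shared_tf_fraction(clusters, tf_map)  # 2 of 4 pairs share a TF -> 0.5
```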
Affiliation(s)
- Ka Yee Yeung
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Mario Medvedovic
- Center for Genome Information, Department of Environmental Health, University of Cincinnati Medical Center, Cincinnati, OH 45267, USA
- Roger E Bumgarner
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
|
38
|
Abstract
MOTIVATION Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability. RESULTS We developed different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and the real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed overall better than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. 
Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters. AVAILABILITY The MS Windows based program named Gaussian Infinite Mixture Modeling (GIMM) implementing the Gibbs sampler and corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm SUPPLEMENTAL INFORMATION: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html
Affiliation(s)
- M Medvedovic
- Department of Environmental Health, Center for Genome Information, University of Cincinnati Medical Center, 3223 Eden Avenue ML 56, Cincinnati, OH 45267-0056, USA.
|
39
|
Yeung KY, Bumgarner RE. Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biol 2003; 4:R83. [PMID: 14659020 PMCID: PMC329422 DOI: 10.1186/gb-2003-4-12-r83] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2003] [Revised: 08/14/2003] [Accepted: 10/17/2003] [Indexed: 11/21/2022] Open
Abstract
Prediction of the diagnostic category of a tissue sample from its gene-expression profile and selection of relevant genes for class prediction have important applications in cancer research. We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms that are applicable to microarray data with any number of classes. We show that removing highly correlated genes typically improves classification results using a small set of genes.
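The last point, removing highly correlated genes before classification, can be illustrated with a greedy filter: walk through genes in order of relevance and keep a gene only if its absolute Pearson correlation with every gene already kept is below a threshold. This is a hedged sketch of the general idea, not the published USC or EWUSC procedure; the function names, the threshold value, and the toy profiles are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant profile carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(ranked_genes, profiles, threshold=0.9):
    """Keep genes, in ranked order, that are not highly correlated with a kept gene."""
    kept = []
    for gene in ranked_genes:
        if all(abs(pearson(profiles[gene], profiles[k])) < threshold for k in kept):
            kept.append(gene)
    return kept

# Toy expression profiles across four samples (made up for illustration).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],  # nearly a multiple of g1
    "g3": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_correlated(["g1", "g2", "g3"], profiles)  # ["g1", "g3"]
```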
Affiliation(s)
- Ka Yee Yeung
- Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA
- Roger E Bumgarner
- Department of Microbiology, Box 358070, University of Washington, Seattle, WA 98195, USA
|
40
|
Yeung KY, Medvedovic M, Bumgarner RE. Clustering gene-expression data with repeated measurements. Genome Biol 2003; 4:R34. [PMID: 12734014 PMCID: PMC156590 DOI: 10.1186/gb-2003-4-5-r34] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2002] [Revised: 02/11/2003] [Accepted: 03/07/2003] [Indexed: 11/26/2022] Open
Abstract
Clustering is a common methodology for the analysis of array data, and many research laboratories are generating array data with repeated measurements. We evaluated several clustering algorithms that incorporate repeated measurements, and show that algorithms that take advantage of repeated measurements yield more accurate and more stable clusters. In particular, we show that the infinite mixture model-based approach with a built-in error model produces superior results.
Affiliation(s)
- Ka Yee Yeung
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
- Mario Medvedovic
- Center for Genome Information, Department of Environmental Health, University of Cincinnati Medical Center, 3223 Eden Ave. ML 56, Cincinnati, OH 45267-0056, USA
- Roger E Bumgarner
- Department of Microbiology, University of Washington, Seattle, WA 98195, USA
|
41
|
Barrett MT, Yeung KY, Ruzzo WL, Hsu L, Blount PL, Sullivan R, Zarbl H, Delrow J, Rabinovitch PS, Reid BJ. Transcriptional analyses of Barrett's metaplasia and normal upper GI mucosae. Neoplasia 2002; 4:121-8. [PMID: 11896567 PMCID: PMC1550324 DOI: 10.1038/sj.neo.7900221] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2001] [Accepted: 09/14/2001] [Indexed: 12/29/2022]
Abstract
Over the last two decades, the incidence of esophageal adenocarcinoma (EA) has increased dramatically in the US and Western Europe. It has been shown that EAs evolve from premalignant Barrett's esophagus (BE) tissue by a process of clonal expansion and evolution. However, the molecular phenotype of the premalignant metaplasia, and its relationship to those of the normal upper gastrointestinal (GI) mucosae, including gastric, duodenal, and squamous epithelium of the esophagus, has not been systematically characterized. Therefore, we used oligonucleotide-based microarrays to characterize gene expression profiles in each of these tissues. The similarity of BE to each of the normal tissues was compared using a series of computational approaches. Our analyses included esophageal squamous epithelium, which is present at the same anatomic site and exposed to similar conditions as Barrett's epithelium, duodenum that shares morphologic similarity to Barrett's epithelium, and adjacent gastric epithelium. There was a clear distinction among the expression profiles of gastric, duodenal, and squamous epithelium whereas the BE profiles showed considerable overlap with normal tissues. Furthermore, we identified clusters of genes that are specific to each of the tissues, to the Barrett's metaplastic epithelia, and a cluster of genes that was distinct between squamous and non-squamous epithelia.
Affiliation(s)
- Michael T Barrett
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle WA 98109, USA
|
42
|
Abstract
BACKGROUND At least 1 million people worldwide have retinitis pigmentosa (RP), making it relatively common among the inherited forms of blindness. Mutations in many genes may cause RP. The most common known mutation, Pro347Leu in rhodopsin, is found in no more than about 1% of unrelated patients, implying the impracticality of a diagnostic test which would screen only for a few, common mutation sites. CONCLUSIONS Ongoing discovery and study of RP genes makes it feasible to consider a molecular diagnostic test which would screen coding regions of all known RP genes by a mutation detection method such as conformation-sensitive gel electrophoresis followed by sequencing. The parallel development of RP genetic knowledge and treatments such as gene therapy will make such tests both possible and necessary.
Affiliation(s)
- K Y Yeung
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
|
43
|
Abstract
MOTIVATION Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a 'good' clustering method and determining the 'correct' number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications. RESULTS We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the number of clusters. On real expression data, the model-based approach produced clusters of quality comparable to a leading heuristic clustering algorithm, but with the key advantage of suggesting the number of clusters and an appropriate model. We also explored the validity of the Gaussian mixture assumption on different transformations of real data. We also assessed the degree to which these real gene expression data sets fit multivariate Gaussian distributions both before and after subjecting them to commonly used data transformations. Suitably chosen transformations seem to result in reasonable fits. AVAILABILITY MCLUST is available at http://www.stat.washington.edu/fraley/mclust. The software for the diagonal model is under development. CONTACT kayee@cs.washington.edu. SUPPLEMENTARY INFORMATION http://www.cs.washington.edu/homes/kayee/model.
Affiliation(s)
- K Y Yeung
- Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA.
|
44
|
Abstract
AIM To determine the pattern of rhodopsin mutations in Chinese retinitis pigmentosa (RP) patients. METHODS The rhodopsin gene was examined in 101 RP patients and 190 controls from Hong Kong. RESULTS Three coding changes were identified: Pro347Leu, Ala299Ser, and 5211delC. Each protein sequence alteration was found in one patient. Ala299Ser also existed in two controls. CONCLUSION The C-terminal nonsense mutation may cause mis-sorting of rhodopsin protein. The finding of controls with Ala299Ser suggests this is only the third missense alteration reported that does not cause RP. The expected frequency of rhodopsin mutations in RP is <7% (2/101=2.0%, 95% confidence interval: 0.2%-7.0%).
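The interval quoted for 2 mutation carriers among 101 patients can be checked with a short calculation. The abstract does not say which interval method was used; the sketch below assumes an exact (Clopper-Pearson) binomial interval, which gives bounds close to the reported 0.2% and 7.0%. The helper names are illustrative only.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection: find p in [lo, hi] with f(p) = target, for f decreasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must increase
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    # Lower bound: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

low, high = exact_ci(2, 101)
print(f"2/101 = {2 / 101:.1%}, 95% CI: {low:.1%} to {high:.1%}")
```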
Affiliation(s)
- W M Chan
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
|
45
|
Abstract
MOTIVATION There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes. RESULTS Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.
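The observation that the first PCs need not carry the cluster structure can be seen in a constructed two-dimensional example. This is a toy sketch, not the study's data or method: two groups differ only along a low-variance axis, while a shared high-variance axis dominates the first principal component. The data and the separation measure (gap between group means along an axis) are assumptions chosen for illustration.

```python
import math

# Two hypothetical clusters: same spread in x (high variance), offset only in y.
points = [(float(x), 1.0) for x in range(-10, 11)] + \
         [(float(x), -1.0) for x in range(-10, 11)]
labels = [0] * 21 + [1] * 21

def principal_axes(pts):
    """Return the centre and the two principal axes of 2-D points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    cxx = sum((p[0] - mx) ** 2 for p in pts) / n
    cyy = sum((p[1] - my) ** 2 for p in pts) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    # Orientation of the largest-variance axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), pc1, pc2

def cluster_gap(pts, labs, centre, axis):
    """Distance between the two cluster means after projecting onto one axis."""
    proj = [(p[0] - centre[0]) * axis[0] + (p[1] - centre[1]) * axis[1] for p in pts]
    m0 = sum(v for v, l in zip(proj, labs) if l == 0) / labs.count(0)
    m1 = sum(v for v, l in zip(proj, labs) if l == 1) / labs.count(1)
    return abs(m0 - m1)

centre, pc1, pc2 = principal_axes(points)
gap1 = cluster_gap(points, labels, centre, pc1)
gap2 = cluster_gap(points, labels, centre, pc2)
print("gap along PC1:", gap1, "gap along PC2:", gap2)
```

Here PC1 holds most of the variance yet shows no gap between the groups, so keeping only PC1 before clustering would discard the structure of interest.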
Affiliation(s)
- K Y Yeung
- Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA.
|
46
|
Baum L, Chan WM, Yeung KY, Lam DS, Kwok AK, Pang CP. RP1 in Chinese: Eight novel variants and evidence that truncation of the extreme C-terminal does not cause retinitis pigmentosa. Hum Mutat 2001; 17:436. [PMID: 11317367 DOI: 10.1002/humu.1127] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
Heterozygous truncating mutations in the RP1 gene cause approximately 7% of autosomal dominant retinitis pigmentosa (RP) cases. To examine the role of RP1 mutations in RP, we screened 101 unrelated Chinese RP patients (unselected for mode of inheritance) and 190 elderly normal control subjects for sequence changes in the coding exons for the 2156 amino acid RP1 protein. One patient had a mutation, thus RP1 mutations cause about 0.0% to 5.4% (95% confidence interval) of all RP among Chinese. The mutation was R677X, the most common found in Americans. Five other known sequence changes were found. In addition, nine novel sequence alterations were identified: 746G>A (R249H), 1437G>T (M479I), 2116G>C (G706R), 3024G>A (Q1008Q), 3188G>A (Q1063R), 5797C>T (R1933X), 6423A>G (I2141M), and the variants 6542C>T and 6676T>A, both in the 3' untranslated region. One control subject and three members of a non-RP family were heterozygous for R1933X, which is therefore likely to be a non-disease-causing variant. The most C-terminal truncation previously reported was due to Tyr1053 (1-bp del) and occurred in RP patients. Thus the presence of a normal level of at least part of RP1 between amino acids 1052 and 1933 appears necessary to prevent RP. Hum Mutat 17:436, 2001.
Affiliation(s)
- L Baum
- Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong
|
47
|
Yeung KY, Barrett M, Delrow J, Blount P, Reid B, Rabinovitch P. Transcriptional analysis of Barrett's epithelium and normal gastrointestinal tissues. Nat Genet 2001. [DOI: 10.1038/87376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
48
|
Abstract
MOTIVATION Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance. RESULTS We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
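The leave-one-condition-out procedure described above can be sketched as follows. The abstract does not state the exact quality measure, so this sketch assumes the within-cluster root-mean-square deviation in the held-out condition, summed over all conditions (lower is better). The expression values and the two clustering functions, `by_sign` as a stand-in for a real algorithm and `by_parity` as a stand-in for chance grouping, are hypothetical.

```python
import math

def figure_of_merit(data, cluster_fn):
    """Cluster on all but one condition, score on the held-out one, and sum."""
    n_genes = len(data)
    n_cond = len(data[0])
    total = 0.0
    for e in range(n_cond):
        # Training data: every condition except the held-out condition e.
        train = [[v for c, v in enumerate(row) if c != e] for row in data]
        assignment = cluster_fn(train)  # one cluster label per gene
        held_out = {}
        for gene, label in enumerate(assignment):
            held_out.setdefault(label, []).append(data[gene][e])
        # Squared deviation from the cluster mean in the held-out condition.
        sq = 0.0
        for values in held_out.values():
            mean = sum(values) / len(values)
            sq += sum((v - mean) ** 2 for v in values)
        total += math.sqrt(sq / n_genes)
    return total

def by_sign(train):
    """Toy clustering: group genes by the sign of their summed expression."""
    return [0 if sum(row) >= 0 else 1 for row in train]

def by_parity(train):
    """Chance-like clustering: group genes by row index parity."""
    return [i % 2 for i in range(len(train))]

# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 1.2, 0.9, 1.1],
    [1.1, 0.9, 1.0, 1.2],
    [-1.0, -1.2, -0.9, -1.1],
    [0.9, 1.0, 1.1, 1.0],
    [-1.1, -0.9, -1.0, -1.2],
    [-0.9, -1.0, -1.1, -1.0],
]

fom_sign = figure_of_merit(expression, by_sign)
fom_parity = figure_of_merit(expression, by_parity)
print("sign-based:", round(fom_sign, 3), "parity-based:", round(fom_parity, 3))
```

Because `cluster_fn` is just a callable that maps a training matrix to labels, any clustering algorithm could be dropped in and compared on the same score.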
Affiliation(s)
- K Y Yeung
- Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA
|
49
|
Ahlgren JD, Ellison NM, Gottlieb RJ, Laluna F, Lokich JJ, Sinclair PR, Ueno W, Wampler GL, Yeung KY, Alt D. Hormonal palliation of chemoresistant ovarian cancer: three consecutive phase II trials of the Mid-Atlantic Oncology Program. J Clin Oncol 1993; 11:1957-68. [PMID: 7691999 DOI: 10.1200/jco.1993.11.10.1957] [Citation(s) in RCA: 82] [Impact Index Per Article: 2.6] [Indexed: 01/26/2023] Open
Abstract
PURPOSE To evaluate the efficacy of three hormonal manipulations in the palliation of chemoresistant ovarian cancer, and to analyze the results in the light of other clinical trials. PATIENTS AND METHODS Three sequential phase II trials were performed in patients with refractory epithelial ovarian carcinoma, using high-dose megestrol acetate (800 mg/d for 30 days, then 400 mg/d), high-dose tamoxifen (80 mg/d for 30 days, then 40 mg/d), and aminoglutethimide (1 g/d plus tapering doses of hydrocortisone). Results were compared with those described in the world literature from trials of the same or similar agents. RESULTS No responses were seen among 30 assessable patients treated with megestrol acetate, and most (but not all) similar trials have reported low response rates. Five responses (17%) were seen among 29 patients treated with tamoxifen. Two responses exceeded 5 years in duration. No responses were seen among 15 patients treated with aminoglutethimide. CONCLUSION Antiestrogen therapy may offer the possibility of useful and, occasionally, long-term palliation of refractory epithelial ovarian carcinoma, with little toxicity. There may be a trend toward a dose-response effect, which represents a suitable topic for a future prospective trial.
Affiliation(s)
- J D Ahlgren
- Division of Hematology/Oncology, George Washington University Medical Center, Washington, DC 20037
|
50
|
Abstract
The standard membrane filtration method of the UK has been modified to improve its specificity for enumerating Escherichia coli in the subtropical waters of Hong Kong. The modification incorporates into the membrane lauryl sulphate (mLS) method either an in situ urease test (the mLS-UA method) or an in situ beta-glucuronidase test (the mLS-GUD method). The false-positive errors of the mLS-UA and mLS-GUD methods are low, ranging from 3% to 5%. A comparison between the membrane filtration (mLS-UA) method and the multiple tube technique in testing E. coli in subtropical beach-waters has demonstrated that the former can give much more precise counts, and is the method of choice for such a purpose. The mLS-GUD method, for which automated counting of E. coli colonies is possible, is a good alternative to mLS-UA in routine enumeration of this bacterial indicator in environmental waters.
Affiliation(s)
- W H Cheung
- Environmental Protection Department, Southorn Centre, Wanchai, Hong Kong
|