1
Lee AY, Ewing AD, Ellrott K, Hu Y, Houlahan KE, Bare JC, Espiritu SMG, Huang V, Dang K, Chong Z, Caloian C, Yamaguchi TN, Kellen MR, Chen K, Norman TC, Friend SH, Guinney J, Stolovitzky G, Haussler D, Margolin AA, Stuart JM, Boutros PC. Combining accurate tumor genome simulation with crowdsourcing to benchmark somatic structural variant detection. Genome Biol 2018; 19:188. [PMID: 30400818 PMCID: PMC6219177 DOI: 10.1186/s13059-018-1539-5] [Received: 07/18/2018] [Accepted: 09/12/2018]
Abstract
BACKGROUND The phenotypes of cancer cells are driven in part by somatic structural variants. Structural variants can initiate tumors, enhance their aggressiveness, and provide unique therapeutic opportunities. Whole-genome sequencing of tumors can allow exhaustive identification of the specific structural variants present in an individual cancer, facilitating both clinical diagnostics and the discovery of novel mutagenic mechanisms. A plethora of somatic structural variant detection algorithms have been created to enable these discoveries; however, there are no systematic benchmarks of them. Rigorous performance evaluation of somatic structural variant detection methods has been challenged by the lack of gold standards, extensive resource requirements, and difficulties arising from the need to share personal genomic information. RESULTS To facilitate structural variant detection algorithm evaluations, we create a robust simulation framework for somatic structural variants by extending the BAMSurgeon algorithm. We then organize and enable a crowdsourced benchmarking within the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SMC-DNA). We report here the results of structural variant benchmarking on three different tumors, comprising 204 submissions from 15 teams. In addition to ranking methods, we identify characteristic error profiles of individual algorithms and general trends across them. Surprisingly, we find that ensembles of analysis pipelines do not always outperform the best individual method, indicating a need for new ways to aggregate somatic structural variant detection approaches. CONCLUSIONS The synthetic tumors and somatic structural variant detection leaderboards remain available as a community benchmarking resource, and BAMSurgeon is available at https://github.com/adamewing/bamsurgeon .
Affiliation(s)
- Anna Y Lee
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Adam D Ewing
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Mater Research Institute, University of Queensland, Woolloongabba, QLD, Australia
- Kyle Ellrott
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Computational Biology Program, Oregon Health & Science University, Portland, OR, USA
- Yin Hu
  - Sage Bionetworks, Seattle, WA, USA
- Vincent Huang
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Zechen Chong
  - Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, USA
  - Informatics Institute, University of Alabama at Birmingham, Birmingham, AL, USA
- Ken Chen
  - Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Gustavo Stolovitzky
  - IBM Computational Biology Center, T.J. Watson Research Center, Yorktown Heights, NY, USA
- David Haussler
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Adam A Margolin
  - Computational Biology Program, Oregon Health & Science University, Portland, OR, USA
  - Sage Bionetworks, Seattle, WA, USA
- Joshua M Stuart
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
- Paul C Boutros
  - Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
2
Sendorek DH, Caloian C, Ellrott K, Bare JC, Yamaguchi TN, Ewing AD, Houlahan KE, Norman TC, Margolin AA, Stuart JM, Boutros PC. Germline contamination and leakage in whole genome somatic single nucleotide variant detection. BMC Bioinformatics 2018; 19:28. [PMID: 29385983 PMCID: PMC5793408 DOI: 10.1186/s12859-018-2046-0] [Received: 10/16/2017] [Accepted: 01/24/2018]
Abstract
BACKGROUND The clinical sequencing of cancer genomes to personalize therapy is becoming routine across the world. However, concerns over patient re-identification from these data lead to questions about how tightly access should be controlled. It is not thought to be possible to re-identify patients from somatic variant data. However, somatic variant detection pipelines can mistakenly identify germline variants as somatic ones, a process called "germline leakage". The rate of germline leakage across different somatic variant detection pipelines is not well-understood, and it is uncertain whether or not somatic variant calls should be considered re-identifiable. To fill this gap, we quantified germline leakage across 259 sets of whole-genome somatic single nucleotide variant (SNVs) predictions made by 21 teams as part of the ICGC-TCGA DREAM Somatic Mutation Calling Challenge. RESULTS The median somatic SNV prediction set contained 4325 somatic SNVs and leaked one germline polymorphism. The level of germline leakage was inversely correlated with somatic SNV prediction accuracy and positively correlated with the amount of infiltrating normal cells. The specific germline variants leaked differed by tumour and algorithm. To aid in quantitation and correction of leakage, we created a tool, called GermlineFilter, for use in public-facing somatic SNV databases. CONCLUSIONS The potential for patient re-identification from leaked germline variants in somatic SNV predictions has led to divergent open data access policies, based on different assessments of the risks. Indeed, a single, well-publicized re-identification event could reshape public perceptions of the values of genomic data sharing. We find that modern somatic SNV prediction pipelines have low germline-leakage rates, which can be further reduced, especially for cloud-sharing, using pre-filtering software.
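The idea of germline leakage described in this abstract can be illustrated with a minimal sketch: a predicted somatic SNV counts as leaked when it coincides with a known germline variant of the same patient. This is not the GermlineFilter implementation, whose internals are not described here; the `Variant` record, the `find_leaked_variants` helper, and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record used only for this illustration."""
    chrom: str
    pos: int  # 1-based genomic position
    ref: str
    alt: str


def find_leaked_variants(somatic_calls, germline_variants):
    """Return the predicted somatic calls that match a known germline variant."""
    germline = set(germline_variants)  # set lookup keeps the scan linear
    return [v for v in somatic_calls if v in germline]


# Toy data, invented for the example.
somatic_calls = [
    Variant("1", 1000, "A", "T"),
    Variant("2", 2500, "G", "C"),
    Variant("7", 5000, "A", "T"),
]
germline_variants = [
    Variant("2", 2500, "G", "C"),
    Variant("3", 999, "C", "G"),
]

leaked = find_leaked_variants(somatic_calls, germline_variants)
leak_rate = len(leaked) / len(somatic_calls)
print(f"{len(leaked)} leaked call(s), leakage rate {leak_rate:.2f}")
```

Matching on exact chromosome, position, and alleles is the simplest possible criterion; a production tool would also need to handle variant normalization and protect the germline data itself.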
Affiliation(s)
- Dorota H. Sendorek
  - Informatics & Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario M5G 0A3 Canada
- Cristian Caloian
  - Informatics & Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario M5G 0A3 Canada
- Kyle Ellrott
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA USA
  - Computational Biology Program, Oregon Health & Science University, Portland, OR USA
- Takafumi N. Yamaguchi
  - Informatics & Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario M5G 0A3 Canada
- Adam D. Ewing
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA USA
  - Mater Research Institute, University of Queensland, Woolloongabba, Queensland Australia
- Kathleen E. Houlahan
  - Informatics & Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario M5G 0A3 Canada
- Adam A. Margolin
  - Sage Bionetworks, Seattle, WA USA
  - Computational Biology Program, Oregon Health & Science University, Portland, OR USA
  - Department of Biomedical Engineering, Oregon Health & Science University, Portland, OR USA
- Joshua M. Stuart
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA USA
- Paul C. Boutros
  - Informatics & Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario M5G 0A3 Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario Canada
  - Department of Pharmacology & Toxicology, University of Toronto, Toronto, Ontario Canada
3
Zhou FL, Guinney J, Wang T, Bare JC, Norman TC, Bot B, Shen L, Winner KK, Friend SH, Abdallah K, Stolovitzky GA, Xie Y, Costello J. Use of crowdsourced research to develop a prognostic model for first-line metastatic castrate resistant prostate cancer (mCRPC). J Clin Oncol 2016. [DOI: 10.1200/jco.2016.34.2_suppl.180]
Abstract
180 Background: Project Data Sphere, LLC (PDS) and Sage Bionetworks/DREAM have completed the “Prostate Cancer DREAM Challenge” (Challenge), a crowdsourced competition, using historical prostate cancer clinical trial data from PDS. The Challenge aimed to improve prognostic models for overall survival (OS) and to explore predictive models for treatment toxicity in mCRPC patients. Methods: Control arms of 4 randomized phase III trials (total 2,070 patients) were used as training and validation data sets for the Challenge: ASCENT2, MAINSAIL, VENICE and ENTHUSE33. All subjects were first line mCRPC patients receiving docetaxel treatment. Curated baseline clinical covariates (demographics, comorbidity, prior treatment, laboratory, lesion and vital signs) were modeled along with raw clinical data tables. The primary purpose of the Challenge was to develop a prognostic model for OS (SubChallenge 1). The models were scored using concordance index and integrated area under receiver operator curve (iAUC) from 6-30 months. The published mCRPC OS model of Halabi, et al., JCO, 2014, was used as the benchmark. Results: The Challenge attracted over 160 active participants who formed 50 teams that submitted final models for SubChallenge 1. Median iAUC was 0.76 (0.67-0.78) with a maximum score of 0.792. Over half (n = 35) of these models exceeded the published benchmark (0.743 iAUC). Teams explored new methodologies such as model-based imputation and machine learning techniques to develop the best performing models. Many leveraged raw clinical data sets to create their own covariates and expanded beyond existing prognostic models. Conclusions: The Challenge externally validated Halabi’s first line prognostic model. New prognostic models were proposed and validated with significant improvements over the benchmark. Further analyses are needed to examine the winning models for new prognostic factors and to validate them using additional trial data from PDS. 
The Challenge drove interest from cross-disciplinary teams of global experts to explore and enhance their technical abilities using real clinical data whilst serving as a vehicle to accelerate medical innovation.
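One of the two scoring metrics named in this abstract, the concordance index, can be sketched in a few lines. The version below is the textbook Harrell's C for right-censored survival data, offered as an illustration under stated assumptions rather than the Challenge's actual scoring code; the function name and the toy patient data are hypothetical, and the iAUC metric is not covered.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable patient pairs ranked correctly.

    times       -- observed follow-up time per patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; higher means shorter predicted survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for the example (times in months).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"concordance index: {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for comparing models against a benchmark score.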
Affiliation(s)
- Fang Liz Zhou
  - Sanofi US, North America Medical Affairs, Bridgewater, NJ
- Tao Wang
  - The University of Texas Southwestern Medical Center, Dallas, TX
- Kimberly Kanigel Winner
  - Department of Pharmacology, Computational Biosciences Program, University of Colorado Anschutz Medical Campus, Aurora, CO
- Yang Xie
  - The University of Texas Southwestern Medical Center, Dallas, TX
- James Costello
  - University of Colorado Anschutz Medical Campus, Aurora, CO
4
Eduati F, Mangravite LM, Wang T, Tang H, Bare JC, Huang R, Norman T, Kellen M, Menden MP, Yang J, Zhan X, Zhong R, Xiao G, Xia M, Abdo N, Kosyk O, Friend S, Dearry A, Simeonov A, Tice RR, Rusyn I, Wright FA, Stolovitzky G, Xie Y, Saez-Rodriguez J. Erratum: Prediction of human population responses to toxic compounds by a collaborative competition. Nat Biotechnol 2015; 33:1109. [PMID: 26448092 PMCID: PMC7608305 DOI: 10.1038/nbt1015-1109a]
5
Boutros PC, Ewing AD, Ellrott K, Norman TC, Dang KK, Hu Y, Kellen MR, Suver C, Bare JC, Stein LD, Spellman PT, Stolovitzky G, Friend SH, Margolin AA, Stuart JM. Global optimization of somatic variant identification in cancer genomes with a global community challenge. Nat Genet 2014; 46:318-319. [PMID: 24675517 DOI: 10.1038/ng.2932]
Affiliation(s)
- Paul C Boutros
  - Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
  - Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada
- Adam D Ewing
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, USA
- Kyle Ellrott
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, USA
- Yin Hu
  - Sage Bionetworks, Seattle, Washington, USA
- Lincoln D Stein
  - Informatics and Biocomputing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Paul T Spellman
  - Department of Molecular and Medical Genetics, Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Gustavo Stolovitzky
  - IBM Computational Biology Center, T.J. Watson Research Center, Yorktown Heights, New York, USA
- Joshua M Stuart
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, USA
6
Turkarslan S, Wurtmann EJ, Wu WJ, Jiang N, Bare JC, Foley K, Reiss DJ, Novichkov P, Baliga NS. Network portal: a database for storage, analysis and visualization of biological networks. Nucleic Acids Res 2013; 42:D184-90. [PMID: 24271392 PMCID: PMC3964938 DOI: 10.1093/nar/gkt1190]
Abstract
The ease of generating high-throughput data has enabled investigations into organismal complexity at the systems level through the inference of networks of interactions among the various cellular components (genes, RNAs, proteins and metabolites). The wider scientific community, however, currently has limited access to tools for network inference, visualization and analysis because these tasks often require advanced computational knowledge and expensive computing resources. We have designed the network portal (http://networks.systemsbiology.net) to serve as a modular database for the integration of user uploaded and public data, with inference algorithms and tools for the storage, visualization and analysis of biological networks. The portal is fully integrated into the Gaggle framework to seamlessly exchange data with desktop and web applications and to allow the user to create, save and modify workspaces, and it includes social networking capabilities for collaborative projects. While the current release of the database contains networks for 13 prokaryotic organisms from diverse phylogenetic clades (4678 co-regulated gene modules, 3466 regulators and 9291 cis-regulatory motifs), it will be rapidly populated with prokaryotic and eukaryotic organisms as relevant data become available in public repositories and through user input. The modular architecture, simple data formats and open API support community development of the portal.
Affiliation(s)
- Serdar Turkarslan
  - Institute for Systems Biology, Seattle, WA 98109, USA and Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
7
Turkarslan S, Reiss DJ, Gibbins G, Su WL, Pan M, Bare JC, Plaisier CL, Baliga NS. Niche adaptation by expansion and reprogramming of general transcription factors. Mol Syst Biol 2011; 7:554. [PMID: 22108796 PMCID: PMC3261711 DOI: 10.1038/msb.2011.87] [Received: 08/11/2011] [Accepted: 10/25/2011]
Abstract
Numerous lineage-specific expansions of the transcription factor B (TFB) family in archaea suggests an important role for expanded TFBs in encoding environment-specific gene regulatory programs. Given the characteristics of hypersaline lakes, the unusually large numbers of TFBs in halophilic archaea further suggests that they might be especially important in rapid adaptation to the challenges of a dynamically changing environment. Motivated by these observations, we have investigated the implications of TFB expansions by correlating sequence variations, regulation, and physical interactions of all seven TFBs in Halobacterium salinarum NRC-1 to their fitness landscapes, functional hierarchies, and genetic interactions across 2488 experiments covering combinatorial variations in salt, pH, temperature, and Cu stress. This systems analysis has revealed an elegant scheme in which completely novel fitness landscapes are generated by gene conversion events that introduce subtle changes to the regulation or physical interactions of duplicated TFBs. Based on these insights, we have introduced a synthetically redesigned TFB and altered the regulation of existing TFBs to illustrate how archaea can rapidly generate novel phenotypes by simply reprogramming their TFB regulatory network.
Affiliation(s)
- David J Reiss
  - Baliga Lab, Institute for Systems Biology, Seattle, WA, USA
- Wan Lin Su
  - Baliga Lab, Institute for Systems Biology, Seattle, WA, USA
- Min Pan
  - Baliga Lab, Institute for Systems Biology, Seattle, WA, USA
- Nitin S Baliga
  - Baliga Lab, Institute for Systems Biology, Seattle, WA, USA
  - Department of Microbiology, University of Washington, Seattle, WA, USA
  - Department of Biology, Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
8
Yoon SH, Reiss DJ, Bare JC, Tenenbaum D, Pan M, Slagel J, Moritz RL, Lim S, Hackett M, Menon AL, Adams MWW, Barnebey A, Yannone SM, Leigh JA, Baliga NS. Parallel evolution of transcriptome architecture during genome reorganization. Genome Res 2011; 21:1892-904. [PMID: 21750103 DOI: 10.1101/gr.122218.111]
Abstract
Assembly of genes into operons is generally viewed as an important process during the continual adaptation of microbes to changing environmental challenges. However, the genome reorganization events that drive this process are also the roots of instability for existing operons. We have determined that there exists a statistically significant trend that correlates the proportion of genes encoded in operons in archaea to their phylogenetic lineage. We have further characterized how microbes deal with operon instability by mapping and comparing transcriptome architectures of four phylogenetically diverse extremophiles that span the range of operon stabilities observed across archaeal lineages: a photoheterotrophic halophile (Halobacterium salinarum NRC-1), a hydrogenotrophic methanogen (Methanococcus maripaludis S2), an acidophilic and aerobic thermophile (Sulfolobus solfataricus P2), and an anaerobic hyperthermophile (Pyrococcus furiosus DSM 3638). We demonstrate how the evolution of transcriptional elements (promoters and terminators) generates new operons, restores the coordinated regulation of translocated, inverted, and newly acquired genes, and introduces completely novel regulation for even some of the most conserved operonic genes such as those encoding subunits of the ribosome. The inverse correlation (r=-0.92) between the proportion of operons with such internally located transcriptional elements and the fraction of conserved operons in each of the four archaea reveals an unprecedented view into varying stages of operon evolution. Importantly, our integrated analysis has revealed that organisms adapted to higher growth temperatures have lower tolerance for genome reorganization events that disrupt operon structures.
Affiliation(s)
- Sung Ho Yoon
  - Institute for Systems Biology, Seattle, Washington 98109, USA
9
Plaisier CL, Bare JC, Baliga NS. miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling. Nucleic Acids Res 2011; 39:W125-31. [PMID: 21602264 PMCID: PMC3125776 DOI: 10.1093/nar/gkr374]
Abstract
Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, G. gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3′ un-translated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3′-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3′-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net.
10
Tenenbaum D, Bare JC, Baliga NS. GTC: A web server for integrating systems biology data with web tools and desktop applications. Source Code Biol Med 2010; 5:7. [PMID: 20626906 PMCID: PMC2917411 DOI: 10.1186/1751-0473-5-7] [Received: 04/14/2010] [Accepted: 07/13/2010]
Abstract
Gaggle Tool Creator (GTC) is a web application which provides access to public annotation, interaction, orthology, and genomic data for hundreds of organisms, and enables instant analysis of the data using many popular web-based and desktop applications.
11
Bare JC, Koide T, Reiss DJ, Tenenbaum D, Baliga NS. Integration and visualization of systems biology data in context of the genome. BMC Bioinformatics 2010; 11:382. [PMID: 20642854 PMCID: PMC2912892 DOI: 10.1186/1471-2105-11-382] [Received: 02/17/2010] [Accepted: 07/19/2010]
Abstract
Background High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.
Affiliation(s)
- J Christopher Bare
  - Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
12
Bare JC, Shannon PT, Schmid AK, Baliga NS. The Firegoose: two-way integration of diverse data from different bioinformatics web resources with desktop applications. BMC Bioinformatics 2007; 8:456. [PMID: 18021453 PMCID: PMC2211326 DOI: 10.1186/1471-2105-8-456] [Received: 08/04/2007] [Accepted: 11/19/2007]
Abstract
BACKGROUND Information resources on the World Wide Web play an indispensable role in modern biology. But integrating data from multiple sources is often encumbered by the need to reformat data files, convert between naming systems, or perform ongoing maintenance of local copies of public databases. Opportunities for new ways of combining and re-using data are arising as a result of the increasing use of web protocols to transmit structured data. RESULTS The Firegoose, an extension to the Mozilla Firefox web browser, enables data transfer between web sites and desktop tools. As a component of the Gaggle integration framework, Firegoose can also exchange data with Cytoscape, the R statistical package, Multiexperiment Viewer (MeV), and several other popular desktop software tools. Firegoose adds the capability to easily use local data to query KEGG, EMBL STRING, DAVID, and other widely-used bioinformatics web sites. Query results from these web sites can be transferred to desktop tools for further analysis with a few clicks. Firegoose acquires data from the web by screen scraping, microformats, embedded XML, or web services. We define a microformat, which allows structured information compatible with the Gaggle to be embedded in HTML documents. We demonstrate the capabilities of this software by performing an analysis of the genes activated in the microbe Halobacterium salinarum NRC-1 in response to anaerobic environments. Starting with microarray data, we explore functions of differentially expressed genes by combining data from several public web resources and construct an integrated view of the cellular processes involved. CONCLUSION The Firegoose incorporates Mozilla Firefox into the Gaggle environment and enables interactive sharing of data between diverse web resources and desktop software tools without maintaining local copies. Additional web sites can be incorporated easily into the framework using the scripting platform of the Firefox browser. 
Performing data integration in the browser allows the excellent search and navigation capabilities of the browser to be used in combination with powerful desktop tools.
Affiliation(s)
- J Christopher Bare
  - Institute for Systems Biology, 1441 N 34th Street, Seattle, WA 98103, USA
13
Pennington DW, Bare JC. Comparison of chemical screening and ranking approaches: the waste minimization prioritization tool versus toxic equivalency potentials. Risk Anal 2001; 21:897-912. [PMID: 11798125 DOI: 10.1111/0272-4332.215160]
Abstract
Chemical screening in the United States is often conducted using scoring and ranking methodologies. Linked models accounting for chemical fate, exposure, and toxicological effects are generally preferred in Europe and in product Life Cycle Assessment. For the first time, a comparison is presented in this article of two of the prominent, but structurally different methodologies adopted to help screen and rank chemicals and chemical emissions data. Results for 250 chemicals are presented, with a focus on 12 chemicals of interest in the United Nations Environment Programme's Persistent Organic Pollutants global treaty negotiations. These results help to illustrate the significance of described structural differences and to assess the correlation between the methodologies. The scope of the comparison was restricted here to human health, although the insights would be equally useful in the context of the health of ecosystems. Illustrating the current types of chemical screening and emissions comparison approaches, the relative significance of the scenario and structural differences of the Waste Minimization Prioritization Tool (WMPT) and the Toxic Equivalency Potential (TEP) methodologies are analyzed. The WMPT facilitates comparison in terms of key physical-chemical properties. Measures for Persistence, Bioaccumulation, and Toxicity (PBT) are calculated. Each PBT measure is scored and then these scores are added to provide a single measure of relative concern. TEPs account for chemical fate, multipathway exposure, and toxicity using a model-based approach. This model structure is sometimes considered to provide a less subjective representation of environmental mechanisms, and, hence, an improved basis for screening. Nevertheless, a strong relationship exists between the two approaches and both have their limitations.
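The additive scoring step described for the WMPT (score each of Persistence, Bioaccumulation, and Toxicity, then add the scores) can be sketched as follows. The 1 to 3 score scale, the chemical names, and the score values are assumptions made up for illustration; the abstract does not give the actual WMPT scoring thresholds.

```python
def pbt_concern_score(persistence, bioaccumulation, toxicity):
    """Add the three per-category scores into one relative-concern score."""
    return persistence + bioaccumulation + toxicity


# Hypothetical per-category scores (assumed scale: 1 = low, 3 = high),
# ordered as (persistence, bioaccumulation, toxicity).
chemicals = {
    "chemical_A": (3, 3, 2),
    "chemical_B": (1, 2, 1),
    "chemical_C": (2, 3, 2),
}

# Rank chemicals from highest to lowest relative concern.
ranked = sorted(
    chemicals,
    key=lambda name: pbt_concern_score(*chemicals[name]),
    reverse=True,
)

for name in ranked:
    print(name, pbt_concern_score(*chemicals[name]))
```

Because the scores are simply summed, a high score in one category can be offset by low scores in the others, which is one structural difference from a model-based approach such as TEPs.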
Affiliation(s)
- D W Pennington
  - Life Cycle Group for Sustainable Development, Ecole Polytechnique Fédérale de Lausanne, Switzerland
14
Bare JC, Cicmanec J, Cabezas H. Developments in the application of impact assessment methodologies for pollution prevention. Drug Chem Toxicol 1997; 20:411-7. [PMID: 9433668 DOI: 10.3109/01480549709003897]
Abstract
Pollution prevention requires the assessment of various multimedia environmental impacts to ensure that the alternative selected most closely represents the environmental goals and priorities of the facility. While some facility's environmental policies are easy to assess (e.g., reduce TRI emissions), others require a more sophisticated assessment methodology (e.g., select the most environmentally-friendly manufacturing process). Chemical environmental impact assessment for pollution prevention can be very effective, but can only provide a scientifically defensible decision point if the methodologies, analytical tools, and data quality are consistent with the environmental goals and priorities expressed. In many assessments completed in the past, the assessment methodologies have been overly simplistic when compared to the environmental goals projected. More sophisticated chemical impact assessment methodologies have not been used in the past for a variety of reasons, including: poor study design, poor data quality, inadequate funding, inadequate computer systems and databases, and practitioner's lack of understanding. This paper will describe a broad range of assessment methodologies, the steps involved in various evaluations, various resources available to conduct pollution prevention impact assessments, and on-going methodology development.
Affiliation(s)
- J C Bare
  - National Risk Management Research Laboratory, U.S. Environmental Protection Agency, Cincinnati, OH 45268, USA