1
|
Tanoli Z, Seemab U, Scherer A, Wennerberg K, Tang J, Vähä-Koskela M. Exploration of databases and methods supporting drug repurposing: a comprehensive survey. Brief Bioinform 2021; 22:1656-1678. [PMID: 32055842 PMCID: PMC7986597 DOI: 10.1093/bib/bbaa003] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 09/23/2019] [Revised: 12/09/2019] [Indexed: 02/07/2023]
Abstract
Drug development involves a deep understanding of the mechanisms of action and possible side effects of each drug, and sometimes results in the identification of new and unexpected uses for drugs, termed drug repurposing. For both serendipitous observations and systematic mechanistic explorations, confirming a new indication for a drug requires hypothesis building around relevant drug-related data, such as the molecular targets involved and patient and cellular responses. These datasets are available in public repositories, but apart from the computational bottleneck imposed by sifting through the sheer amount of data, a major challenge is selecting which databases to use from an increasingly large number of available databases. Database selection is made harder by the lack of an overview of the types of data offered in each database. To alleviate these problems and to guide the end user through drug repurposing efforts, we provide here a survey of 102 of the most promising and drug-relevant databases reported to date. We summarize the target coverage and types of data available in each database and provide several examples of how multi-database exploration can facilitate drug repurposing.
Affiliation(s)
- Ziaurrehman Tanoli
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland
- Umair Seemab
  - Haartman Institute, University of Helsinki, Finland
- Andreas Scherer
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland
- Krister Wennerberg
  - Biotech Research & Innovation Centre (BRIC), University of Copenhagen, Denmark
- Jing Tang
  - Faculty of Medicine, University of Helsinki, Finland
- Markus Vähä-Koskela
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland
|
2
|
Dwivedi DK, Sahu A, Dighade SJ, Agrawal RK. Design, synthesis, and antimicrobial evaluation of some nifuroxazide analogs against nosocomial infection. J Heterocycl Chem 2020. [DOI: 10.1002/jhet.3891] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
Affiliation(s)
- Deepak K. Dwivedi
  - Department of Pharmaceutical Sciences, Dr. Harisingh Gour Central University, Sagar, India
  - Department of Pharmaceutical Chemistry, Institute of Pharmaceutical Education and Research, Wardha, India
- Adarsh Sahu
  - Department of Pharmaceutical Sciences, Dr. Harisingh Gour Central University, Sagar, India
- Sachin J. Dighade
  - Department of Pharmaceutical Chemistry, Institute of Pharmaceutical Education and Research, Wardha, India
- Ram Kishore Agrawal
  - Department of Pharmaceutical Sciences, Dr. Harisingh Gour Central University, Sagar, India
|
3
|
Farouk R, SayedElahl M. Microarray spot segmentation algorithm based on integro-differential operator. Egyptian Informatics Journal 2019. [DOI: 10.1016/j.eij.2019.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
4
|
Chun S, Muthu M, Gopal J, Paul D, Kim DH, Gansukh E, Anthonydhason V. The unequivocal preponderance of biocomputation in clinical virology. RSC Adv 2018; 8:17334-17345. [PMID: 35539262 PMCID: PMC9080393 DOI: 10.1039/c8ra00888d] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/29/2018] [Accepted: 03/14/2018] [Indexed: 11/22/2022]
Abstract
Bioinformatics and computer based data simulation and modeling are captivating biological research, delivering great results already and promising to deliver more. As biological research is a complex, intricate, diverse field, any available support is gladly taken. With recent outbreaks and epidemics, pathogens are a constant threat to the global economy and security. Virus related plagues are somehow the most difficult to handle. Biocomputation has provided appreciable help in resolving clinical virology related issues. This review, for the first time, surveys the current status of the role of computation in virus related research. Advances made in the fields of clinical virology, antiviral drug design, viral immunology and viral oncology, through input from biocomputation, have been discussed. The amount of progress made and the software platforms available are consolidated in this review. The limitations of computation based methods are presented. Finally, the challenges facing the future of biocomputation in clinical virology are speculated upon.
Affiliation(s)
- Sechul Chun
  - Department of Environmental Health Science, Konkuk University, Seoul 143-701, Korea
- Manikandan Muthu
  - Department of Environmental Health Science, Konkuk University, Seoul 143-701, Korea
- Judy Gopal
  - Department of Environmental Health Science, Konkuk University, Seoul 143-701, Korea
- Diby Paul
  - Environmental Microbiology, Department of Environmental Engineering, Konkuk University, Seoul 143-701, Korea
- Doo Hwan Kim
  - Department of Environmental Health Science, Konkuk University, Seoul 143-701, Korea
- Enkhtaivan Gansukh
  - Department of Environmental Health Science, Konkuk University, Seoul 143-701, Korea
- Vimala Anthonydhason
  - Department of Biotechnology, Indian Institute of Technology-Madras, Chennai 600036, India
|
5
|
Analysis of a single Helicobacter pylori strain over a 10-year period in a primate model. Int J Med Microbiol 2015; 305:392-403. [PMID: 25804332 DOI: 10.1016/j.ijmm.2015.03.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/25/2014] [Revised: 01/30/2015] [Accepted: 03/01/2015] [Indexed: 12/18/2022]
Abstract
Helicobacter pylori from different individuals exhibits substantial genetic diversity. However, the kinetics of bacterial diversification after infection with a single strain is poorly understood. We investigated evolution of H. pylori following long-term infection in the primate stomach; Rhesus macaques were infected with H. pylori strain USU101 and then followed for 10 years. H. pylori was regularly cultured from biopsies, and single colony isolates were analyzed. At 1-year, DNA fingerprinting showed that all output isolates were identical to the input strain; however, at 5-years, different H. pylori fingerprints were observed. Microarray-based comparative genomic hybridization revealed that long term persistence of USU101 in the macaque stomach was associated with specific whole gene changes. Further detailed investigation showed that levels of the BabA protein were dramatically reduced within weeks of infection. The molecular mechanisms behind this reduction were shown to include phase variation and gene loss via intragenomic rearrangement, suggesting strong selective pressure against BabA expression in the macaque model. Notably, although there is apparently strong selective pressure against babA, babA is required for establishment of infection in this model as a strain in which babA was deleted was unable to colonize experimentally infected macaques.
|
6
|
In silico identification of regulatory motifs in co-expressed genes under osmotic stress representing their co-regulation. Plant Gene 2015. [DOI: 10.1016/j.plgene.2015.01.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
|
7
|
Wu H, Fujiwara T, Yamamoto Y, Bolleman J, Yamaguchi A. BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data. J Biomed Semantics 2014; 5:32. [PMID: 25089180 PMCID: PMC4118313 DOI: 10.1186/2041-1480-5-32] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 05/13/2013] [Accepted: 04/27/2014] [Indexed: 12/21/2022]
Abstract
Background: Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real-world biological data differs considerably from such simple synthetic data, and it is difficult to determine whether synthetic e-commerce data is sufficient to represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. Results: We evaluated five triple stores (4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE) with five biological data sets (Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ), ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Conclusions: Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements to load and query biological data of less than roughly 8 billion triples on a single node, with simultaneous access by 64 clients. OWLIM-SE performs best for databases with approximately 11 million triples. For data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, and neither shows an overwhelming advantage over the other. For data over 4 billion triples, Virtuoso works best. 4store performs well on small data sets with limited features when the number of triples is less than 100 million, but our test shows that its scalability is poor. Bigdata demonstrates average performance and is a good open source triple store for middle-sized (500 million or so) data sets. Mulgara shows some fragility.
Affiliation(s)
- Hongyan Wu
  - Database Center for Life Science, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan
- Yasunori Yamamoto
  - Database Center for Life Science, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan
- Jerven Bolleman
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
- Atsuko Yamaguchi
  - Database Center for Life Science, Research Organization of Information and Systems, 178-4-4 Wakashiba, Kashiwa, Chiba 277-0871, Japan
|
8
|
Romeo MJ, Espina V, Lowenthal M, Espina BH, Petricoin EF, Liotta LA. CSF proteome: a protein repository for potential biomarker identification. Expert Rev Proteomics 2014; 2:57-70. [PMID: 15966853 DOI: 10.1586/14789450.2.1.57] [Citation(s) in RCA: 103] [Impact Index Per Article: 10.3] [Indexed: 11/08/2022]
Abstract
Proteomic analysis is not limited to the analysis of serum or tissues. Synovial, peritoneal, pericardial and cerebrospinal fluid represent unique proteomes for disease diagnosis and prognosis. In particular, cerebrospinal fluid serves as a rich source of putative biomarkers that are not solely limited to neurologic disorders. Peptides, proteolytic fragments and antibodies are capable of crossing the blood-brain barrier, thus providing a repository of pathologic information. Proteomic technologies such as immunoblotting, isoelectric focusing, 2D gel electrophoresis and mass spectrometry have proven useful for deciphering this unique proteome. Cerebrospinal fluid proteins are generally less abundant than their corresponding serum counterparts, necessitating the development and use of sensitive analytical techniques. This review highlights some of the promising areas of cerebrospinal fluid proteomic research and their clinical applications.
Affiliation(s)
- Martin J Romeo
  - Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20892, USA
|
9
|
Kuczenski RS, Aggarwal K, Lee KH. Improved understanding of gene expression regulation using systems biology. Expert Rev Proteomics 2014; 2:915-24. [PMID: 16307520 DOI: 10.1586/14789450.2.6.915] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
Abstract
This article reviews the current state of systems biology approaches, including the experimental tools used to generate 'omic' data and computational frameworks to interpret this data. Through illustrative examples, systems biology approaches to understand gene expression and gene expression regulation are discussed. Some of the challenges facing this field and the future opportunities in the systems biology era are highlighted.
Affiliation(s)
- Robert S Kuczenski
  - Cornell University, School of Chemical & Biomolecular Engineering, 120 Olin Hall, Ithaca, NY 14853, USA
|
10
|
Yu D, Kim M, Xiao G, Hwang TH. Review of biological network data and its applications. Genomics Inform 2013; 11:200-10. [PMID: 24465231 PMCID: PMC3897847 DOI: 10.5808/gi.2013.11.4.200] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 10/15/2013] [Revised: 11/20/2013] [Accepted: 11/21/2013] [Indexed: 12/16/2022]
Abstract
Studying biological networks, such as protein-protein interactions, is key to understanding complex biological activities. Various types of large-scale biological datasets have been collected and analyzed with high-throughput technologies, including DNA microarray, next-generation sequencing, and the two-hybrid screening system, for this purpose. In this review, we focus on network-based approaches that help in understanding biological systems and identifying biological functions. Accordingly, this paper covers two major topics in network biology: reconstruction of gene regulatory networks and network-based applications, including protein function prediction, disease gene prioritization, and network-based genome-wide association study.
Affiliation(s)
- Donghyeon Yu
  - Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Minsoo Kim
  - Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Guanghua Xiao
  - Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Tae Hyun Hwang
  - Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
11
|
Cowles KN, Moser TS, Siryaporn A, Nyakudarika N, Dixon W, Turner JJ, Gitai Z. The putative Poc complex controls two distinct Pseudomonas aeruginosa polar motility mechanisms. Mol Microbiol 2013; 90:923-38. [PMID: 24102920 DOI: 10.1111/mmi.12403] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Accepted: 09/13/2013] [Indexed: 11/27/2022]
Abstract
Each Pseudomonas aeruginosa cell localizes two types of motility structures, a single flagellum and one or two clusters of type IV pili, to the cell poles. Previous studies suggested that these motility structures arrive at the pole through distinct mechanisms. Here we performed a swimming motility screen to identify polar flagellum localization factors and discovered three genes homologous to the TonB/ExbB/ExbD complex that have defects in both flagella-mediated swimming and pilus-mediated twitching motility. We found that deletion of tonB3, PA2983 or PA2982 led to non-polar localization of the flagellum and FlhF, which was thought to sit at the top of the flagellar localization hierarchy. Surprisingly, these mutants also exhibited pronounced changes in pilus formation or localization, indicating that these proteins may co-ordinate both the pilus and flagellum motility systems. Thus, we have renamed PA2983 and PA2982, pocA and pocB, respectively, for polar organelle co-ordinator to reflect this function. Our results suggest that TonB3, PocA and PocB may form a membrane-associated complex, which we term the Poc complex. These proteins do not exhibit polar localization themselves, but are required for increased expression of pilus genes upon surface association, indicating that they regulate motility structures through either localization or transcriptional mechanisms.
Affiliation(s)
- Kimberly N Cowles
  - Department of Molecular Biology, Princeton University, Princeton, NJ, 08544, USA
|
12
|
A predicted functional gene network for the plant pathogen Phytophthora infestans as a framework for genomic biology. BMC Genomics 2013; 14:483. [PMID: 23865555 PMCID: PMC3734169 DOI: 10.1186/1471-2164-14-483] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/11/2013] [Accepted: 07/15/2013] [Indexed: 11/10/2022]
Abstract
Background: Associations between proteins are essential to understand cell biology. While this complex interplay between proteins has been studied in model organisms, it has not yet been described for the oomycete late blight pathogen Phytophthora infestans. Results: We present an integrative probabilistic functional gene network that provides associations for 37 percent of the predicted P. infestans proteome. Our method unifies available genomic, transcriptomic and comparative genomic data into a single comprehensive network using a Bayesian approach. Enrichment of proteins residing in the same or related subcellular localization validates the biological coherence of our predictions. The network serves as a framework to query existing genomic data using network-based methods, which thus far was not possible in Phytophthora. We used the network to study the set of interacting proteins that are encoded by genes co-expressed during sporulation. This identified potential novel roles for proteins in spore formation through their links to proteins known to be involved in this process such as the phosphatase Cdc14. Conclusions: The functional association network represents a novel genome-wide data source for P. infestans that also acts as a framework to interrogate other system-wide data. In both capacities it will improve our understanding of the complex biology of P. infestans and related oomycete pathogens.
|
13
|
Giannakeas N, Karvelis PS, Exarchos TP, Kalatzis FG, Fotiadis DI. Segmentation of microarray images using pixel classification: Comparison with clustering-based methods. Comput Biol Med 2013; 43:705-16. [DOI: 10.1016/j.compbiomed.2013.03.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 01/10/2012] [Revised: 07/26/2012] [Accepted: 03/14/2013] [Indexed: 11/16/2022]
|
14
|
Billiau K, Sprenger H, Schudoma C, Walther D, Köhl KI. Data management pipeline for plant phenotyping in a multisite project. Functional Plant Biology 2012; 39:948-957. [PMID: 32480844 DOI: 10.1071/fp12009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/13/2012] [Accepted: 06/22/2012] [Indexed: 05/26/2023]
Abstract
In plant breeding, plants have to be characterised precisely, consistently and rapidly by different people at several field sites within defined time spans. For a meaningful data evaluation and statistical analysis, standardised data storage is required. Data access must be provided on a long-term basis and be independent of organisational barriers without endangering data integrity or intellectual property rights. We discuss the associated technical challenges and demonstrate adequate solutions exemplified in a data management pipeline for a project to identify markers for drought tolerance in potato. This project involves 11 groups from academia and breeding companies, 11 sites and four analytical platforms. Our data warehouse concept combines central data storage in databases and a file server and integrates existing and specialised database solutions for particular data types with new, project-specific databases. The strict use of controlled vocabularies and the application of web-access technologies proved vital to the successful data exchange between diverse institutes and data management concepts and infrastructures. By presenting our data management system and making the software available, we aim to support related phenotyping projects.
Affiliation(s)
- Kenny Billiau
  - Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam OT Golm, Germany
- Heike Sprenger
  - Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam OT Golm, Germany
- Christian Schudoma
  - Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam OT Golm, Germany
- Dirk Walther
  - Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam OT Golm, Germany
- Karin I Köhl
  - Max Planck Institute of Molecular Plant Physiology, Am Muehlenberg 1, 14476 Potsdam OT Golm, Germany
|
15
|
Doherty KM, Pride LD, Lukose J, Snydsman BE, Charles R, Pramanik A, Muller EG, Botstein D, Moore CW. Loss of a 20S proteasome activator in Saccharomyces cerevisiae downregulates genes important for genomic integrity, increases DNA damage, and selectively sensitizes cells to agents with diverse mechanisms of action. G3 (Bethesda, Md.) 2012; 2:943-59. [PMID: 22908043 PMCID: PMC3411250 DOI: 10.1534/g3.112.003376] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 04/09/2012] [Accepted: 06/18/2012] [Indexed: 01/23/2023]
Abstract
Cytoprotective functions of a 20S proteasome activator were investigated. Saccharomyces cerevisiae Blm10 and human 20S proteasome activator 200 (PA200) are homologs. Comparative genome-wide analyses of untreated diploid cells lacking Blm10 and growing at steady state at defined growth rates revealed downregulation of numerous genes required for accurate chromosome structure, assembly and repair, and upregulation of a specific subset of genes encoding protein-folding chaperones. Blm10 loss or truncation of the Ubp3/Blm3 deubiquitinating enzyme caused massive chromosomal damage and cell death in homozygous diploids after phleomycin treatments, indicating that Blm10 and Ubp3/Blm3 function to stabilize the genome and protect against cell death. Diploids lacking Blm10 also were sensitized to doxorubicin, hydroxyurea, 5-fluorouracil, rapamycin, hydrogen peroxide, methyl methanesulfonate, and calcofluor. Fluorescently tagged Blm10 localized in nuclei, with enhanced fluorescence after DNA replication. After DNA damage that caused a classic G2/M arrest, fluorescence remained diffuse, with evidence of nuclear fragmentation in some cells. Protective functions of Blm10 did not require the carboxyl-terminal region that makes close contact with 20S proteasomes, indicating that protection does not require this contact or the truncated Blm10 can interact with the proteasome apart from this region. Without its carboxyl-terminus, Blm10(-339aa) localized to nuclei in untreated, nonproliferating (G0) cells, but not during G1, S, G2, and M. The results indicate Blm10 functions in protective mechanisms that include the machinery that assures proper assembly of chromosomes. These essential guardian functions have implications for ubiquitin-independent targeting in anticancer therapy. Targeting Blm10/PA200 together with one or more of the upregulated chaperones or a conventional treatment could be efficacious.
Affiliation(s)
- Kevin M. Doherty
  - Department of Microbiology and Immunology, City University of New York Sophie Davis School of Biomedical Education, City College, New York, New York 10031-9101
  - The Graduate Center Program in Biochemistry, City University of New York, New York, New York 10016-4309
- Leah D. Pride
  - Department of Microbiology and Immunology, City University of New York Sophie Davis School of Biomedical Education, City College, New York, New York 10031-9101
  - Department of Biochemistry, City College, City University of New York, New York, New York 10031-9101
- James Lukose
  - Department of Microbiology and Immunology, City University of New York Sophie Davis School of Biomedical Education, City College, New York, New York 10031-9101
- Brian E. Snydsman
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195-7350
- Ronald Charles
  - Department of Microbiology and Immunology, City University of New York Sophie Davis School of Biomedical Education, City College, New York, New York 10031-9101
- Ajay Pramanik
  - Department of Microbiology and Immunology, City University of New York Sophie Davis School of Biomedical Education, City College, New York, New York 10031-9101
- Eric G. Muller
  - Department of Biochemistry, University of Washington, Seattle, Washington 98195-7350
- David Botstein
  - Lewis-Sigler Institute for Integrative Genomics and Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544-1004
- Carol Wood Moore
  - Department of Microbiology and Immunology, City University of New York Sophie Davis School of Biomedical Education, City College, New York, New York 10031-9101
  - Graduate Center Programs in Biochemistry and Biology, City University of New York, New York, New York 10016-4309
|
16
|
Giannakeas N, Fotiadis DI. Image Processing and Machine Learning Techniques for the Segmentation of cDNA Microarray Images. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Microarray technology allows the comprehensive measurement of the expression level of many genes simultaneously on a common substrate. Typical applications of microarrays include the quantification of expression profiles of a system under different experimental conditions, or expression profile comparisons of two systems for one or more conditions. Microarray image analysis is a crucial step in the analysis of microarray data. In this chapter an extensive overview of the segmentation of the microarray image is presented. Methods already presented in the literature are classified into two main categories: methods which are based on image processing techniques and those which are based on machine learning techniques. A novel classification-based application for the segmentation is also presented to demonstrate efficiency.
|
17
|
Frequent Pattern Discovery in Multiple Biological Networks: Patterns and Algorithms. Statistics in Biosciences 2011. [DOI: 10.1007/s12561-011-9047-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/14/2022]
|
18
|
Sinha AU, Merrill E, Armstrong SA, Clark TW, Das S. eXframe: reusable framework for storage, analysis and visualization of genomics experiments. BMC Bioinformatics 2011; 12:452. [PMID: 22103807 PMCID: PMC3235155 DOI: 10.1186/1471-2105-12-452] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/16/2011] [Accepted: 11/21/2011] [Indexed: 11/19/2022]
Abstract
Background: Genome-wide experiments are routinely conducted to measure gene expression, DNA-protein interactions and epigenetic status. Structured metadata for these experiments is imperative for a complete understanding of experimental conditions, to enable consistent data processing and to allow retrieval, comparison, and integration of experimental results. Even though several repositories have been developed for genomics data, only a few provide annotation of samples and assays using controlled vocabularies. Moreover, many of them are tailored for a single type of technology or measurement and do not support the integration of multiple data types. Results: We have developed eXframe, a reusable web-based framework for genomics experiments that provides 1) the ability to publish structured data compliant with accepted standards, 2) support for multiple data types including microarrays and next generation sequencing, and 3) query, analysis and visualization integration tools (enabled by consistent processing of the raw data and annotation of samples), and is available as open-source software. We present two case studies where this software is currently being used to build repositories of genomics experiments: one contains data from hematopoietic stem cells and another from Parkinson's disease patients. Conclusion: The web-based framework eXframe offers structured annotation of experiments as well as uniform processing and storage of molecular data from microarray and next generation sequencing platforms. The framework allows users to query and integrate information across species, technologies, measurement types and experimental conditions. Our framework is reusable and freely modifiable: other groups or institutions can deploy their own custom web-based repositories based on this software. It is interoperable with the most important data formats in this domain. We hope that other groups will not only use eXframe, but also contribute their own useful modifications.
Affiliation(s)
- Amit U Sinha
  - Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA 02115, USA
|
19
|
Deng N, Duan H. Automated microarray image gridding using image projection vectors coupled with power spectrum model. Int J Pattern Recogn 2011. [DOI: 10.1142/s021800141000810x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Microarray technology has been increasingly recognized as a powerful means for monitoring the expression levels of thousands of genes simultaneously. Microarray image processing is an essential part of a microarray experiment, and gridding is considered the most important step in spot recognition. Microarray image gridding often requires assisted intervention to achieve acceptable accuracy. This paper presents an automatic microarray image gridding algorithm that uses image projection vectors together with a power spectrum model. The grid position is obtained from the image projection vectors, taking the grid parameters into account. As a preprocessing step, grid rotation is detected by power spectrum analysis of the image projection vectors. The approach was evaluated on three different microarray datasets, and experimental comparisons with current approaches on both synthetic and real image data are reported. The gridding results were very accurate and provided correct gridding data for downstream microarray analyses. In summary, the study demonstrates that combining image projection vectors with a power spectrum model is a powerful strategy for microarray image gridding.
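The projection-vector idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names `projection_vector` and `grid_lines`, the fixed `threshold`, and the toy image are assumptions made for illustration, and the power spectrum step for rotation detection is omitted.

```python
def projection_vector(image, axis):
    """Sum pixel intensities down columns (axis=0) or across rows (axis=1)."""
    if axis == 0:
        return [sum(col) for col in zip(*image)]
    return [sum(row) for row in image]


def grid_lines(profile, threshold):
    """Return the centre index of every run of low-signal positions."""
    lines, start = [], None
    # A sentinel above the threshold closes a run that reaches the border.
    for idx, value in enumerate(profile + [threshold + 1]):
        if value <= threshold and start is None:
            start = idx
        elif value > threshold and start is not None:
            lines.append((start + idx - 1) // 2)
            start = None
    return lines


# Toy image (hypothetical data): bright spots separated by empty gaps.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 9, 9, 0],
    [0, 9, 9, 0, 9, 9, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 9, 9, 0],
]
vertical_lines = grid_lines(projection_vector(image, 0), threshold=0)
horizontal_lines = grid_lines(projection_vector(image, 1), threshold=0)
```

Gaps between spots show up as valleys in each projection, so grid lines can be placed at the valley centres. Real images are noisy, which is why a fixed threshold of zero would not be enough in practice.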
Affiliation(s)
- NING DENG
- Department of Biomedical Engineering, Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, P. R. China
- HUILONG DUAN
- Department of Biomedical Engineering, Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou 310027, P. R. China
|
20
|
van Berlo RJP, Wessels LFA, De Ridder D, Reinders MJT. Protein complex prediction using an integrative bioinformatics approach. J Bioinform Comput Biol 2011; 5:839-64. [PMID: 17787059 DOI: 10.1142/s0219720007002953] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 09/11/2006] [Revised: 03/22/2007] [Accepted: 03/22/2007] [Indexed: 11/18/2022]
Abstract
Since protein complexes play a crucial role in biological cells, one of the major goals in bioinformatics is the elucidation of protein complexes. A general approach is to build a prediction rule based on multiple data sources, e.g. gene expression data and protein interaction data, to assess the likelihood of two proteins having complex association. We critically revisit the step of predictor construction, i.e. the determination of a proper training set, an optimal classifier, and, most importantly, an optimal feature set. We use an exhaustive set of features, which includes the 2hop-feature as introduced by Wong et al. [23] for predicting synthetic sick or lethal interactions. Post-processing of the likelihoods of protein interaction is then required to extract protein complexes. We propose a new protocol for combining these likelihood estimates. The protocol interprets the probabilities of complex association as output by the prediction rule as distances and employs hierarchical clustering to find groups of interacting proteins. In contrast to the computationally expensive search-and-score approach of Sharan et al. [19], this protocol is very fast and can be applied to fully connected graphs. The protocol identifies trusted protein complexes with high confidence. We show that the 2hop-feature is relevant for predicting protein complexes. Furthermore, several interesting hypotheses about new protein complexes have been generated. For example, our approach linked the protein FYV4 to the mitochondrial ribosomal subunit. Interestingly, it is known that this protein is located in the mitochondrion, but its biological role is unknown. Vid22 and YGR071C were also linked, which corresponds to the new TAP data of Krogan et al. [14]
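The post-processing step described here, reading pairwise probabilities as distances and clustering hierarchically, can be sketched as follows. This is an illustrative sketch only: the conversion `distance = 1 - probability`, the average-linkage rule, the `cutoff` value, and the function name `cluster_complexes` are assumptions, not details taken from the paper.

```python
from itertools import combinations


def cluster_complexes(proteins, prob, cutoff=0.5):
    """Average-linkage agglomerative clustering on distance = 1 - probability.

    `prob` maps frozenset({a, b}) to a predicted probability of complex
    association; missing pairs are treated as probability 0 (an assumption).
    """
    def dist(a, b):
        return 1.0 - prob.get(frozenset((a, b)), 0.0)

    clusters = [[p] for p in proteins]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(dist(a, b) for a, b in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cutoff:  # closest clusters are too far apart: stop merging
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical probabilities for four proteins named A to D.
example_prob = {
    frozenset(("A", "B")): 0.9,
    frozenset(("C", "D")): 0.8,
    frozenset(("A", "C")): 0.1,
}
complexes = cluster_complexes(["A", "B", "C", "D"], example_prob)
```

Because every pair has a distance, the procedure works on a fully connected graph, which matches the property the abstract highlights.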
Affiliation(s)
- Rogier J P van Berlo
- Information and Communication Theory Group, Delft University of Technology, Mekelweg 4, Delft, Zuid-Holland, 2628 CD, The Netherlands.
|
21
|
Bianchi FT, Camera P, Ala U, Imperiale D, Migheli A, Boda E, Tempia F, Berto G, Bosio Y, Oddo S, LaFerla FM, Taraglio S, Dotti CG, Di Cunto F. The collagen chaperone HSP47 is a new interactor of APP that affects the levels of extracellular beta-amyloid peptides. PLoS One 2011; 6:e22370. [PMID: 21829458 PMCID: PMC3145648 DOI: 10.1371/journal.pone.0022370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/02/2011] [Accepted: 06/27/2011] [Indexed: 01/08/2023] Open
Abstract
Alzheimer disease (AD) is a neurodegenerative disorder characterized by progressive decline of cognitive function that represents one of the most dramatic medical challenges for the aging population. Aβ peptides, generated by processing of the Amyloid Precursor Protein (APP), are thought to play a central role in the pathogenesis of AD. However, the network of physical and functional interactions that may affect their production and deposition is still poorly understood. The use of a bioinformatic approach based on human/mouse conserved coexpression allowed us to identify a group of genes that display an expression profile strongly correlated with APP. Among the most prominent candidates, we investigated whether the collagen chaperone HSP47 could be functionally correlated with APP. We found that HSP47 accumulates in amyloid deposits of two different mouse models and of some AD patients, can physically interact with APP, and can be relocalized by APP overexpression. Notably, we found that it is possible to reduce the levels of secreted Aβ peptides by reducing the expression of HSP47 or by interfering with its activity via chemical inhibitors. Our data unveil HSP47 as a new functional interactor of APP and implicate it as a potential target for preventing the formation and/or growth of amyloid plaques.
Affiliation(s)
- Federico T. Bianchi
- Department of Genetics, Biology and Biochemistry, Molecular Biotechnology Center, University of Torino, Torino, Italy
- Paola Camera
- Department of Genetics, Biology and Biochemistry, Molecular Biotechnology Center, University of Torino, Torino, Italy
- Ugo Ala
- Department of Genetics, Biology and Biochemistry, Molecular Biotechnology Center, University of Torino, Torino, Italy
- Enrica Boda
- Department of Neurosciences, University of Torino, Torino, Italy
- Filippo Tempia
- Department of Neurosciences, University of Torino, Torino, Italy
- Gaia Berto
- Department of Genetics, Biology and Biochemistry, Molecular Biotechnology Center, University of Torino, Torino, Italy
- Ylenia Bosio
- Department of Genetics, Biology and Biochemistry, Molecular Biotechnology Center, University of Torino, Torino, Italy
- Salvatore Oddo
- Department of Physiology, University of Texas Health Science Center, San Antonio, Texas, United States of America
- Frank M. LaFerla
- Department of Neurobiology and Behavior, Institute for Memory Impairments and Neurological Disorders, University of California Irvine, Irvine, California, United States of America
- Carlos G. Dotti
- VIB Department of Molecular and Developmental Genetics and Katholieke Universiteit Leuven, Department of Human Genetics, Leuven, Belgium
- Ferdinando Di Cunto
- Department of Genetics, Biology and Biochemistry, Molecular Biotechnology Center, University of Torino, Torino, Italy
|
22
|
Rutherford ST, van Kessel JC, Shao Y, Bassler BL. AphA and LuxR/HapR reciprocally control quorum sensing in vibrios. Genes Dev 2011; 25:397-408. [PMID: 21325136 DOI: 10.1101/gad.2015011] [Citation(s) in RCA: 198] [Impact Index Per Article: 15.2] [Indexed: 11/25/2022]
Abstract
Bacteria cycle between periods when they perform individual behaviors and periods when they perform group behaviors. These transitions are controlled by a cell-cell communication process called quorum sensing, in which extracellular signal molecules, called autoinducers (AIs), are released, accumulate, and are synchronously detected by a group of bacteria. AI detection results in community-wide changes in gene expression, enabling bacteria to collectively execute behaviors such as bioluminescence, biofilm formation, and virulence factor production. In this study, we show that the transcription factor AphA is a master regulator of quorum sensing that operates at low cell density (LCD) in Vibrio harveyi and Vibrio cholerae. In contrast, LuxR (V. harveyi)/HapR (V. cholerae) is the master regulator that operates at high cell density (HCD). At LCD, redundant small noncoding RNAs (sRNAs) activate production of AphA, and AphA and the sRNAs repress production of LuxR/HapR. Conversely, at HCD, LuxR/HapR represses aphA. This network architecture ensures maximal AphA production at LCD and maximal LuxR/HapR production at HCD. Microarray analyses reveal that 300 genes are regulated by AphA at LCD in V. harveyi, a subset of which is also controlled by LuxR. We propose that reciprocal gradients of AphA and LuxR/HapR establish the quorum-sensing LCD and HCD gene expression patterns, respectively.
Affiliation(s)
- Steven T Rutherford
- Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, USA
|
23
|
|
24
|
Jung Y, Seo HJ, Park YR, Kim JH, Bien SJ, Kim JH. Standard-based Integration of Heterogeneous Large-scale DNA Microarray Data for Improving Reusability. Genomics Inform 2011. [DOI: 10.5808/gi.2011.9.1.019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023] Open
|
25
|
Chervitz SA, Deutsch EW, Field D, Parkinson H, Quackenbush J, Rocca-Serra P, Sansone SA, Stoeckert CJ, Taylor CF, Taylor R, Ball CA. Data standards for Omics data: the basis of data sharing and reuse. Methods Mol Biol 2011; 719:31-69. [PMID: 21370078 PMCID: PMC4152841 DOI: 10.1007/978-1-61779-027-0_2] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 12/04/2022]
Abstract
To facilitate sharing of Omics data, many groups of scientists have been working to establish the relevant data standards. The main components of data sharing standards are experiment description standards, data exchange standards, terminology standards, and experiment execution standards. Here we provide a survey of existing and emerging standards that are intended to assist the free and open exchange of large-format data.
|
26
|
Brysbaert G, Pellay FX, Noth S, Benecke A. Quality assessment of transcriptome data using intrinsic statistical properties. Genomics Proteomics & Bioinformatics 2010; 8:57-71. [PMID: 20451162 PMCID: PMC5054119 DOI: 10.1016/s1672-0229(10)60006-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 01/06/2023]
Abstract
In view of potential application to biomedical diagnosis, tight transcriptome data quality control is compulsory. Usually, quality control is achieved using labeling and hybridization controls added at different stages throughout the processing of the biologic RNA samples. These control measures, however, only reflect the performance of the individual technical manipulations during the entire process and have no bearing as to the continued integrity of the RNA sample itself. Here we demonstrate that intrinsic statistical properties of the resulting transcriptome data signal and signal-variance distributions and their invariance can be identified independently of the animal species studied and the labeling protocol used. From these invariant properties we have developed a data model, the parameters of which can be estimated from individual experiments and used to compute relative quality measures based on similarity with large reference datasets. These quality measures add supplementary, non-redundant information to standard quality control estimates based on spike-in and hybridization controls, and are exploitable in data analysis. A software application for analyzing datasets as well as a reference dataset for AB1700 arrays are provided. They should allow AB1700 users to easily integrate this method into their analysis pipeline, and might instigate similar developments for other transcriptome platforms.
Affiliation(s)
- Guillaume Brysbaert
- Institut des Hautes Etudes Scientifiques & Institut de Recherche Interdisciplinaire (CNRS USR3078, Université de Lille1), 91440 Bures-sur-Yvette, France
|
27
|
Penkett CJ, Bähler J. Navigating public microarray databases. Comp Funct Genomics 2010; 5:471-9. [PMID: 18629145 PMCID: PMC2447434 DOI: 10.1002/cfg.427] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/25/2004] [Revised: 08/12/2004] [Accepted: 08/12/2004] [Indexed: 11/17/2022] Open
Abstract
With the ever-escalating amount of data being produced by genome-wide microarray studies, it is of increasing importance that these data are captured in public databases so that researchers can use this information to complement and enhance their own studies. Many groups have set up databases of expression data, ranging from large repositories, which are designed to comprehensively capture all published data, through to more specialized databases. The public repositories, such as ArrayExpress at the European Bioinformatics Institute, contain complete datasets in raw format in addition to processed data, whilst the specialist databases tend to provide downstream analysis of normalized data from more focused studies and data sources. Here we provide a guide to the use of these public microarray resources.
Affiliation(s)
- Christopher J Penkett
- The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
|
28
|
ROCK: a breast cancer functional genomics resource. Breast Cancer Res Treat 2010; 124:567-72. [PMID: 20563840 DOI: 10.1007/s10549-010-0945-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 04/30/2010] [Accepted: 05/08/2010] [Indexed: 12/20/2022]
Abstract
The clinical and pathological heterogeneity of breast cancer has instigated efforts to stratify breast cancer sub-types according to molecular profiles. These profiling efforts are now being augmented by large-scale functional screening of breast tumour cell lines, using approaches such as RNA interference. We have developed ROCK ( rock.icr.ac.uk ) to provide a unique, publicly accessible resource for the integration of breast cancer functional and molecular profiling datasets. ROCK provides a simple online interface for the navigation and cross-correlation of gene expression, aCGH and RNAi screen data. It enables the interrogation of gene lists in the context of statistically analysed functional genomic datasets, interaction networks, pathways, GO terms, mutations and drug targets. The interface also provides interactive visualisations of datasets and interaction networks. ROCK collates data from a wealth of breast cancer molecular profiling and functional screening studies into a single portal, where analysed and annotated results can be accessed at the level of a gene, sample or study. We believe that portals such as ROCK will not only afford researchers rapid access to profiling data, but also aid the integration of different data types, thus enhancing the discovery of novel targets and biomarkers for breast cancer.
|
29
|
Han W, Nicolau M, Noh DY, Jeffrey SS. Characterization of molecular subtypes of Korean breast cancer: an ethnically and clinically distinct population. Int J Oncol 2010; 37:51-9. [PMID: 20514396 DOI: 10.3892/ijo_00000652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
We aimed to investigate the molecular characteristics of Korean breast cancer. A cDNA microarray study (>42k clones) was performed on 69 breast cancers and three normal breast tissues. The subjects had a high percentage of HER-2 expression, hormone receptor negativity, and young onset. Molecular subtypes according to gene expression profiles were determined and their correlations to the clinicopathologic characteristics and patient outcome were analyzed. The tumors were subdivided into luminal-, normal breast-like, ERBB2+, and basal-like subtypes according to the correlations to the previously described intrinsic genes and five centroids. Only a few tumors were highly correlated to the luminal B and normal-like centroids. The high grade tumors with high p53 and Ki-67 were found more commonly in non-luminal tumors. Distant recurrence-free survival was worse in ERBB2+ and basal-like subgroups than in luminal tumors. In an unsupervised clustering with 864 genes, many interesting gene clusters were observed, some of which had not been previously described. Although the Korean breast cancers showed molecular phenotypes generally similar to those in Western studies, some distinct gene expression patterns and their association with clinical outcomes were observed.
Affiliation(s)
- Wonshik Han
- Department of Surgery, and Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
|
30
|
Gupta G, Liu A, Ghosh J. Automated hierarchical density shaving: a robust automated clustering and visualization framework for large biological data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010; 7:223-237. [PMID: 20431143 DOI: 10.1109/tcbb.2008.32] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criterion. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.
|
31
|
Tagmount A, Wang M, Lindquist E, Tanaka Y, Teranishi KS, Sunagawa S, Wong M, Stillman JH. The porcelain crab transcriptome and PCAD, the porcelain crab microarray and sequence database. PLoS One 2010; 5:e9327. [PMID: 20174471 PMCID: PMC2824831 DOI: 10.1371/journal.pone.0009327] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 06/15/2009] [Accepted: 01/27/2010] [Indexed: 01/11/2023] Open
Abstract
Background With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings A set of ∼30K unique sequences (UniSeqs) representing ∼19K clusters were generated from ∼98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66% of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. 
Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.
Affiliation(s)
- Abderrahmane Tagmount
- Romberg Tiburon Center and Department of Biology, San Francisco State University, Tiburon, California, United States of America
- Mei Wang
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- Erika Lindquist
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
- Yoshihiro Tanaka
- Romberg Tiburon Center and Department of Biology, San Francisco State University, Tiburon, California, United States of America
- Kristen S. Teranishi
- Romberg Tiburon Center and Department of Biology, San Francisco State University, Tiburon, California, United States of America
- Shinichi Sunagawa
- School of Natural Sciences, University of California Merced, Merced, California, United States of America
- Mike Wong
- Center for Computing in the Life Sciences, San Francisco State University, San Francisco, California, United States of America
- Jonathon H. Stillman
- Romberg Tiburon Center and Department of Biology, San Francisco State University, Tiburon, California, United States of America
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
|
32
|
Wilflingseder J, Kainz A, Mühlberger I, Perco P, Langer R, Kristo I, Mayer B, Oberbauer R. Impaired metabolism in donor kidney grafts after steroid pretreatment. Transpl Int 2010; 23:796-804. [PMID: 20149158 DOI: 10.1111/j.1432-2277.2010.01053.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
Summary We recently showed in a randomized controlled trial that steroid pretreatment of the deceased organ donor suppressed inflammation in the transplant organ but did not reduce the rate or duration of delayed graft function (DGF). This study sought to elucidate the factors that caused DGF in the steroid-treated subjects. Genome-wide gene expression profiles from 20 steroid-pretreated donor organs were analyzed at the level of regulatory protein-protein interaction networks. Significance analysis of microarrays (SAM) yielded 63 significantly down-regulated sequences associated with DGF that could be functionally categorized according to Protein ANalysis THrough Evolutionary Relationships ontologies into two main biologic processes: transport (P < 0.001) and metabolism (P < 0.001). The identified genes suggest hypoxia as the cause of DGF, which cannot be counterbalanced by steroid treatment. Our data showed that molecular pathways affected by ischemia such as transport and metabolism are associated with DGF. Potential interventional targeted therapy based on these findings includes peroxisome proliferator-activated receptor agonists or caspase inhibitors.
|
33
|
Abstract
DNA microarray profiles are plagued by the issue of large number of variables but small number of samples and are often notorious for their low signal-to-noise ratio for clinical applications. Therefore, a great need for meta-analysis techniques is emerging to yield more valid and informative results than each experiment separately. By exploring the power of several studies in one single analysis, meta-analysis of many cancer gene-profiling data increases the statistical power to detect differentially expressed genes and allows assessment of heterogeneity. OrderedList is such a method that was specially proposed for cancer gene expression data meta-analysis. It is superior to other methods in that it does not rely on strong effects of differential gene expression in a single study but on consistent regulated genes across multiple studies. This chapter introduces the R implementation of this methodology on real data sets to identify biomarkers for adenocarcinoma lung cancer.
Affiliation(s)
- Xinan Yang
- Division of Bioinformatics, State Key Laboratory of Bioelectronics (Chien-Shiung Wu Laboratory), Southeast University, Nanjing, China.
|
34
|
Celton M, Malpertuy A, Lelandais G, de Brevern AG. Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments. BMC Genomics 2010; 11:15. [PMID: 20056002 PMCID: PMC2827407 DOI: 10.1186/1471-2164-11-15] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 09/02/2009] [Accepted: 01/07/2010] [Indexed: 11/17/2022] Open
Abstract
Background Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour approach for restoring missing gene expression values and its positive impact on gene clustering by a hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) for microarray data. In this study, we evaluated twelve different usable methods and their influence on the quality of gene clustering. We used several datasets, covering both kinetic and non-kinetic experiments from yeast and human. Results We underline the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially one based on expectation maximization (EM_array). These improvements were also observed in the imputation of extreme values, which are the most difficult to predict. We showed that the imputed MVs still have important effects on the stability of the gene clusters. The improvement in clustering obtained by hierarchical clustering remains limited and is not sufficient to completely restore the correct gene associations. However, a common tendency can be found between the quality of the imputation method and the gene cluster stability. Even though comparing clustering algorithms is a complex task, we observed that the k-means approach conserves gene associations more efficiently. Conclusions More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have been made since our last study. The EM_array approach is an efficient method for restoring missing gene expression values, with a lower estimation error level. Nonetheless, the presence of MVs, even at a low rate, is a major factor of gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and therefore for dedicated benchmarks. A noticeable point is the specific influence of some biological datasets.
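For readers unfamiliar with the k-Nearest Neighbour baseline mentioned in this abstract, a minimal sketch follows. It is not the implementation evaluated in the study: the function name `knn_impute`, the root-mean-square distance over shared columns, and the unweighted neighbour mean are all assumptions chosen to keep the example short.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k most similar genes (rows)."""
    def distance(row_a, row_b):
        # Compare only the conditions observed in both genes.
        shared = [(a, b) for a, b in zip(row_a, row_b)
                  if a is not None and b is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))

    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours must have an observed value in column j.
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d < math.inf]
            if neighbours:
                imputed[i][j] = sum(neighbours) / len(neighbours)
    return imputed


# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
]
filled = knn_impute(expression, k=2)
```

The missing value in the first gene is estimated from the two genes with the closest profiles, and the outlying fourth gene is ignored. Entries with no usable neighbour are left as `None`.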
Affiliation(s)
- Magalie Celton
- INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire, DSIMB, Université Paris Diderot-Paris 7, 2 place Jussieu, Paris, France
|
35
|
Huang CW, Lin CY, Huang HY, Liu HW, Chen YJ, Shih DF, Chen HY, Juan CC, Ker CG, Huang CYF, Li CF, Shiue YL. CKS1B overexpression implicates clinical aggressiveness of hepatocellular carcinomas but not p27(Kip1) protein turnover: an independent prognosticator with potential p27(Kip1)-independent oncogenic attributes? Ann Surg Oncol 2009; 17:907-22. [PMID: 19866239 DOI: 10.1245/s10434-009-0779-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Received: 03/31/2009] [Indexed: 12/25/2022]
Abstract
BACKGROUND Through data mining the Stanford Microarray Database, the CKS1B transcript was found to be frequently upregulated in hepatocellular carcinomas (HCCs) with low alpha-fetal protein (AFP) expression. Together with SKP2, CKS1B is known to implicate p27(Kip1) protein turnover promoting cell-cycle progression. METHODS CKS1B, p27(Kip1), and SKP2 were immunostained in 75 HCCs and correlated with clinicopathological features, local recurrence-free survival (LRFS), and overall survival (OS). Silencing of CKS1B and SKP2 with interference short-hairpin RNA (shRNA) was performed in SK-Hep1 and Hep-3B cell lines. RESULTS Immunohistochemically, increased CKS1B and SKP2, and attenuated p27(Kip1) were all associated with tumor multiplicity (P < 0.05) and increasing American Joint Committee on Cancer (AJCC) stage (P < 0.05). Overexpression of CKS1B significantly correlated with advanced Okuda stages (P = 0.048) and SKP2 overexpression (P = 0.047). Neither CKS1B nor SKP2 was inversely related to p27(Kip1), which was reinforced by no alteration in p27(Kip1) abundance in HCC-derived cells with CKS1B or SKP2 silencing. Both CKS1B overexpression (P = 0.0011 and P = 0.0017) and p27(Kip1) attenuation (P = 0.0079 and P = 0.0085) were predictive of OS and LRFS, respectively, while SKP2 overexpression was associated with worse OS alone (P = 0.0043). Combined assessment of CKS1B and p27(Kip1) was able to robustly distinguish three prognostically different groups (P < 0.0001). In multivariate comparison, CKS1B overexpression represented the strongest independent adverse prognosticator [OS, P = 0.0235, hazard ratio (HR): 4.193; LRFS, P = 0.0204, HR: 4.262], followed by p27(Kip1) attenuation (OS, P = 0.0320, HR: 2.553; LRFS, P = 0.0262, HR: 2.533). CONCLUSIONS CKS1B protein overexpression in HCCs is implicated in clinical aggressiveness but not in p27(Kip1) turnover, implying presence of p27(Kip1)-independent oncogenic attributes. 
The combined assessment of CKS1B and p27(Kip1) immunoexpressions effectively risk-stratifies HCCs with different prognoses, which may aid in the management of this deadly malignancy.
Affiliation(s)
- Ching-Wen Huang
- Department of Surgery, Yuan's General Hospital, Kaohsiung, Taiwan
|
36
|
Hulsman M, Reinders MJT, de Ridder D. Evolutionary optimization of kernel weights improves protein complex comembership prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2009; 6:427-437. [PMID: 19644171 DOI: 10.1109/tcbb.2008.137] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/28/2023]
Abstract
In recent years, more and more high-throughput data sources useful for protein complex prediction have become available (e.g., gene sequence, mRNA expression, and interactions). The integration of these different data sources can be challenging. Recently, it has been recognized that kernel-based classifiers are well suited for this task. However, the different kernels (data sources) are often combined using equal weights. Although several methods have been developed to optimize kernel weights, no large-scale example of an improvement in classifier performance has been shown yet. In this work, we employ an evolutionary algorithm to determine weights for a larger set of kernels by optimizing a criterion based on the area under the ROC curve. We show that setting the right kernel weights can indeed improve performance. We compare this to the existing kernel weight optimization methods (i.e., (regularized) optimization of the SVM criterion or aligning the kernel with an ideal kernel) and find that these do not result in a significant performance improvement and can even cause a decrease in performance. Results also show that an expert approach of assigning high weights to features with high individual performance is not necessarily the best strategy.
Affiliation(s)
- Marc Hulsman
- Information and Communication Theory Group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands.
|
37
|
Carrera J, Rodrigo G, Jaramillo A. Towards the automated engineering of a synthetic genome. Molecular BioSystems 2009; 5:733-43. [PMID: 19562112 DOI: 10.1039/b904400k] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/09/2023]
Abstract
The development of the technology to synthesize new genomes and to introduce them into hosts with an inactivated wild-type chromosome opens the door to new horizons in synthetic biology. Here it is of utmost importance to harness computational design to predict and optimize a synthetic genome before attempting its synthesis. The methodology to computationally design a genome is based on an optimization that computationally mimics genome evolution. The biggest bottleneck lies in the use of an appropriate fitness function. This fitness function, usually cell growth, relies on the ability to quantitatively model the biochemical networks of the cell at the genome scale using parameters inferred from high-throughput data. Computational methods integrating such models in a common multilayer design platform can be used to automatically engineer synthetic genomes under physiological specifications. We describe the current state of the art in automated methods for engineering or re-engineering synthetic genomes. We restrict ourselves to global models of metabolism, transcription and DNA structure. Although we are still far from de novo computational genome design, it is important to collect all relevant work towards this goal. Finally, we discuss future perspectives about the practicability of an automated methodology for such computational design of synthetic genomes.
Affiliation(s)
- Javier Carrera
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-UPV, 46022 València, Spain
|
38
|
Li G, Che D, Xu Y. A universal operon predictor for prokaryotic genomes. J Bioinform Comput Biol 2009; 7:19-38. [PMID: 19226658 DOI: 10.1142/s0219720009003984] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/01/2007] [Revised: 02/21/2008] [Accepted: 04/22/2008] [Indexed: 11/18/2022]
Abstract
Identification of operons at the genome scale of prokaryotic organisms represents a key step in deciphering of their transcriptional regulation machinery, biological pathways, and networks. While numerous computational methods have been shown to be effective in predicting operons for well-studied organisms such as Escherichia coli K12 and Bacillus subtilis 168, these methods generally do not generalize well to genomes other than the ones used to train the methods, or closely related genomes because they rely on organism-specific information. Several methods have been explored to address this problem through utilizing only genomic structural information conserved across multiple organisms, but they all suffer from the issue of low prediction sensitivity. In this paper, we report a novel operon prediction method that is applicable to any prokaryotic genome with high prediction accuracy. The key idea of the method is to predict operons through identification of conserved gene clusters across multiple genomes and through deriving a key parameter relevant to the distribution of intergenic distances in genomes. We have implemented this method using a graph-theoretic approach, to calculate a set of maximum gene clusters in the target genome that are conserved across multiple reference genomes. Our computational results have shown that this method has higher prediction sensitivity as well as specificity than most of the published methods. We have carried out a preliminary study on operons unique to archaea and bacteria, respectively, and derived a number of interesting new insights about operons between these two kingdoms. The software and predicted operons of 365 prokaryotic genomes are available at http://csbl.bmb.uga.edu/~dongsheng/UNIPOP.
Affiliation(s)
- Guojun Li
- CSBL, Department of Biochemistry and Molecular Biology, Department of Computer Science, University of Georgia, Athens, GA 30602, USA.
|
39
|
Bhardwaj N, Lu H. Co-expression among constituents of a motif in the protein-protein interaction network. J Bioinform Comput Biol 2009; 7:1-17. [PMID: 19226657 DOI: 10.1142/s0219720009003959] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 05/11/2008] [Revised: 09/19/2008] [Accepted: 09/22/2008] [Indexed: 11/18/2022]
Abstract
Almost all cellular functions are the results of well-coordinated interactions between various proteins. A more connected hub or motif in the interaction network is expected to be more important, and any perturbation in this motif would be more damaging to the smooth performance of the related functions. Thus, some coherent robustness of these hubs has to be derived. Here, we provide the global evidence that interaction hubs obtain their robustness against uneven protein concentrations through co-expression of the constituents, and that the degree of co-expression correlates strongly with the complexity of the embedded motif. We calculated the gene expression correlations between the proteins embedded in 3-, 4-, 5-, and 6-node interaction motifs of increasing complexities, and compared them to those between proteins from random motifs of similar complexities. We find that as the connectedness of these motifs increases, there is higher co-expression between the constituent proteins. For example, when the expression correlation is 0.7, the kernel density of the correlation increases from 0.152 for 4-node motifs with three edges to 0.403 for 4-node cliques. This implies that the robustness of the interaction system emerges from a proportionate synchronicity among the constituents of the motif via co-expression. We further show that such biological coherence via co-expression of component proteins can be reinforced by integrating conservation data in the analysis. For example, with addition of evolutionary information from other genomes, the ratio of kernel density for interaction and random data in the case of 5- and 6-node cliques in yeast increases from 37.8 to 123 and 98.4 to 1300, respectively, given that the expression correlation is 0.8. Our results show that genes whose products are involved in motifs have transcription and translation properties that minimize the noise in final protein concentrations, compared to random sets of genes.
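The co-expression measure described in this abstract can be illustrated with a minimal sketch, assuming that the co-expression of a motif is summarized as the mean pairwise Pearson correlation of its constituents' expression profiles. The function names and the toy profiles below are hypothetical and are not taken from the paper, which reports kernel densities of such correlations rather than a single mean.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_motif_coexpression(motif, expression):
    """Average Pearson correlation over all protein pairs in one motif."""
    pairs = list(combinations(motif, 2))
    total = sum(pearson(expression[a], expression[b]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical toy profiles (four conditions per gene); not data from the paper.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
}

print(mean_motif_coexpression(["A", "B"], expression))
print(mean_motif_coexpression(["A", "B", "C"], expression))
```

Comparing this value for real motifs against randomly drawn protein sets of the same size would mirror, in simplified form, the interaction-versus-random comparison the abstract describes.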
Affiliation(s)
- Nitin Bhardwaj
- Bioinformatics Program, University of Illinois at Chicago, 820 S. Woods Street, Room 103, Chicago, IL 60607, USA.
|
40
|
Holbein S, Wengi A, Decourty L, Freimoser FM, Jacquier A, Dichtl B. Cordycepin interferes with 3' end formation in yeast independently of its potential to terminate RNA chain elongation. RNA 2009; 15:837-49. [PMID: 19324962 PMCID: PMC2673080 DOI: 10.1261/rna.1458909] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 05/09/2023]
Abstract
Cordycepin (3' deoxyadenosine) is a biologically active compound that, when incorporated during RNA synthesis in vitro, provokes chain termination due to the absence of a 3' hydroxyl moiety. We were interested in the effects mediated by this drug in vivo and analyzed its impact on RNA metabolism of yeast. Our results support the view that cordycepin-triphosphate (CoTP) is the toxic component that is limiting cell growth through inhibition of RNA synthesis. Unexpectedly, cordycepin treatment modulated 3' end heterogeneity of ACT1 and ASC1 mRNAs and rapidly induced extended transcripts derived from CYH2 and NEL025c loci. Moreover, cordycepin ameliorated the growth defects of poly(A) polymerase mutants and the pap1-1 mutation neutralized the effects of the drug on gene expression. Our observations are consistent with an epistatic relationship between poly(A) polymerase function and cordycepin action and suggest that a major mode of cordycepin activity reduces 3' end formation efficiency independently of its potential to terminate RNA chain elongation. Finally, chemical-genetic profiling revealed genome-wide pathways linked to cordycepin activity and identified novel genes involved in poly(A) homeostasis.
Affiliation(s)
- Sandra Holbein
- Institute of Molecular Biology, University of Zürich, CH-8057 Zürich, Switzerland
|
41
|
Heath LS, Sioson AA. Semantics of multimodal network models. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2009; 6:271-280. [PMID: 19407351 DOI: 10.1109/tcbb.2007.70242] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
A multimodal network (MMN) is a novel graph-theoretic formalism designed to capture the structure of biological networks and to represent relationships derived from multiple biological databases. MMNs generalize the standard notions of graphs and hypergraphs, which are the bases of current diagrammatic representations of biological phenomena, and incorporate the concept of mode. Each vertex of an MMN is a biological entity, a biot, while each modal hyperedge is a typed relationship, where the type is given by the mode of the hyperedge. The semantics of each modal hyperedge e is given through denotational semantics, where a valuation function fe defines the relationship among the values of the vertices incident on e. The meaning of an MMN is denoted in terms of the semantics of a hyperedge sequence. A companion paper defines MMNs and concentrates on the structural aspects of MMNs. This paper develops MMN denotational semantics when used as a representation of the semantics of biological networks and discusses applications of MMNs in managing complex biological data.
Affiliation(s)
- Lenwood S Heath
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061-0106, USA.
|
42
|
Wang H, Kakaradov B, Collins SR, Karotki L, Fiedler D, Shales M, Shokat KM, Walther TC, Krogan NJ, Koller D. A complex-based reconstruction of the Saccharomyces cerevisiae interactome. Mol Cell Proteomics 2009; 8:1361-81. [PMID: 19176519 PMCID: PMC2690481 DOI: 10.1074/mcp.m800490-mcp200] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 11/30/2022] Open
Abstract
Most cellular processes are performed by proteomic units that interact with each other. These units are often stoichiometrically stable complexes comprised of several proteins. To obtain a faithful view of the protein interactome we must view it in terms of these basic units (complexes and proteins) and the interactions between them. This study makes two contributions toward this goal. First, it provides a new algorithm for reconstruction of stable complexes from a variety of heterogeneous biological assays; our approach combines state-of-the-art machine learning methods with a novel hierarchical clustering algorithm that allows clusters to overlap. We demonstrate that our approach constructs over 40% more known complexes than other recent methods and that the complexes it produces are more biologically coherent even compared with the reference set. We provide experimental support for some of our novel predictions, identifying both a new complex involved in nutrient starvation and a new component of the eisosome complex. Second, we provide a high accuracy algorithm for the novel problem of predicting transient interactions involving complexes. We show that our complex level network, which we call ComplexNet, provides novel insights regarding the protein-protein interaction network. In particular, we reinterpret the finding that “hubs” in the network are enriched for being essential, showing instead that essential proteins tend to be clustered together in essential complexes and that these essential complexes tend to be large.
Affiliation(s)
- Haidong Wang
- Computer Science Department, Stanford University, Stanford, California 94305, USA
|
43
|
Chervitz SA, Parkinson H, Fostel JM, Causton HC, Sanson SA, Deutsch EW, Field D, Taylor CF, Rocca-Serra P, White J, Stoeckert CJ. Standards for Functional Genomics. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
|
44
|
Wilflingseder J, Perco P, Kainz A, Korbély R, Mayer B, Oberbauer R. Biocompatibility of haemodialysis membranes determined by gene expression of human leucocytes: a crossover study. Eur J Clin Invest 2008; 38:918-24. [PMID: 19021716 DOI: 10.1111/j.1365-2362.2008.02050.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
BACKGROUND Biocompatibility of haemodialysis membranes is the most important quality criterion to enable long-term dialysis without major harmful effects. This study sought to evaluate the differences of genomic signatures derived from peripheral blood mononuclear cells (PBMC) in patients undergoing haemodialysis treatment using two different dialyser membranes: one semi-synthetic and one full-synthetic membrane. DESIGN Microarray experiments were conducted in PBMCs of four stable haemodialysis patients before and after dialysis comparing semi-synthetic (Hemophan GFS Plus 16) and full-synthetic (Hemoflow FX80) dialysis membranes, respectively. Genes differentially expressed when comparing the two different membranes used were analysed in order to elucidate the underlying molecular mechanisms affecting PBMCs in the course of dialysis treatment. RESULTS One hundred and seventy-two genes were identified as up-regulated after treatment with semi-synthetic membranes when compared to full-synthetic membranes. These genes could be assigned to processes including immunity and defence, signal transduction, and apoptosis. Dialysis with a full-synthetic membrane, on the other hand, led to an activation of 72 genes that were mainly involved in cell cycle and cell cycle control. CONCLUSION The over-representation of genes belonging to immunity/defence, signal transduction, and apoptosis as found with semi-synthetic membranes suggests that full-synthetic membranes are more biocompatible than semi-synthetic membranes.
|
45
|
Giannakeas N, Fotiadis DI. An automated method for gridding and clustering-based segmentation of cDNA microarray images. Comput Med Imaging Graph 2008; 33:40-9. [PMID: 19046850 DOI: 10.1016/j.compmedimag.2008.10.003] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/24/2008] [Revised: 09/18/2008] [Accepted: 10/06/2008] [Indexed: 10/21/2022]
Abstract
Microarrays are widely used to quantify gene expression levels. Microarray image analysis is one of the tools that are necessary when dealing with vast amounts of biological data. In this work we propose a new method for the automated analysis of microarray images. The proposed method consists of two stages: gridding and segmentation. Initially, the microarray images are preprocessed using template matching, and block and spot finding takes place. Then, the non-expressed spots are detected and a grid is fitted to the image using a Voronoi diagram. In the segmentation stage, K-means and Fuzzy C-means (FCM) clustering are employed. The proposed method was evaluated using images from the Stanford Microarray Database (SMD). The results for the segmentation stage show the efficiency of our Fuzzy C-means-based approach compared with the two previously developed K-means-based methods. The proposed method can handle images with artefacts and it is fully automated.
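As a rough illustration of the clustering-based segmentation idea in this abstract, the following is a minimal sketch of plain K-means on scalar pixel intensities, assuming two clusters (background and spot foreground). It is not the authors' pipeline: gridding, template matching, and Fuzzy C-means are omitted, and the function name, initialization rule, and pixel values are hypothetical.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd's k-means on scalar intensities; returns (centers, labels).

    Assumes k >= 2 and at least two distinct values.
    """
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly between the darkest and brightest pixel.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each pixel goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


# Hypothetical intensities for one grid cell: dark background, bright spot.
pixels = [10, 12, 11, 200, 210, 205, 9, 198]
centers, labels = kmeans_1d(pixels)
print(centers)
print(labels)
```

Fuzzy C-means differs in that each pixel receives a graded membership in every cluster instead of a single hard label, which is the variant the abstract reports as performing better.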
Affiliation(s)
- Nikolaos Giannakeas
- Laboratory of Biological Chemistry, Medical School, University of Ioannina, Ioannina, Greece
|
46
|
Abstract
The revolution in high throughput biology experiments producing genome-scale data has heightened the challenge of integrating functional genomics data. Data integration is essential for making reliable inferences from functional genomics data, as the datasets are neither error-free nor comprehensive. However, there are two major hurdles in data integration: heterogeneity and correlation of the data to be integrated. These problems can be circumvented by quantitative testing of all data in the same unified scoring scheme, and by using integration methods appropriate for handling correlated data. This chapter describes such a functional genomics data integration method designed to estimate the "functional coupling" between genes, applied to the baker's yeast Saccharomyces cerevisiae. The integrated dataset outperforms individual functional genomics datasets in both accuracy and coverage, leading to more reliable and comprehensive predictions of gene function. The approach is easily applied to multicellular organisms, including human.
|
47
|
Wilflingseder J, Kainz A, Perco P, Korbely R, Mayer B, Oberbauer R. Molecular predictors for anaemia after kidney transplantation. Nephrol Dial Transplant 2008; 24:1015-23. [DOI: 10.1093/ndt/gfn683] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
|
48
|
Tomlinson C, Thimma M, Alexandrakis S, Castillo T, Dennis JL, Brooks A, Bradley T, Turnbull C, Blaveri E, Barton G, Chiba N, Maratou K, Soutter P, Aitman T, Game L. MiMiR--an integrated platform for microarray data sharing, mining and analysis. BMC Bioinformatics 2008; 9:379. [PMID: 18801157 PMCID: PMC2572073 DOI: 10.1186/1471-2105-9-379] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 05/22/2008] [Accepted: 09/18/2008] [Indexed: 11/10/2022] Open
Abstract
Background Despite considerable efforts within the microarray community for standardising data format, content and description, microarray technologies present major challenges in managing, sharing, analysing and re-using the large amount of data generated locally or internationally. Additionally, it is recognised that inconsistent and low quality experimental annotation in public data repositories significantly compromises the re-use of microarray data for meta-analysis. MiMiR, the Microarray data Mining Resource was designed to tackle some of these limitations and challenges. Here we present new software components and enhancements to the original infrastructure that increase accessibility, utility and opportunities for large scale mining of experimental and clinical data. Results A user friendly Online Annotation Tool allows researchers to submit detailed experimental information via the web at the time of data generation rather than at the time of publication. This ensures the easy access and high accuracy of meta-data collected. Experiments are programmatically built in the MiMiR database from the submitted information and details are systematically curated and further annotated by a team of trained annotators using a new Curation and Annotation Tool. Clinical information can be annotated and coded with a clinical Data Mapping Tool within an appropriate ethical framework. Users can visualise experimental annotation, assess data quality, download and share data via a web-based experiment browser called MiMiR Online. All requests to access data in MiMiR are routed through a sophisticated middleware security layer thereby allowing secure data access and sharing amongst MiMiR registered users prior to publication. Data in MiMiR can be mined and analysed using the integrated EMAAS open source analysis web portal or via export of data and meta-data into Rosetta Resolver data analysis package. 
Conclusion The new MiMiR suite of software enables systematic and effective capture of extensive experimental and clinical information with the highest MIAME score, and secure data sharing prior to publication. MiMiR currently contains more than 150 experiments corresponding to over 3000 hybridisations and supports the Microarray Centre's large microarray user community and two international consortia. The MiMiR flexible and scalable hardware and software architecture enables secure warehousing of thousands of datasets, including clinical studies, from microarray and potentially other -omics technologies.
Affiliation(s)
- Chris Tomlinson
- Microarray Centre, MRC Clinical Sciences Centre and Imperial College, Hammersmith Hospital, London, UK.
|
49
|
Kim WK, Krumpelman C, Marcotte EM. Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy. Genome Biol 2008; 9 Suppl 1:S5. [PMID: 18613949 PMCID: PMC2447539 DOI: 10.1186/gb-2008-9-s1-s5] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.9] [Indexed: 11/10/2022] Open
Abstract
The complete set of mouse genes, as with the set of human genes, is still largely uncharacterized, with many pieces of experimental evidence accumulating regarding the activities and expression of the genes, but the majority of genes as yet still of unknown function. Within the context of the MouseFunc competition, we developed and applied two distinct large-scale data mining approaches to infer the functions (Gene Ontology annotations) of mouse genes from experimental observations from available functional genomics, proteomics, comparative genomics, and phenotypic data. The two strategies — the first using classifiers to map features to annotations, the second propagating annotations from characterized genes to uncharacterized genes along edges in a network constructed from the features — offer alternative and possibly complementary approaches to providing functional annotations. Here, we re-implement and evaluate these approaches and their combination for their ability to predict the proper functional annotations of genes in the MouseFunc data set. We show that, when controlling for the same set of input features, the network approach generally outperformed a naïve Bayesian classifier approach, while their combination offers some improvement over either independently. We make our observations of predictive performance on the MouseFunc competition hold-out set, as well as on a ten-fold cross-validation of the MouseFunc data. Across all 1,339 annotated genes in the MouseFunc test set, the median predictive power was quite strong (median area under a receiver operating characteristic plot of 0.865 and average precision of 0.195), indicating that a mining-based strategy with existing data is a promising path towards discovering mammalian gene functions. 
As one product of this work, a high-confidence subset of the functional mouse gene network was produced — spanning >70% of mouse genes with >1.6 million associations — that is predictive of mouse (and therefore often human) gene function and functional associations. The network should be generally useful for mammalian gene functional analyses, such as for predicting interactions, inferring functional connections between genes and pathways, and prioritizing candidate genes. The network and all predictions are available on the worldwide web.
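The network strategy in this abstract, propagating annotations from characterized to uncharacterized genes along network edges, can be sketched in a few lines. This is a minimal guilt-by-association illustration, assuming that a gene's score for one annotation is the summed weight of its edges to genes already carrying that annotation. The scoring rule, function name, gene labels, and edge weights are all hypothetical and are not taken from the paper.

```python
def propagate_annotation(network, annotated):
    """Score each unannotated gene by its summed edge weight to annotated neighbors."""
    scores = {}
    for gene, neighbors in network.items():
        if gene in annotated:
            continue
        scores[gene] = sum(w for nb, w in neighbors.items() if nb in annotated)
    return scores


# Hypothetical weighted functional network stored as an adjacency dict.
network = {
    "g1": {"g2": 0.9, "g3": 0.2},
    "g2": {"g1": 0.9, "g4": 0.5},
    "g3": {"g1": 0.2},
    "g4": {"g2": 0.5},
}
annotated = {"g1"}  # genes already known to carry the annotation of interest

scores = propagate_annotation(network, annotated)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, scores[gene])
```

Ranking genes by such scores and comparing the ranking against held-out annotations is one way a receiver operating characteristic evaluation, like the one the abstract reports, could be set up.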
Affiliation(s)
- Wan Kyu Kim
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas at Austin, Speedway, Austin, Texas 78712, USA
|
50
|
Tan MP, Smith EN, Broach JR, Floudas CA. Microarray data mining: a novel optimization-based approach to uncover biologically coherent structures. BMC Bioinformatics 2008; 9:268. [PMID: 18538024 PMCID: PMC2442101 DOI: 10.1186/1471-2105-9-268] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 11/15/2007] [Accepted: 06/06/2008] [Indexed: 11/16/2022] Open
Abstract
Background DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting information on this genomic scale, and a first step towards the rapid and comprehensive interpretation of this data is gene clustering with respect to the expression patterns. Classifying genes into clusters can lead to interesting biological insights. In this study, we describe an iterative clustering approach to uncover biologically coherent structures from DNA microarray data based on a novel clustering algorithm EP_GOS_Clust. Results We apply our proposed iterative algorithm to three sets of experimental DNA microarray data from experiments with the yeast Saccharomyces cerevisiae and show that the proposed iterative approach improves biological coherence. Comparison with other clustering techniques suggests that our iterative algorithm provides superior performance with regard to biological coherence. An important consequence of our approach is that an increasing proportion of genes find membership in clusters of high biological coherence and that the average cluster specificity improves. Conclusion The results from these clustering experiments provide a robust basis for extracting motifs and trans-acting factors that determine particular patterns of expression. In addition, the biological coherence of the clusters is iteratively assessed independently of the clustering. Thus, this method will not be severely impacted by functional annotations that are missing, inaccurate, or sparse.
Affiliation(s)
- Meng P Tan
- Department of Chemical Engineering, Princeton University, NJ, USA.
|