1
Askari A, Kota S, Ferrell H, Swamy S, Goodman K, Okoro C, Spruell Crenshaw I, Hernandez D, Oliphant T, Badrayani A, Ellington A, Stovall G. UTexas Aptamer Database: the collection and long-term preservation of aptamer sequence information. Nucleic Acids Res 2024; 52:D351-D359. [PMID: 37904593 PMCID: PMC10767891 DOI: 10.1093/nar/gkad959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 09/29/2023] [Accepted: 10/13/2023] [Indexed: 11/01/2023]
Abstract
A growing interest in aptamer research, as evidenced by the increase in aptamer publications over the years, has led to calls for a go-to site for aptamer information. A comprehensive, publicly available aptamer dataset that serves as a repository for aptamer data, standardizes aptamer reporting, and generates opportunities to expand current research in the field could meet such a demand. There have been several attempts to create aptamer databases; however, most have been abandoned or removed entirely from public view. Inspired by previous efforts, we have published the UTexas Aptamer Database, https://sites.utexas.edu/aptamerdatabase, which includes a publicly available aptamer dataset and a searchable database containing a subset of all aptamer data collected to date (1990-2022). The dataset contains aptamer sequences along with binding and selection information. The information is regularly reviewed internally to ensure accuracy and consistency across all entries. To support the continued curation and review of aptamer sequence information, we have implemented sustaining mechanisms, including researcher training protocols, an aptamer submission form, data stored separately from the database platform, and a growing team of researchers committed to updating the database. Currently, the UTexas Aptamer Database is the largest in terms of the number of aptamer sequences, with 1,443 internally reviewed aptamer records.
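The abstract says entries are reviewed internally for accuracy and consistency but does not describe how. As a hedged illustration only, the Python sketch below checks one record for a few basic consistency problems. The `AptamerRecord` fields and the `review_record` helper are hypothetical and do not reflect the database's actual schema or review protocol; also note that many published aptamers carry modified nucleotides, so a real checker would need a richer alphabet than plain A/C/G/T/U.

```python
from dataclasses import dataclass


# HYPOTHETICAL record layout for illustration; not the UTexas Aptamer Database schema.
@dataclass
class AptamerRecord:
    name: str
    target: str
    sequence: str
    nucleic_acid: str  # expected "DNA" or "RNA"


# Unmodified nucleotides only; modified bases would need an extended alphabet.
ALPHABETS = {"DNA": set("ACGT"), "RNA": set("ACGU")}


def review_record(rec: AptamerRecord) -> list[str]:
    """Return every consistency problem found in one record (empty list if none)."""
    problems: list[str] = []
    seq = rec.sequence.upper().replace(" ", "")  # tolerate lowercase and spaced sequences
    if not seq:
        problems.append("empty sequence")
    alphabet = ALPHABETS.get(rec.nucleic_acid)
    if alphabet is None:
        problems.append(f"unknown nucleic acid type: {rec.nucleic_acid!r}")
    else:
        bad = sorted(set(seq) - alphabet)
        if bad:
            problems.append(f"characters not in {rec.nucleic_acid} alphabet: {''.join(bad)}")
    if not rec.target.strip():
        problems.append("missing target")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue in a record at once.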
Affiliation(s)
- Ali Askari
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Sumedha Kota
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Hailey Ferrell
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Shriya Swamy
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Kayla S Goodman
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Christine C Okoro
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Daniela K Hernandez
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Taylor E Oliphant
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Akshata A Badrayani
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Andrew D Ellington
- Institute for Molecular Biosciences, The University of Texas, Austin, TX 78712, USA
- Center for Systems and Synthetic Biology, The University of Texas, Austin, TX 78712, USA
- Gwendolyn M Stovall
- Freshman Research Initiative, The University of Texas, Austin, TX 78712, USA
- Institute for Molecular Biosciences, The University of Texas, Austin, TX 78712, USA
- High School Research Initiative, The University of Texas, Austin, TX 78712, USA
2
Martorelli I, Helwerda LS, Kerkvliet J, Gomes SIF, Nuytinck J, van der Werff CRA, Ramackers GJ, Gultyaev AP, Merckx VSFT, Verbeek FJ. Fungal metabarcoding data integration framework for the MycoDiversity DataBase (MDDB). J Integr Bioinform 2020; 17:jib-2019-0046. [PMID: 32463383 PMCID: PMC7734503 DOI: 10.1515/jib-2019-0046] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/28/2019] [Accepted: 04/20/2020] [Indexed: 11/15/2022]
Abstract
Fungi have crucial roles in ecosystems and are important associates of many organisms. They are adapted to a wide variety of habitats; however, their global distribution and diversity remain poorly documented. The exponential growth of DNA barcode information retrieved from the environment is considerably assisting traditional approaches to unraveling fungal diversity and detecting fungi. The raw DNA data, together with the environmental descriptors of metabarcoding studies, are made available in public sequence read archives. While this is potentially a valuable source of information for investigating Fungi across diverse environmental conditions, the annotation used to describe the environment is heterogeneous. Moreover, a uniform processing pipeline has yet to be applied to the available raw DNA data. Hence, a comprehensive framework to analyze these data in a broad context is still lacking. We introduce the MycoDiversity DataBase, a database that includes public fungal metabarcoding data of environmental samples for the study of biodiversity patterns of Fungi. The framework we propose will contribute to our understanding of fungal biodiversity and aims to become a valuable source for large-scale analyses of patterns in space and time, in addition to assisting evolutionary and ecological research on Fungi.
Affiliation(s)
- Irene Martorelli
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands
- Understanding Evolution, Naturalis Biodiversity Center, Leiden, The Netherlands
- Leon S. Helwerda
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands
- Jesse Kerkvliet
- Understanding Evolution, Naturalis Biodiversity Center, Leiden, The Netherlands
- Sofia I. F. Gomes
- Understanding Evolution, Naturalis Biodiversity Center, Leiden, The Netherlands
- Jorinde Nuytinck
- Understanding Evolution, Naturalis Biodiversity Center, Leiden, The Netherlands
- Guus J. Ramackers
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands
- Alexander P. Gultyaev
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands
- Vincent S. F. T. Merckx
- Understanding Evolution, Naturalis Biodiversity Center, Leiden, The Netherlands
- Department of Evolutionary and Population Biology, Institute for Biodiversity and Ecosystem Dynamics, University of Amsterdam, Amsterdam, The Netherlands
- Fons J. Verbeek
- Leiden Institute of Advanced Computer Science (LIACS), Leiden University, Leiden, The Netherlands
3
Das R, Keep B, Washington P, Riedel-Kruse IH. Scientific Discovery Games for Biomedical Research. Annu Rev Biomed Data Sci 2019; 2:253-279. [PMID: 34308269 DOI: 10.1146/annurev-biodatasci-072018-021139] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/14/2023]
Abstract
Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.
Affiliation(s)
- Rhiju Das
- Department of Biochemistry and Department of Physics, Stanford University, Stanford, California 94305, USA
- Benjamin Keep
- Department of Learning Sciences, Stanford University, Stanford, California 94305, USA
- Peter Washington
- Department of Bioengineering, Stanford University, Stanford, California 94305, USA
4
C L B, S Nair A. Benchmark Dataset for Whole Genome Sequence Compression. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:1228-1236. [PMID: 27214907 DOI: 10.1109/tcbb.2016.2568186] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]
Abstract
The research in DNA data compression lacks a standard dataset for testing DNA-specific compression tools. This paper argues that the current state of achievement in DNA compression cannot be benchmarked without such a scientifically compiled whole genome sequence dataset, and proposes a benchmark dataset built with a multistage sampling procedure. Taking the genome sequences of organisms available at the National Center for Biotechnology Information (NCBI) as the universe, the proposed dataset selects 1,105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. This paper reports the results of three established tools on the newly compiled dataset and shows that their strengths and weaknesses become evident only in a comparison based on the scientifically compiled benchmark dataset. AVAILABILITY The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.
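Benchmarks of DNA compressors are commonly reported in bits per base, where 2 bits per base is the naive baseline for a four-letter alphabet. The abstract does not name the three tools it evaluated, so the Python sketch below uses the standard-library `zlib` module purely as a stand-in compressor; `bits_per_base` and `benchmark` are illustrative names, not part of the published dataset or its tooling.

```python
import zlib


def bits_per_base(seq: str, level: int = 9) -> float:
    """Compressed size in bits divided by sequence length (lower is better)."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    compressed = zlib.compress(seq.encode("ascii"), level)
    return 8 * len(compressed) / len(seq)


def benchmark(genomes: dict[str, str], level: int = 9) -> dict[str, float]:
    """Score the stand-in compressor (zlib) on every named sequence in a dataset."""
    return {name: bits_per_base(seq, level) for name, seq in genomes.items()}


if __name__ == "__main__":
    import random

    random.seed(0)
    dataset = {
        "random": "".join(random.choice("ACGT") for _ in range(100_000)),
        "tandem_repeat": "ACGT" * 25_000,
    }
    for name, bpb in benchmark(dataset).items():
        print(f"{name}: {bpb:.3f} bits/base")
```

Comparing a near-random sequence with a tandem repeat shows why the choice of genomes matters: a dataset skewed toward repetitive sequences would flatter any compressor, which is the kind of bias a systematically sampled benchmark is meant to avoid.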
5
Cai Y, Li P, Li XW, Zhao J, Chen H, Yang Q, Hu H. Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode. J Ginseng Res 2017; 41:339-346. [PMID: 28701875 PMCID: PMC5489764 DOI: 10.1016/j.jgr.2016.06.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/07/2016] [Revised: 06/22/2016] [Accepted: 06/29/2016] [Indexed: 11/19/2022]
Abstract
BACKGROUND In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. To improve compression efficiency, the GATC2Bytes and digital merger compression algorithms are proposed. METHODS HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence, used as the DNA sequence code, were prepared for conversion. To convert these data into a two-dimensional code, the following six steps were performed. First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, these data sets were precompressed. Third, the P. ginseng DNA (ITS2) sequence codes were precompressed. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined according to the set data format. Fifth, the combined data were compressed with Zlib, an open-source data compression library. Finally, the compressed data were used to generate a two-dimensional code called a quick response code (QR code). RESULTS The conversion process greatly reduced the number of bytes needed to store the P. ginseng chemical fingerprints and DNA (ITS2) sequence code. After GATC2Bytes processing, the ITS2 compression rate reaches 75%, and after filtration and digital merger compression the chemical fingerprint compression rate exceeds 99.65%. As a result, the overall compression rate exceeds 99.36%. The resulting QR code holds about 0.5 KB and can easily be read and identified by a smartphone. CONCLUSION P. ginseng chemical fingerprints and the DNA (ITS2) sequence code can be converted into a QR code after data processing, so the QR code can serve as an effective carrier of information on the authenticity and quality of P. ginseng.
This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
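The reported 75% compression of the ITS2 sequence is what one would expect from storing each base in 2 bits instead of an 8-bit character. The abstract does not specify how GATC2Bytes encodes bases, so the Python sketch below assumes a plain 2-bit mapping (G=00, A=01, T=10, C=11, an arbitrary choice) and a sequence without ambiguity codes such as N. `build_payload` is a hypothetical container layout standing in for the paper's "set data format", with the combined bytes compressed by the standard-library `zlib` module; the QR-code encoding step would need a separate library and is not shown.

```python
import zlib

# ASSUMED 2-bit code; the abstract does not give GATC2Bytes' actual mapping.
CODE = {"G": 0b00, "A": 0b01, "T": 0b10, "C": 0b11}
BASES = "GATC"  # index = 2-bit code, so BASES[CODE[b]] == b


def pack_bases(seq: str) -> bytes:
    """Pack a G/A/T/C sequence 4 bases per byte (KeyError on any other character)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk; low bits are padding
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes, length: int) -> str:
    """Inverse of pack_bases; `length` drops the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def compression_rate(original_size: int, compressed_size: int) -> float:
    """Fraction of bytes saved, e.g. 0.75 when 1 byte per base becomes 2 bits per base."""
    return 1 - compressed_size / original_size


def build_payload(seq: str, fingerprint: bytes) -> bytes:
    """HYPOTHETICAL container: 4-byte sequence length + packed sequence + fingerprint bytes,
    compressed with zlib. The packed sequence occupies ceil(len(seq) / 4) bytes."""
    raw = len(seq).to_bytes(4, "big") + pack_bases(seq) + fingerprint
    return zlib.compress(raw, 9)
```

Storing the sequence length in the header is what lets a reader find where the packed sequence ends and strip the padding bits from its final byte when unpacking.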
Affiliation(s)
- Yong Cai
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao
- Information Technology College of Beijing Normal University Zhuhai Campus, Zhuhai City, China
- Peng Li
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao
- Xi-Wen Li
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao
- Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Jing Zhao
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao
- Hai Chen
- Information Technology College of Beijing Normal University Zhuhai Campus, Zhuhai City, China
- Qing Yang
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, China
- Hao Hu
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao
- State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, China
6
Santhosh R, Satheesh SN, Gurusaran M, Michael D, Sekar K, Jeyakanthan J. NIMS: a database on nucleobase compounds and their interactions in macromolecular structures. J Appl Crystallogr 2016. [DOI: 10.1107/s1600576716006208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
The intense exploration of nucleotide-binding protein structures in structural biology and bioinformatics motivated the development of NIMS. This database is a collection of detailed data on nucleobases, nucleosides and nucleotides, along with their analogues, and on the protein structures to which they bind. Interaction details such as the interacting residues and all associated values have been made available. As a pioneering step, the diffraction precision index for protein structures, the atomic uncertainty for each atom, and the computed errors on the interatomic distances and angles are available in the database. In addition, the three-dimensional structures of both the ligands and the protein–ligand complexes, and their interactions, can be visualized in Jmol as well as JSmol. One of the salient features of NIMS is its efficient, user-friendly, query-based search engine. It was conceived and developed to serve researchers working on protein and nucleobase complexes. NIMS is freely available online at http://iris.physics.iisc.ernet.in/nims, and it is hoped that it will prove to be a valuable asset.
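The abstract lists the diffraction precision index (DPI), per-atom uncertainties and errors on interatomic distances without giving formulas. As a hedged sketch, the Python below uses one commonly cited form of the Cruickshank DPI, sigma(x) = sqrt(N_i / n_obs) * C^(-1/3) * R_free * d_min, where N_i is the number of atoms included in refinement, n_obs the number of reflections, C the fractional completeness of the data and d_min the resolution in angstroms. It then propagates independent positional errors of two atoms into an error on their distance. Whether NIMS uses exactly these expressions is an assumption, and the function names are illustrative.

```python
import math


def cruickshank_dpi(n_atoms: int, n_obs: int, completeness: float,
                    r_free: float, d_min: float) -> float:
    """Coordinate error estimate (angstroms) from the R_free form of the Cruickshank DPI.

    n_atoms: atoms included in refinement; n_obs: reflections used;
    completeness: fraction of data measured, in (0, 1]; d_min: resolution in angstroms.
    """
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be a fraction in (0, 1]")
    return math.sqrt(n_atoms / n_obs) * completeness ** (-1 / 3) * r_free * d_min


def distance_with_error(a, b, sigma_a: float, sigma_b: float) -> tuple[float, float]:
    """Distance between two atoms and its error, assuming independent, isotropic errors.

    The distance error is the quadrature sum of the two atoms' positional errors,
    each taken along the interatomic vector.
    """
    return math.dist(a, b), math.hypot(sigma_a, sigma_b)
```

For example, with 2,000 atoms, 20,000 reflections, complete data, R_free = 0.20 and 2.0 Å resolution, the formula gives sqrt(0.1) × 0.2 × 2.0, or roughly 0.13 Å.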
7
Abstract
Metagenomic data, which contain sequenced DNA reads of uncultured microbial species from environmental samples, provide a unique opportunity to thoroughly analyze microbial species that have never been identified before. Reconstructing 16S ribosomal RNA, a phylogenetic marker gene, is usually required to analyze the composition of the metagenomic data. However, the massive volume of data, high sequence similarity between related species, skewed microbial abundance and lack of reference genes make 16S rRNA reconstruction difficult. Generic de novo assembly tools are not optimized for assembling 16S rRNA genes. In this work, we introduce a targeted rRNA assembly tool, REAGO (REconstruct 16S ribosomal RNA Genes from metagenOmic data). It addresses the above challenges by combining secondary structure-aware homology search, properties of rRNA genes and de novo assembly. Our experimental results show that our tool can correctly recover more rRNA genes than several popular generic metagenomic assembly tools and specially designed rRNA reconstruction tools. AVAILABILITY AND IMPLEMENTATION The source code of REAGO is freely available at https://github.com/chengyuan/reago.
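The abstract names REAGO's ingredients (structure-aware homology search, rRNA gene properties, de novo assembly) but not its data structures. For readers unfamiliar with the de novo assembly step, the Python sketch below shows a generic de Bruijn graph over k-mers and a greedy walk that extends a contig while there is exactly one successor. It is a textbook illustration, not REAGO's algorithm; the function names and the choice of k are assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Map each (k-1)-mer to the (k-1)-mers that follow it within some k-mer of the reads."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguously(graph: dict[str, set[str]], start: str) -> str:
    """Greedily extend a contig from `start` while the current node has exactly one successor.

    Stops at a dead end, a branch (several successors), or a node already visited (a cycle).
    """
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:
            break
        seen.add(node)
        contig += node[-1]
    return contig
```

Stopping at nodes with several successors is where the difficulty described above shows up: highly similar 16S genes from related species share k-mers and create exactly such branches.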
Affiliation(s)
- Cheng Yuan
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
- Jikai Lei
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
- James Cole
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
- Yanni Sun
- Computer Science and Engineering, Michigan State University, 428 South Shaw Rd, East Lansing, MI 48824, USA and Center for Microbial Ecology, Michigan State University, East Lansing, MI 48824, USA
8
9
AL-Rawajfah OM, Aloush S, Hewitt JB. Use of Electronic Health-Related Datasets in Nursing and Health-Related Research. West J Nurs Res 2014; 37:952-83. [DOI: 10.1177/0193945914558426] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
Abstract
Datasets of gigabyte size are common in the medical sciences. There is increasing consensus that significant untapped knowledge lies hidden in these large datasets. This review article discusses Electronic Health-Related Datasets (EHRDs) in terms of their types, features, advantages, limitations, and possible uses in nursing and health-related research. Major scientific databases (MEDLINE, ScienceDirect, and Scopus) were searched for studies or review articles on the use of EHRDs in research. A total of 442 articles were located. After application of the study inclusion criteria, 113 articles were included in the final review. EHRDs were categorized into Electronic Administrative Health-Related Datasets and Electronic Clinical Health-Related Datasets, and subcategories of each major category were identified. EHRDs are invaluable assets for nursing and health-related research. Advanced research skills, such as using analytical software, applying advanced statistical procedures, and dealing with missing data and missing variables, will maximize the efficient use of EHRDs in research.
10
A self-adaptive intelligent single-particle optimizer compression algorithm. Neural Comput Appl 2014. [DOI: 10.1007/s00521-014-1609-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
11
Kolesnikov N, Hastings E, Keays M, Melnichuk O, Tang YA, Williams E, Dylag M, Kurbatova N, Brandizi M, Burdett T, Megy K, Pilicheva E, Rustici G, Tikhonov A, Parkinson H, Petryszak R, Sarkans U, Brazma A. ArrayExpress update--simplifying data submissions. Nucleic Acids Res 2014; 43:D1113-6. [PMID: 25361974 PMCID: PMC4383899 DOI: 10.1093/nar/gku1057] [Citation(s) in RCA: 499] [Impact Index Per Article: 49.9] [Indexed: 11/13/2022]
Abstract
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42 000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.
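MAGE-TAB describes an experiment with an IDF (Investigation Description Format) file and a tab-delimited SDRF (Sample and Data Relationship Format) table in which each row traces a sample through to its data files. As a rough sketch, the Python below reads an SDRF-like table with the standard `csv` module and pulls out one `Characteristics[...]` column. The column names in the demo are illustrative, and the dict-per-row approach is a simplifying assumption that loses information when a header repeats, as `Protocol REF` often does in real SDRF files.

```python
import csv
import io


def read_sdrf(text: str) -> list[dict[str, str]]:
    """Parse tab-delimited SDRF text into one dict per non-blank row.

    If a header appears more than once, only the last column with that name is kept.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [dict(zip(header, row)) for row in reader if any(cell.strip() for cell in row)]


def characteristic(rows: list[dict[str, str]], name: str) -> list[str]:
    """Return the value of the Characteristics[name] column for each row ('' if absent)."""
    column = f"Characteristics[{name}]"
    return [row.get(column, "") for row in rows]


if __name__ == "__main__":
    sdrf = (
        "Source Name\tCharacteristics[organism]\tAssay Name\n"
        "sample 1\tHomo sapiens\tassay 1\n"
        "sample 2\tMus musculus\tassay 2\n"
    )
    rows = read_sdrf(sdrf)
    print(characteristic(rows, "organism"))
```

Keeping the table as plain rows keyed by header makes it easy to hand selected columns to downstream analysis tools, which is the kind of linking the abstract attributes to the MAGE-TAB format.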
Affiliation(s)
- Nikolay Kolesnikov
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Emma Hastings
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Maria Keays
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Olga Melnichuk
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Y Amy Tang
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Eleanor Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Miroslaw Dylag
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Natalja Kurbatova
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Marco Brandizi
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Tony Burdett
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Karyn Megy
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Ekaterina Pilicheva
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Gabriella Rustici
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- School of Biological Sciences, Cambridge Systems Biology Centre, Tennis Court Road, Cambridge, CB2 1QR, UK
- Andrew Tikhonov
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Robert Petryszak
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Ugis Sarkans
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
12
Papanikolaou N, Pavlopoulos GA, Pafilis E, Theodosiou T, Schneider R, Satagopam VP, Ouzounis CA, Eliopoulos AG, Promponas VJ, Iliopoulos I. BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery. Bioinformatics 2014; 30:3249-56. [PMID: 25100685 DOI: 10.1093/bioinformatics/btu524] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/09/2023]
Abstract
SUMMARY The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed(®) and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates browsing of document clusters per subject, analysis of term co-occurrence, generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) integrates its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameter settings for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. AVAILABILITY The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. CONTACT g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
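Term co-occurrence analysis, one of the front-end features listed in the abstract, reduces to counting how often pairs of recognized terms appear in the same document. The Python sketch below uses naive case-insensitive dictionary matching in place of BioTextQuest(+)'s actual bioentity recognition, which the abstract does not detail; the term dictionary and the function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def find_terms(text: str, dictionary: set[str]) -> set[str]:
    """Naive recognizer: dictionary terms matched case-insensitively on word boundaries."""
    return {
        term for term in dictionary
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
    }


def cooccurrence(documents: list[set[str]]) -> Counter:
    """Count, for each unordered pair of terms, the number of documents containing both."""
    counts: Counter = Counter()
    for terms in documents:
        for pair in combinations(sorted(terms), 2):  # sorted -> one key per unordered pair
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    terms = {"obesity", "leptin", "insulin", "ageing"}
    abstracts = [
        "Leptin signalling is altered in obesity.",
        "Insulin resistance links obesity and ageing.",
        "Leptin and insulin both regulate energy balance in obesity.",
    ]
    docs = [find_terms(a, terms) for a in abstracts]
    for (a, b), n in cooccurrence(docs).most_common(3):
        print(a, b, n)
```

A production pipeline would also normalize synonyms (for example gene aliases) before counting, so that two spellings of one entity are not treated as different terms; that step is omitted here.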
Affiliation(s)
- Nikolas Papanikolaou
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Georgios A Pavlopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Evangelos Pafilis
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Theodosios Theodosiou
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Reinhard Schneider
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Venkata P Satagopam
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Christos A Ouzounis
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Aristides G Eliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Vasilis J Promponas
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
- Ioannis Iliopoulos
- Division of Basic Sciences, University of Crete, Medical School, Heraklion 71110, Greece, Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, 7, avenue des Hauts-Fourneaux, L-4362 Esch sur Alzette, Luxembourg, Biological Computation & Process Laboratory (BCPL), Chemical Process & Energy Resources Institute (CPERI), Centre for Research & Technology Hellas (CERTH), PO Box 361, GR-57001 Thessalonica, Greece, Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology Hellas, 70013 Heraklion, Crete, Greece and Department of Biological Sciences, Bioinformatics Research Laboratory, University of Cyprus, PO Box 20537, CY 1678, Nicosia, Cyprus
13
Alderson RG, De Ferrari L, Mavridis L, McDonagh JL, Mitchell JBO, Nath N. Enzyme informatics. Curr Top Med Chem 2014; 12:1911-23. [PMID: 23116471 DOI: 10.2174/156802612804547353] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/15/2012] [Revised: 09/12/2012] [Accepted: 09/15/2012] [Indexed: 12/18/2022]
Abstract
Over the last 50 years, sequencing, structural biology and bioinformatics have completely revolutionised biomolecular science, with millions of sequences and tens of thousands of three-dimensional structures becoming available. The bioinformatics of enzymes is well served by mostly free online databases. BRENDA describes the chemistry, substrate specificity, kinetics, preparation and biological sources of enzymes, while KEGG is valuable for understanding enzymes and metabolic pathways. EzCatDB, SFLD and MACiE are key repositories for data on the chemical mechanisms by which enzymes operate. At the current rate of genome sequencing and manual annotation, human curation will never finish the functional annotation of the ever-expanding list of known enzymes. Hence there is an increasing need for automated annotation, though it is not yet widespread for enzyme data. In contrast, functional ontologies such as the Gene Ontology already profit from automation. Despite our growing understanding of enzyme structure and dynamics, we are only beginning to be able to design novel enzymes. One can now begin to trace the functional evolution of enzymes using phylogenetics. The ability of enzymes to perform secondary functions, albeit relatively inefficiently, gives clues as to how enzyme function evolves. Substrate promiscuity in enzymes is one example of imperfect specificity in protein-ligand interactions. Similarly, most drugs bind to more than one protein target. This may sometimes result in helpful polypharmacology as a drug modulates multiple targets, but also often leads to adverse side-effects. Many chemoinformatics approaches can be used to model the interactions between druglike molecules and proteins in silico. We can even use quantum chemical techniques like DFT and QM/MM to compute the structural and energetic course of enzyme-catalysed chemical reaction mechanisms, including a full description of bond making and breaking.
Affiliation(s)
- Rosanna G Alderson
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
14
Fujita KA, Ostaszewski M, Matsuoka Y, Ghosh S, Glaab E, Trefois C, Crespo I, Perumal TM, Jurkowski W, Antony PMA, Diederich N, Buttini M, Kodama A, Satagopam VP, Eifes S, del Sol A, Schneider R, Kitano H, Balling R. Integrating pathways of Parkinson's disease in a molecular interaction map. Mol Neurobiol 2014; 49:88-102. [PMID: 23832570 PMCID: PMC4153395 DOI: 10.1007/s12035-013-8489-4] [Citation(s) in RCA: 162] [Impact Index Per Article: 16.2] [Received: 04/15/2013] [Accepted: 06/13/2013] [Indexed: 12/12/2022]
Abstract
Parkinson's disease (PD) is a major neurodegenerative chronic disease, most likely caused by a complex interplay of genetic and environmental factors. Information on various aspects of PD pathogenesis is rapidly increasing and needs to be efficiently organized, so that the resulting data is available for exploration and analysis. Here we introduce a computationally tractable, comprehensive molecular interaction map of PD. This map integrates pathways implicated in PD pathogenesis such as synaptic and mitochondrial dysfunction, impaired protein degradation, alpha-synuclein pathobiology and neuroinflammation. We also present bioinformatics tools for the analysis, enrichment and annotation of the map, allowing the research community to open new avenues in PD research. The PD map is accessible at http://minerva.uni.lu/pd_map .
Affiliation(s)
- Marek Ostaszewski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Integrated Biobank of Luxembourg, Luxembourg City, Luxembourg
- Samik Ghosh
- The Systems Biology Institute, Minato-ku, Tokyo, Japan
- Enrico Glaab
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Christophe Trefois
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Isaac Crespo
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Thanneer M. Perumal
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Wiktor Jurkowski
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Paul M. A. Antony
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Nico Diederich
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Department of Neuroscience, Centre Hospitalier Luxembourg, Luxembourg City, Luxembourg
- Manuel Buttini
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Akihiko Kodama
- Faculty of Medicine, Tokyo Medical and Dental University, Tokyo, Japan
- Venkata P. Satagopam
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Serge Eifes
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Antonio del Sol
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
- Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Hiroaki Kitano
- The Systems Biology Institute, Minato-ku, Tokyo, Japan
- Sony Computer Science Laboratories, Shinagawa-ku, Tokyo, Japan
- Division of Systems Biology, Cancer Institute, Tokyo, Japan
- Open Biology Unit, Okinawa Institute of Science and Technology, Kunigami, Okinawa, Japan
- Rudi Balling
- Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 7, Avenue des Hauts-Fourneaux, Esch-sur-Alzette, Luxembourg
15
Abstract
Initially designed to infer evolutionary relationships based on morphological and physiological characters, phylogenetic reconstruction methods have greatly benefited from recent developments in molecular biology and sequencing technologies, with a number of powerful methods having been developed specifically to infer phylogenies from macromolecular data. This chapter, while presenting an overview of basic concepts and methods used in phylogenetic reconstruction, is primarily intended as a simplified step-by-step guide to the construction of phylogenetic trees from nucleotide sequences using fairly up-to-date maximum likelihood methods implemented in freely available computer programs. While the analysis of chloroplast sequences from various Vanilla species is used as an illustrative example, the techniques covered here are relevant to the comparative analysis of homologous sequence datasets sampled from any group of organisms.
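The chapter's maximum likelihood workflow is too involved to reproduce here, but the simplest case conveys the idea: for two aligned sequences under the Jukes-Cantor (JC69) substitution model, the maximum likelihood estimate of their distance has the closed form d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The Python sketch below computes it for made-up toy fragments (not the Vanilla chloroplast data); it does not build a tree, and the function names are illustrative rather than taken from any program named in the chapter.

```python
import math

def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip alignment columns where either sequence has a gap
    sites = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    diffs = sum(1 for a, b in sites if a != b)
    return diffs / len(sites)

def jc69_distance(seq_a: str, seq_b: str) -> float:
    """JC69 distance in expected substitutions per site (the ML estimate under JC69)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        # Saturated sequences: the logarithm's argument is <= 0, distance is unbounded
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

if __name__ == "__main__":
    # Hypothetical aligned fragments differing at 2 of 20 sites
    a = "ACGTACGTACGTACGTACGT"
    b = "ACGTACGAACGTACTTACGT"
    print(round(p_distance(a, b), 3), round(jc69_distance(a, b), 4))
```

The correction matters because p underestimates the true number of substitutions once multiple hits at the same site become likely; full ML tree programs generalise this per-pair calculation to whole trees and richer models.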
Affiliation(s)
- Alexandre De Bruyn
- Pôle de Protection des Plantes, CIRAD, UMR PVBMT, Université de la Réunion, Saint-Pierre, France
16
Filtering and ranking techniques for automated selection of high-quality 16S rRNA gene sequences. Syst Appl Microbiol 2013; 36:549-59. [DOI: 10.1016/j.syapm.2013.09.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/30/2013] [Revised: 09/06/2013] [Accepted: 09/10/2013] [Indexed: 11/21/2022]
17
Abstract
The first steps of building a new model can be very time-consuming, involving consulting many research papers and then assembling a plausible network of reactions. In this chapter, tools for speeding up this process are discussed. Reactome is a database containing extensive coverage of pathways in Homo sapiens and numerous reference species. It offers researchers wishing to create new models from scratch various tools for extracting the relevant reactions, complete with layout information. Two use cases are described, in which a modeller provides certain essential pieces of information and Reactome automatically constructs the basic models and then exports them in SBML format.
18
Kwok J, Kwong KM. Loop-mediated isothermal amplification for detection of HLA-B*58:01 allele. Tissue Antigens 2012; 81:83-92. [PMID: 23240628 DOI: 10.1111/tan.12042] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/01/2012] [Revised: 11/08/2012] [Accepted: 11/11/2012] [Indexed: 11/28/2022]
Abstract
A strong association of the human leukocyte antigen (HLA)-B*58:01 allele with allopurinol-induced hypersensitivity has been found worldwide, especially in Han Chinese populations. This study aims to develop and evaluate a loop-mediated isothermal amplification (LAMP) assay for rapid detection of HLA-B*58:01. Two sets of LAMP primers targeting exons 2 and 3 of the HLA-B*58:01 allele were designed and their annealing temperatures were optimized accordingly. Heating devices for the LAMP assay were tested. The analytical sensitivities of the two sets of LAMP primers were determined by 1:10 serial dilution of a positive control with homozygous HLA-B*58:01 allele from 100 ng down to 1 fg. The analytical specificities of the LAMP primers were evaluated using 30 selected University of California, Los Angeles (UCLA) DNA Exchange Program samples with known HLA-B typings previously determined by sequencing. Both sets of LAMP primers targeting exons 2 and 3 amplified optimally at 67°C. A thermal cycler is essential for achieving a more precise and specific LAMP result. The sensitivity of the exon 2 LAMP primer set was 1 pg, whereas it was 10 ng for the exon 3 primer set in a 60-min amplification. The LAMP primers were highly specific, as LAMP results were perfectly concordant with the sequencing results. The HLA-B*58:01 LAMP assay has sensitivity and specificity comparable to routine genotyping assays, and it is potentially an alternative screening test for the detection of HLA-B*58:01 and ultimately for the prevention of allopurinol-induced hypersensitivity.
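The dilution arithmetic in the abstract can be checked directly. The sketch below (illustrative only; the helper names are hypothetical) lists every concentration in a 1:10 series from 100 ng down to 1 fg and counts how many tenfold steps separate the two reported detection limits, 1 pg for the exon 2 set and 10 ng for the exon 3 set. Exact fractions are used so that repeated division by 10 does not accumulate floating-point error.

```python
from fractions import Fraction

# Mass units expressed in grams, kept exact to avoid float drift
UNITS = {"ng": Fraction(1, 10**9), "pg": Fraction(1, 10**12), "fg": Fraction(1, 10**15)}

def to_grams(amount: int, unit: str) -> Fraction:
    """Convert an amount in ng/pg/fg to an exact number of grams."""
    return amount * UNITS[unit]

def tenfold_series(start: Fraction, stop: Fraction) -> list[Fraction]:
    """Successive 1:10 dilutions from start down to and including stop."""
    series = [start]
    while series[-1] > stop:
        series.append(series[-1] / 10)
    if series[-1] != stop:
        raise ValueError("stop is not reachable from start by tenfold steps")
    return series

def tenfold_steps(high: Fraction, low: Fraction) -> int:
    """Number of 1:10 dilution steps separating two amounts."""
    return len(tenfold_series(high, low)) - 1

if __name__ == "__main__":
    series = tenfold_series(to_grams(100, "ng"), to_grams(1, "fg"))
    print(len(series), "concentrations,", len(series) - 1, "dilution steps")
    # Exon 3 detection limit (10 ng) versus exon 2 detection limit (1 pg)
    print(tenfold_steps(to_grams(10, "ng"), to_grams(1, "pg")), "tenfold steps apart")
```

Read this way, the 100 ng to 1 fg range spans nine concentrations (eight dilution steps), and the exon 2 primer set detects template at an amount four tenfold steps (10^4-fold) lower than the exon 3 set.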
Affiliation(s)
- J Kwok
- Division of Transplantation and Immunogenetics, Department of Pathology and Clinical Biochemistry, Queen Mary Hospital, Hong Kong SAR, China.
19
Rustici G, Kolesnikov N, Brandizi M, Burdett T, Dylag M, Emam I, Farne A, Hastings E, Ison J, Keays M, Kurbatova N, Malone J, Mani R, Mupo A, Pedro Pereira R, Pilicheva E, Rung J, Sharma A, Tang YA, Ternent T, Tikhonov A, Welter D, Williams E, Brazma A, Parkinson H, Sarkans U. ArrayExpress update--trends in database growth and links to data analysis tools. Nucleic Acids Res 2012. [PMID: 23193272 PMCID: PMC3531147 DOI: 10.1093/nar/gks1174] [Citation(s) in RCA: 299] [Impact Index Per Article: 24.9] [Indexed: 12/16/2022]
Abstract
The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.
Affiliation(s)
- Gabriella Rustici
- Functional Genomics Team, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
20
Hoeppner MP, Gardner PP, Poole AM. Comparative analysis of RNA families reveals distinct repertoires for each domain of life. PLoS Comput Biol 2012; 8:e1002752. [PMID: 23133357 PMCID: PMC3486863 DOI: 10.1371/journal.pcbi.1002752] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Received: 03/12/2012] [Accepted: 09/07/2012] [Indexed: 02/02/2023]
Abstract
The RNA world hypothesis, that RNA genomes and catalysts preceded DNA genomes and genetically-encoded protein catalysts, has been central to models for the early evolution of life on Earth. A key part of such models is continuity between the earliest stages in the evolution of life and the RNA repertoires of extant lineages. Some assessments seem consistent with a diverse RNA world, yet direct continuity between modern RNAs and an RNA world has not been demonstrated for the majority of RNA families, and, anecdotally, many RNA functions appear restricted in their distribution. Despite much discussion of the possible antiquity of RNA families, no systematic analyses of RNA family distribution have been performed. To chart the broad evolutionary history of known RNA families, we performed comparative genomic analysis of over 3 million RNA annotations spanning 1446 families from the Rfam 10 database. We report that 99% of known RNA families are restricted to a single domain of life, revealing discrete repertoires for each domain. For the 1% of RNA families/clans present in more than one domain, over half show evidence of horizontal gene transfer (HGT), and the rest show a vertical trace, indicating the presence of a complex protein synthesis machinery in the Last Universal Common Ancestor (LUCA) and consistent with the evolutionary history of the most ancient protein-coding genes. However, with limited interdomain transfer and few RNA families exhibiting demonstrable antiquity as predicted under RNA world continuity, our results indicate that the majority of modern cellular RNA repertoires have primarily evolved in a domain-specific manner. In cells, DNA carries recipes for making proteins, and proteins perform chemical reactions, including replication of DNA. This interdependency raises questions for early evolution, since one molecule seemingly cannot exist without the other. 
A resolution to this problem is the RNA world, where RNA is postulated to have been both genetic material and primary catalyst. While artificially selected catalytic RNAs strengthen the chemical plausibility of an RNA world, a biological prediction is that some RNAs should date back to this period. In this study, we ask to what degree RNAs in extant organisms trace back to the common ancestor of cellular life. Using the Rfam RNA families database, we systematically screened genomes spanning the three domains of life (Archaea, Bacteria, Eukarya) for RNA genes, and examined how far back in evolution known RNA families can be traced. We find that 99% of RNA families are restricted to a single domain. Limited conservation within domains implies ongoing emergence of RNA functions during evolution. Of the remaining 1%, half show evidence of horizontal transfer (movement of genes between organisms), and half show an evolutionary history consistent with an RNA world. The oldest RNAs are primarily associated with protein synthesis and export.
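The headline figure (99% of RNA families restricted to a single domain) is, at its core, a tally over (family, domain) annotations. A minimal sketch of that tally is shown below; the records are made-up placeholders rather than Rfam data, and the simple (family, domain) tuple layout is an assumption, not the Rfam file format or the authors' pipeline.

```python
from collections import defaultdict

def domain_distribution(annotations):
    """Map each RNA family to the set of domains in which it is annotated."""
    domains_by_family = defaultdict(set)
    for family, domain in annotations:
        domains_by_family[family].add(domain)
    return domains_by_family

def fraction_single_domain(annotations) -> float:
    """Fraction of families whose annotations fall in exactly one domain of life."""
    dist = domain_distribution(annotations)
    if not dist:
        return 0.0
    single = sum(1 for domains in dist.values() if len(domains) == 1)
    return single / len(dist)

if __name__ == "__main__":
    # Hypothetical (family, domain) hits; family names are placeholders
    hits = [
        ("fam_A", "Bacteria"), ("fam_A", "Bacteria"),
        ("fam_B", "Eukarya"),
        ("fam_C", "Archaea"), ("fam_C", "Bacteria"), ("fam_C", "Eukarya"),
        ("fam_D", "Archaea"),
    ]
    print(fraction_single_domain(hits))  # 3 of 4 families -> 0.75
```

Counting distinct domains per family (rather than raw hits) keeps heavily annotated families from dominating the statistic; deciding whether a multi-domain family reflects vertical descent or horizontal transfer requires the phylogenetic analysis described in the abstract.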
Affiliation(s)
- Marc P. Hoeppner
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Paul P. Gardner
- Biomolecular Interaction Centre & School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Anthony M. Poole
- Biomolecular Interaction Centre & School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
21
Spang A, Poehlein A, Offre P, Zumbrägel S, Haider S, Rychlik N, Nowka B, Schmeisser C, Lebedeva EV, Rattei T, Böhm C, Schmid M, Galushko A, Hatzenpichler R, Weinmaier T, Daniel R, Schleper C, Spieck E, Streit W, Wagner M. The genome of the ammonia-oxidizing Candidatus Nitrososphaera gargensis: insights into metabolic versatility and environmental adaptations. Environ Microbiol 2012; 14:3122-45. [PMID: 23057602 DOI: 10.1111/j.1462-2920.2012.02893.x] [Citation(s) in RCA: 211] [Impact Index Per Article: 17.6] [Received: 08/31/2012] [Accepted: 09/01/2012] [Indexed: 01/21/2023]
Abstract
The cohort of the ammonia-oxidizing archaea (AOA) of the phylum Thaumarchaeota is a diverse, widespread and functionally important group of microorganisms in many ecosystems. However, our understanding of their biology is still very rudimentary, in part because all available genome sequences of this phylum are from members of the Nitrosopumilus cluster. Here we report on the complete genome sequence of Candidatus Nitrososphaera gargensis obtained from an enrichment culture, representing a different evolutionary lineage of AOA frequently found in high numbers in many terrestrial environments. At 2.83 Mb, the genome is much larger than those of other AOA. The presence of a high number of (active) IS elements/transposases, genomic islands, gene duplications and a complete CRISPR/Cas defence system testifies to its dynamic evolution, consistent with a low degree of synteny with other thaumarchaeal genomes. As expected, the repertoire of conserved enzymes proposed to be required for archaeal ammonia oxidation is encoded by N. gargensis, but it can also use urea and possibly cyanate as alternative ammonia sources. Furthermore, its carbon metabolism is more flexible at the central pyruvate switch point, encompasses the ability to take up small organic compounds and might even include an oxidative pentose phosphate pathway. In addition, we show that thaumarchaeota produce cofactor F420 as well as polyhydroxyalkanoates. Lateral gene transfer from bacteria and euryarchaeota has contributed to the metabolic versatility of N. gargensis. This organism is well adapted to its niche in a heavy metal-containing thermal spring, encoding a multitude of heavy metal resistance genes, chaperones and mannosylglycerate as a compatible solute, and it has the genetic ability to respond to environmental changes by signal transduction via a large number of two-component systems, by chemotaxis and flagella-mediated motility and possibly even by gas vacuole formation. 
These findings extend our understanding of thaumarchaeal evolution and physiology and offer many testable hypotheses for future experimental research on these nitrifiers.
Affiliation(s)
- Anja Spang
- Department of Genetics in Ecology, University of Vienna, Althanstr. 14, 1090, Vienna, Austria
22
Becnel LB, McKenna NJ. Minireview: progress and challenges in proteomics data management, sharing, and integration. Mol Endocrinol 2012; 26:1660-74. [PMID: 22902541 DOI: 10.1210/me.2012-1180] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 02/05/2023]
Abstract
The proteome represents the identity, expression levels, interacting partners, and posttranslational modifications of proteins expressed within any given cell. Proteomic studies aim to census the quantitative and qualitative factors regulating the biological relationships of proteins acting in concert as functional cellular networks. In the field of endocrinology, proteomics has been of considerable value in determining the function and mechanism of action of endocrine signaling molecules in the cell membrane, cytoplasm, and nucleus and for the discovery of proteins as candidates for clinical biomarkers. The volume of data that can be generated by proteomics methodologies, up to gigabytes of data within a few hours, brings with it its own logistical hurdles and presents significant challenges to realizing the full potential of these datasets. In this minireview, we describe selected current proteomics methodologies and their application in basic and translational endocrinology before focusing on mass spectrometry as a model for current progress and challenges in data analysis, management, sharing, and integration.
Affiliation(s)
- Lauren B Becnel
- Department of Medicine, Hematology and Oncology, Baylor College of Medicine, 1 Baylor Plaza MS-BCM305, Houston, Texas 77030, USA.
23
Epp LS, Boessenkool S, Bellemain EP, Haile J, Esposito A, Riaz T, Erséus C, Gusarov VI, Edwards ME, Johnsen A, Stenøien HK, Hassel K, Kauserud H, Yoccoz NG, Bråthen KA, Willerslev E, Taberlet P, Coissac E, Brochmann C. New environmental metabarcodes for analysing soil DNA: potential for studying past and present ecosystems. Mol Ecol 2012; 21:1821-33. [PMID: 22486821 DOI: 10.1111/j.1365-294x.2012.05537.x] [Citation(s) in RCA: 133] [Impact Index Per Article: 11.1] [Indexed: 11/27/2022]
Abstract
Metabarcoding approaches use total and typically degraded DNA from environmental samples to analyse biotic assemblages and can potentially be carried out for any kind of organism in an ecosystem. These analyses rely on specific markers, here called metabarcodes, which should be optimized for taxonomic resolution, minimal bias in amplification of the target organism group and short sequence length. Using bioinformatic tools, we developed metabarcodes for several groups of organisms: fungi, bryophytes, enchytraeids, beetles and birds. The ability of these metabarcodes to amplify the target groups was systematically evaluated by (i) in silico PCRs using all standard sequences in the EMBL public database as templates, (ii) in vitro PCRs of DNA extracts from surface soil samples from a site in Varanger, northern Norway and (iii) in vitro PCRs of DNA extracts from permanently frozen sediment samples of late-Pleistocene age (~16,000-50,000 years BP) from two Siberian sites, Duvanny Yar and Main River. Comparison of the in silico PCR results with those obtained in vitro showed that the in silico approach offered a reliable estimate of the suitability of a marker. All target groups were detected in the environmental DNA, but we found large variation in the level of detection among the groups and between modern and ancient samples. Success rates for the Pleistocene samples were highest for fungal DNA, whereas bryophyte, beetle and bird sequences could also be retrieved, but to a much lesser degree. The metabarcoding approach has considerable potential for biodiversity screening of modern samples and also as a palaeoecological tool.
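Step (i), in silico PCR, amounts to searching each template for the forward primer and for the reverse complement of the reverse primer, then reporting the stretch between them as the predicted amplicon. The sketch below is a simplified matcher written for illustration only; it is not the software used in the study, it scans only the given template strand, it ignores degenerate (IUPAC) bases, and the primers, template and mismatch threshold are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def find_sites(template: str, primer: str, max_mm: int) -> list[int]:
    """Start positions where the primer matches with at most max_mm mismatches."""
    n = len(primer)
    return [i for i in range(len(template) - n + 1)
            if mismatches(template[i:i + n], primer) <= max_mm]

def in_silico_pcr(template: str, fwd: str, rev: str, max_mm: int = 1,
                  min_len: int = 10, max_len: int = 1000) -> list[str]:
    """Predicted amplicons (primers included) on the given template strand."""
    template, fwd = template.upper(), fwd.upper()
    rev_rc = reverse_complement(rev)  # reverse primer anneals to the opposite strand
    amplicons = []
    for f in find_sites(template, fwd, max_mm):
        for r in find_sites(template, rev_rc, max_mm):
            end = r + len(rev_rc)
            length = end - f
            # Reverse site must lie downstream of the forward primer, within size limits
            if r >= f + len(fwd) and min_len <= length <= max_len:
                amplicons.append(template[f:end])
    return amplicons

if __name__ == "__main__":
    fwd, rev = "ACGTTG", "CCATGA"  # hypothetical primers
    template = "TT" + fwd + "GGGCCCAAAT" + reverse_complement(rev) + "AA"
    print(in_silico_pcr(template, fwd, rev))
```

Running such a search over every reference sequence of a target group, and recording which templates yield an amplicon and how many distinct amplicon sequences result, gives a rough measure of a marker's coverage and taxonomic resolution.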
Affiliation(s)
- Laura S Epp
- National Centre for Biosystematics, Natural History Museum, University of Oslo, Oslo, Norway.
24
Réblová M, Réblová K. RNA secondary structure, an important bioinformatics tool to enhance multiple sequence alignment: a case study (Sordariomycetes, Fungi). Mycol Prog 2012. [DOI: 10.1007/s11557-012-0836-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
25
Dreher F, Kreitler T, Hardt C, Kamburov A, Yildirimman R, Schellander K, Lehrach H, Lange BMH, Herwig R. DIPSBC--data integration platform for systems biology collaborations. BMC Bioinformatics 2012; 13:85. [PMID: 22568834 PMCID: PMC3424966 DOI: 10.1186/1471-2105-13-85] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/20/2011] [Accepted: 05/01/2012] [Indexed: 11/17/2022]
Abstract
Background: Modern biomedical research is often organized in collaborations involving labs worldwide. In systems biology in particular, complex molecular systems are analyzed that require the generation and interpretation of heterogeneous data, ranging, for example, from gene expression studies and mass spectrometry measurements to experimental techniques for detecting molecular interactions and functional assays. XML has become the most prominent format for representing and exchanging these data. However, although standards have been developed, there is still a fundamental lack of data integration systems that can use these exchange formats, organize the data in an integrative way, and link them with applications for data interpretation and analysis.
Results: We have developed DIPSBC, an interactive data integration platform supporting collaborative research projects, based on Foswiki, Solr/Lucene, and specific helper applications. We describe the main features of the implementation and highlight the performance of the system with several use cases. All components of the system are platform-independent, open-source developments and can therefore be easily adopted by researchers. An example installation of the platform, which also provides several helper applications and detailed instructions for system usage and setup, is available at http://dipsbc.molgen.mpg.de.
Conclusions: DIPSBC is a data integration platform for medium-scale collaboration projects that has already been tested within several research collaborations. Because of its modular design and the incorporation of XML data formats, it is highly flexible and easy to use.
Affiliation(s)
- Felix Dreher
- Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, 14195 Berlin, Germany.
26
Cruz-Toledo J, McKeague M, Zhang X, Giamberardino A, McConnell E, Francis T, DeRosa MC, Dumontier M. Aptamer Base: a collaborative knowledge base to describe aptamers and SELEX experiments. Database (Oxford) 2012; 2012:bas006. [PMID: 22434840 PMCID: PMC3308162 DOI: 10.1093/database/bas006] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 12/31/2022]
Abstract
Over the past several decades, rapid developments in both molecular and information technology have collectively increased our ability to understand molecular recognition. One emerging area of interest in molecular recognition research includes the isolation of aptamers. Aptamers are single-stranded nucleic acid or amino acid polymers that recognize and bind to targets with high affinity and selectivity. While research has focused on collecting aptamers and their interactions, most of the information regarding experimental methods remains in the unstructured and textual format of peer-reviewed publications. To address this, we present the Aptamer Base, a database that provides detailed, structured information about the experimental conditions under which aptamers were selected and their binding affinity quantified. The open collaborative nature of the Aptamer Base provides the community with a unique resource that can be updated and curated in a decentralized manner, thereby accommodating the ever evolving field of aptamer research. Database URL: http://aptamer.freebase.com
27
Abstract
We review technical and sociological issues facing the Life Sciences as they transform into more data-centric disciplines - the "Big New Biology". Three major challenges are: 1) lack of comprehensive standards; 2) lack of incentives for individual scientists to share data; 3) lack of appropriate infrastructure and support. Technological advances with standards, bandwidth, distributed computing, exemplar successes, and a strong presence in the emerging world of Linked Open Data are sufficient to conclude that technical issues will be overcome in the foreseeable future. Although the community is motivated to have a shared open infrastructure and data pool, and is pressured by funding agencies to move in this direction, sociological issues determine progress. Major sociological issues include our lack of understanding of the heterogeneous data cultures within Life Sciences, and the impediments to progress include a lack of incentives to build appropriate infrastructures into projects and institutions or to encourage scientists to make data openly available.
Affiliation(s)
- Anne E Thessen
- Center for Library and Informatics, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543 USA
28
Colmsee C, Flemming S, Klapperstück M, Lange M, Scholz U. A case study for efficient management of high throughput primary lab data. BMC Res Notes 2011; 4:413. [PMID: 22005096 PMCID: PMC3217054 DOI: 10.1186/1756-0500-4-413] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/28/2011] [Accepted: 10/17/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND In modern life science research it is very important to have an efficient management of high throughput primary lab data. To realise such an efficient management, four main aspects have to be handled: (I) long term storage, (II) security, (III) upload and (IV) retrieval. FINDINGS In this paper we define central requirements for a primary lab data management and discuss aspects of best practices to realise these requirements. As a proof of concept, we introduce a pipeline that has been implemented in order to manage primary lab data at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK). It comprises: (I) a data storage implementation including a Hierarchical Storage Management system, a relational Oracle Database Management System and a BFiler package to store primary lab data and their meta information, (II) the Virtual Private Database (VPD) implementation for the realisation of data security and the LIMS Light application to (III) upload and (IV) retrieve stored primary lab data. CONCLUSIONS With the LIMS Light system we have developed a primary data management system which provides an efficient storage system with a Hierarchical Storage Management System and an Oracle relational database. With our VPD Access Control Method we can guarantee the security of the stored primary data. Furthermore the system provides high performance upload and download and efficient retrieval of data.
Affiliation(s)
- Christian Colmsee
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr. 3, 06466 Gatersleben, Germany.
29
Dugat-Bony E, Peyretaillade E, Parisot N, Biderre-Petit C, Jaziri F, Hill D, Rimour S, Peyret P. Detecting unknown sequences with DNA microarrays: explorative probe design strategies. Environ Microbiol 2011; 14:356-71. [PMID: 21895914 DOI: 10.1111/j.1462-2920.2011.02559.x] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 01/03/2023]
Abstract
Designing environmental DNA microarrays that can be used to survey the extreme diversity of microorganisms existing in nature represents a stimulating challenge in the field of molecular ecology. Indeed, recent efforts in metagenomics have produced a substantial amount of sequence information from various ecosystems, and will continue to accumulate large amounts of sequence data given the qualitative and quantitative improvements in next-generation sequencing methods. It is now possible to take advantage of these data to develop comprehensive microarrays by using explorative probe design strategies. Such strategies anticipate genetic variations and thus are able to detect known and unknown sequences in environmental samples. In this review, we provide a detailed overview of the probe design strategies currently available to construct both phylogenetic and functional DNA microarrays, with emphasis on those permitting the selection of such explorative probes. Furthermore, exploration of complex environments requires particular attention to probe sensitivity and specificity criteria. Finally, these innovative probe design approaches require exploiting newly available high-density microarray formats.
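One way to make a probe anticipate variation, used here purely for illustration and not presented as the method of any particular tool covered by the review, is to write it with IUPAC ambiguity codes so that a single degenerate probe matches several sequence variants, including ones not yet observed. The minimal sketch below counts a probe's degeneracy, expands it into concrete variants, and scans target sequences for positions it matches. The probe and target sequences are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC: Dict[str, FrozenSet[str]] = {
    "A": frozenset("A"), "C": frozenset("C"), "G": frozenset("G"), "T": frozenset("T"),
    "R": frozenset("AG"), "Y": frozenset("CT"), "S": frozenset("CG"), "W": frozenset("AT"),
    "K": frozenset("GT"), "M": frozenset("AC"), "B": frozenset("CGT"), "D": frozenset("AGT"),
    "H": frozenset("ACT"), "V": frozenset("ACG"), "N": frozenset("ACGT"),
}


def degeneracy(probe: str) -> int:
    """Number of concrete sequences a degenerate probe stands for."""
    count = 1
    for code in probe:
        count *= len(IUPAC[code])
    return count


def expand(probe: str) -> List[str]:
    """All concrete sequences encoded by the probe, in lexicographic order."""
    return ["".join(bases) for bases in product(*(sorted(IUPAC[c]) for c in probe))]


def find_hits(probe: str, target: str) -> List[int]:
    """Start positions where every target base is allowed by the probe's code.

    For simplicity the probe is compared with the strand it was designed from
    (a real probe hybridizes to the reverse complement), and no mismatches are tolerated.
    """
    span = len(probe)
    return [
        pos
        for pos in range(len(target) - span + 1)
        if all(target[pos + i] in IUPAC[code] for i, code in enumerate(probe))
    ]


if __name__ == "__main__":
    probe = "ACGRT"  # R = A or G
    print(degeneracy(probe), expand(probe))
    for target in ("TTACGATCC", "GGACGGTAA", "TTACGCTCC"):
        print(target, find_hits(probe, target))
```

Both hypothetical variants carrying A or G at the ambiguous position (TTACGATCC and GGACGGTAA) are hit at position 2, while TTACGCTCC is missed. Each ambiguous position multiplies the number of variants the probe stands for (`degeneracy("NNN")` is 64), which is one reason such designs need the attention to sensitivity and specificity that the abstract mentions.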
Affiliation(s)
- Eric Dugat-Bony
- Clermont Université, Université Blaise Pascal, Laboratoire Microorganismes: Génome et Environnement, Clermont-Ferrand, France
30
Liu X, Zhao L, Dong Q. Protein remote homology detection based on auto-cross covariance transformation. Comput Biol Med 2011; 41:640-7. [DOI: 10.1016/j.compbiomed.2011.05.015] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 02/27/2010] [Revised: 05/03/2011] [Accepted: 05/24/2011] [Indexed: 11/26/2022]
31
Assessment of soil fungal diversity in different alpine tundra habitats by means of pyrosequencing. Fungal Divers 2011. [DOI: 10.1007/s13225-011-0101-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 10/18/2022]
32
Williams GW, Davis PA, Rogers AS, Bieri T, Ozersky P, Spieth J. Methods and strategies for gene structure curation in WormBase. Database (Oxford) 2011; 2011:baq039. [PMID: 21543339 PMCID: PMC3092607 DOI: 10.1093/database/baq039] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
The Caenorhabditis elegans genome sequence was published over a decade ago; this was the first published genome of a multi-cellular organism and now the WormBase project has had a decade of experience in curating this genome's sequence and gene structures. In one of its roles as a central repository for nematode biology, WormBase continues to refine the gene structure annotations using sequence similarity and other computational methods, as well as information from the literature- and community-submitted annotations. We describe the various methods of gene structure curation that have been tried by WormBase and the problems associated with each of them. We also describe the current strategy for gene structure curation, and introduce the WormBase ‘curation tool’, which integrates different data sources in order to identify new and correct gene structures. Database URL: http://www.wormbase.org/
Affiliation(s)
- G W Williams
- WormBase Group, The Wellcome Trust Sanger Institute, Hinxton, Cambs, UK.
33
Simultaneous genome-wide inference of physical, genetic, regulatory, and functional pathway components. PLoS Comput Biol 2010; 6:e1001009. [PMID: 21124865 PMCID: PMC2991250 DOI: 10.1371/journal.pcbi.1001009] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 06/13/2010] [Accepted: 10/25/2010] [Indexed: 11/19/2022]
Abstract
Biomolecular pathways are built from diverse types of pairwise interactions, ranging from physical protein-protein interactions and modifications to indirect regulatory relationships. One goal of systems biology is to bridge three aspects of this complexity: the growing body of high-throughput data assaying these interactions; the specific interactions in which individual genes participate; and the genome-wide patterns of interactions in a system of interest. Here, we describe a methodology for simultaneously predicting specific types of biomolecular interactions using high-throughput genomic data. This results in a comprehensive compendium of whole-genome networks for yeast, derived from ∼3,500 experimental conditions and describing 30 interaction types, which range from general (e.g. physical or regulatory) to specific (e.g. phosphorylation or transcriptional regulation). We used these networks to investigate molecular pathways in carbon metabolism and cellular transport, proposing a novel connection between glycogen breakdown and glucose utilization supported by recent publications. Additionally, 14 specific predicted interactions in DNA topological change and protein biosynthesis were experimentally validated. We analyzed the systems-level network features within all interactomes, verifying the presence of small-world properties and enrichment for recurring network motifs. This compendium of physical, synthetic, regulatory, and functional interaction networks has been made publicly available through an interactive web interface for investigators to utilize in future research at http://function.princeton.edu/bioweaver/.
To maintain the complexity of living biological systems, many proteins must interact in a coordinated manner to integrate their unique functions into a cooperative system. Pathways are typically constructed to capture modular subsets of this dynamic network, each made up of a collection of biomolecular interactions of diverse types that together carry out a specific cellular function. Deciphering these pathways at a global level is a crucial step for unraveling systems biology, aiding at every level from basic biological understanding to translational biomarker and drug target discovery. The combination of high-throughput genomic data with advanced computational methods has enabled us to infer the first genome-wide compendium of biomolecular pathway networks, comprising 30 distinct biomolecular interaction types. We demonstrate that this interaction network compendium, derived from ∼3,500 experimental conditions, can be used to direct a range of biomedical hypothesis generation and testing. We show that our results can be used to predict novel protein interactions and new pathway components, and also that they enable system-level analysis to investigate the network characteristics of cell-wide regulatory circuits. The resulting compendium of biological networks is made publicly available through an interactive web interface to enable future research in other biological systems of interest.
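Small-world structure is commonly checked with two statistics: a high average clustering coefficient (a node's neighbours tend to be linked to each other) and a short average shortest-path length. The minimal sketch below computes both for an undirected graph stored as adjacency sets. It is not the authors' pipeline; the comparison against size-matched random graphs that a proper small-world test needs is omitted, and the toy graph (two triangles joined by one edge) is hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def add_edge(g: Graph, u: Hashable, v: Hashable) -> None:
    """Insert an undirected edge, creating nodes as needed."""
    g.setdefault(u, set()).add(v)
    g.setdefault(v, set()).add(u)


def local_clustering(g: Graph, node: Hashable) -> float:
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    nbrs = g[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(g: Graph) -> float:
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(g, n) for n in g) / len(g)


def average_shortest_path(g: Graph) -> float:
    """Mean BFS hop distance over all ordered pairs of mutually reachable nodes.

    Assumes the graph has at least one edge (otherwise there are no pairs).
    """
    total = pairs = 0
    for source in g:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in g[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    g: Graph = {}
    edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
    for u, v in edges:
        add_edge(g, u, v)
    print(average_clustering(g), average_shortest_path(g))
```

For that toy graph the sketch gives an average clustering coefficient of 7/9 (about 0.78) and an average path length of 1.8.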
34
Parkinson H, Sarkans U, Kolesnikov N, Abeygunawardena N, Burdett T, Dylag M, Emam I, Farne A, Hastings E, Holloway E, Kurbatova N, Lukk M, Malone J, Mani R, Pilicheva E, Rustici G, Sharma A, Williams E, Adamusiak T, Brandizi M, Sklyar N, Brazma A. ArrayExpress update--an archive of microarray and high-throughput sequencing-based functional genomics experiments. Nucleic Acids Res 2010; 39:D1002-4. [PMID: 21071405 PMCID: PMC3013660 DOI: 10.1093/nar/gkq1040] [Citation(s) in RCA: 271] [Impact Index Per Article: 19.4] [Indexed: 12/25/2022]
Abstract
The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.
Affiliation(s)
- Helen Parkinson
- Functional Genomics Team, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
35
Hinz U. From protein sequences to 3D-structures and beyond: the example of the UniProt knowledgebase. Cell Mol Life Sci 2010; 67:1049-64. [PMID: 20043185 PMCID: PMC2835715 DOI: 10.1007/s00018-009-0229-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 08/14/2009] [Revised: 12/01/2009] [Accepted: 12/07/2009] [Indexed: 11/12/2022]
Abstract
With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website (http://www.uniprot.org/). It also discusses the precautions necessary for successful predictions and extrapolations.
Affiliation(s)
- Ursula Hinz
- Swiss-Prot Group, Swiss Institute of Bioinformatics, 1 rue Michel Servet, 1211, Geneva, Switzerland.
36
Protein Bioinformatics Infrastructure for the Integration and Analysis of Multiple High-Throughput "omics" Data. Adv Bioinformatics 2010:423589. [PMID: 20369061 PMCID: PMC2847380 DOI: 10.1155/2010/423589] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/04/2009] [Accepted: 01/05/2010] [Indexed: 12/26/2022]
Abstract
High-throughput “omics” technologies bring new opportunities for biological and biomedical researchers to ask complex questions and gain new scientific insights. However, the voluminous, complex, and context-dependent data being maintained in heterogeneous and distributed environments, together with the lack of well-defined data standards and standardized nomenclature, pose a major challenge that requires advanced computational methods and bioinformatics infrastructures for integration, mining, visualization, and comparative analysis to facilitate data-driven hypothesis generation and biological knowledge discovery. In this paper, we present the challenges in high-throughput “omics” data integration and analysis, introduce a protein-centric approach for systems integration of large and heterogeneous high-throughput “omics” data including microarray, mass spectrometry, protein sequence, protein structure, and protein interaction data, and use a scientific case study to illustrate how one can use varied “omics” data from different laboratories to make useful connections that could lead to new biological knowledge.
37
Valentin F, Squizzato S, Goujon M, McWilliam H, Paern J, Lopez R. Fast and efficient searching of biological data resources--using EB-eye. Brief Bioinform 2010; 11:375-84. [PMID: 20150321 DOI: 10.1093/bib/bbp065] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022]
Abstract
The EB-eye is a fast and efficient search engine that provides easy and uniform access to the biological data resources hosted at the EMBL-EBI. Currently, users can access information from more than 62 distinct datasets covering some 400 million entries. The data resources represented in the EB-eye include: nucleotide and protein sequences at both the genomic and proteomic levels, structures ranging from chemicals to macro-molecular complexes, gene-expression experiments, binary level molecular interactions as well as reaction maps and pathway models, functional classifications, biological ontologies, and comprehensive literature libraries covering the biomedical sciences and related intellectual property. The EB-eye can be accessed over the web or programmatically using a SOAP Web Services interface. This allows its search and retrieval capabilities to be exploited in workflows and analytical pipelines. The EB-eye is a novel alternative to existing biological search and retrieval engines. In this article we describe in detail how to exploit its powerful capabilities.
38
Gerner M, Nenadic G, Bergman CM. LINNAEUS: a species name identification system for biomedical literature. BMC Bioinformatics 2010; 11:85. [PMID: 20149233 PMCID: PMC2836304 DOI: 10.1186/1471-2105-11-85] [Citation(s) in RCA: 153] [Impact Index Per Article: 10.9] [Received: 08/28/2009] [Accepted: 02/11/2010] [Indexed: 11/25/2022]
Abstract
BACKGROUND The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles. RESULTS In this paper we describe an open-source species name recognition and normalization software system, LINNAEUS, and evaluate its performance relative to several automatically generated biomedical corpora, as well as a novel corpus of full-text documents manually annotated for species mentions. LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions. When compared against our manually annotated corpus, LINNAEUS performs with 94% recall and 97% precision at the mention level, and 98% recall and 90% precision at the document level. Our system successfully solves the problem of disambiguating uncertain species mentions, with 97% of all mentions in PubMed Central full-text documents resolved to unambiguous NCBI taxonomy identifiers. CONCLUSIONS LINNAEUS is an open source, stand-alone software system capable of recognizing and normalizing species name mentions with speed and accuracy, and can therefore be integrated into a range of bioinformatics and text-mining applications. The software and manually annotated corpus can be downloaded freely at http://linnaeus.sourceforge.net/.
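LINNAEUS matches a species-name dictionary with a deterministic finite-state automaton. A character trie is the simplest automaton of that kind, and the minimal sketch below uses one to find the longest dictionary match starting at each word boundary and map it to an NCBI Taxonomy identifier. It is not LINNAEUS's implementation: the tiny dictionary is illustrative (9606, 10090 and 562 are the NCBI Taxonomy IDs for human, mouse and E. coli), matching is case-insensitive, and the heuristics LINNAEUS uses to resolve ambiguous mentions are left out.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One automaton state; taxon_id is set if a dictionary name ends here."""

    __slots__ = ("children", "taxon_id")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.taxon_id: Optional[int] = None


def build_trie(dictionary: Dict[str, int]) -> TrieNode:
    """Insert every lower-cased name so that shared prefixes share states."""
    root = TrieNode()
    for name, taxon_id in dictionary.items():
        node = root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.taxon_id = taxon_id
    return root


def find_mentions(text: str, root: TrieNode) -> List[Tuple[int, int, int]]:
    """Return (start, end, taxon_id) for the longest match at each word start.

    Assumes lower-casing does not change the text length (true for ASCII).
    """
    lowered = text.lower()
    n = len(lowered)
    mentions: List[Tuple[int, int, int]] = []
    i = 0
    while i < n:
        if i > 0 and lowered[i - 1].isalnum():  # not at a word start
            i += 1
            continue
        node, j = root, i
        best: Optional[Tuple[int, int]] = None
        while j < n and lowered[j] in node.children:
            node = node.children[lowered[j]]
            j += 1
            # Accept only matches that end on a word boundary.
            if node.taxon_id is not None and (j == n or not lowered[j].isalnum()):
                best = (j, node.taxon_id)
        if best is None:
            i += 1
        else:
            mentions.append((i, best[0], best[1]))
            i = best[0]  # resume scanning after the match
    return mentions


# Illustrative dictionary: names and synonyms mapped to NCBI Taxonomy IDs.
EXAMPLE_DICTIONARY = {
    "homo sapiens": 9606, "human": 9606,
    "mus musculus": 10090, "mouse": 10090,
    "escherichia coli": 562, "e. coli": 562,
}

if __name__ == "__main__":
    text = "Human and mouse cells were compared; E. coli served as a control."
    for start, end, taxon in find_mentions(text, build_trie(EXAMPLE_DICTIONARY)):
        print(text[start:end], taxon)
```

Run as a script, it prints `Human 9606`, `mouse 10090` and `E. coli 562`. A word such as "humans" yields no match, because the word-boundary check rejects the partial hit on "human".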
Affiliation(s)
- Martin Gerner
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
- Goran Nenadic
- School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
- Casey M Bergman
- Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
39
van Ommen B, Bouwman J, Dragsted LO, Drevon CA, Elliott R, de Groot P, Kaput J, Mathers JC, Müller M, Pepping F, Saito J, Scalbert A, Radonjic M, Rocca-Serra P, Travis A, Wopereis S, Evelo CT. Challenges of molecular nutrition research 6: the nutritional phenotype database to store, share and evaluate nutritional systems biology studies. Genes Nutr 2010; 5:189-203. [PMID: 21052526 PMCID: PMC2935528 DOI: 10.1007/s12263-010-0167-9] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 10/12/2009] [Accepted: 01/03/2010] [Indexed: 11/25/2022]
Abstract
The challenge of modern nutrition and health research is to identify food-based strategies promoting life-long optimal health and well-being. This research is complex because it exploits a multitude of bioactive compounds acting on an extensive network of interacting processes. Whereas nutrition research can profit enormously from the revolution in ‘omics’ technologies, it has discipline-specific requirements for analytical and bioinformatic procedures. In addition to measurements of the parameters of interest (measures of health), extensive description of the subjects of study and foods or diets consumed is central for describing the nutritional phenotype. We propose and pursue an infrastructural activity of constructing the “Nutritional Phenotype database” (dbNP). When fully developed, dbNP will be a research and collaboration tool and a publicly available data and knowledge repository. Creation and implementation of the dbNP will maximize benefits to the research community by enabling integration and interrogation of data from multiple studies, from different research groups, different countries and different -omics levels. The dbNP is designed to facilitate storage of biologically relevant, pre-processed -omics data, as well as study descriptive and study participant phenotype data. It is also important to enable the combination of this information at different levels (e.g. to facilitate linkage of data describing participant phenotype, genotype and food intake with information on study design and -omics measurements, and to combine all of this with existing knowledge). The biological information stored in the database (i.e. genetics, transcriptomics, proteomics, biomarkers, metabolomics, functional assays, food intake and food composition) is tailored to nutrition research and embedded in an environment of standard procedures and protocols, annotations, modular data-basing, networking and integrated bioinformatics.
The dbNP is an evolving enterprise, which is only sustainable if it is accepted and adopted by the wider nutrition and health research community as an open source, pre-competitive and publicly available resource where many partners can both contribute to and profit from its developments. We introduce the Nutrigenomics Organisation (NuGO, http://www.nugo.org) as a membership association responsible for establishing and curating the dbNP. Within NuGO, all efforts related to dbNP (i.e. usage, coordination, integration, facilitation and maintenance) will be directed towards a sustainable and federated infrastructure.
Affiliation(s)
- Ben van Ommen
- TNO Quality of Life, PO Box 360, 6700 AJ Zeist, The Netherlands
- Jildau Bouwman
- TNO Quality of Life, PO Box 360, 6700 AJ Zeist, The Netherlands
- Lars O. Dragsted
- Institute of Human Nutrition, University of Copenhagen, 30 Rolighedsvej, 1958 Frederiksberg C, Denmark
- Christian A. Drevon
- Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
- Ruan Elliott
- Institute of Food Research, Norwich Research Park, Norwich, Norfolk NR4 7UA UK
- Philip de Groot
- Nutrigenomics Consortium, TI Food and Nutrition, P.O. Box 557, 6700AN Wageningen, The Netherlands
- Division of Human Nutrition, Wageningen University, PO Box 8129, 6700 EV Wageningen, The Netherlands
- Jim Kaput
- Division of Personalized Nutrition and Medicine, Food and Drug Administration/National Center for Toxicological Research, Jefferson, AR USA
- John C. Mathers
- Human Nutrition Research Centre, Institute for Ageing and Health, Newcastle University, William Leech Building, Framlington Place, Newcastle, NE44 6HE UK
- Michael Müller
- Nutrigenomics Consortium, TI Food and Nutrition, P.O. Box 557, 6700AN Wageningen, The Netherlands
- Division of Human Nutrition, Wageningen University, PO Box 8129, 6700 EV Wageningen, The Netherlands
- Fre Pepping
- Division of Human Nutrition, Wageningen University, PO Box 8129, 6700 EV Wageningen, The Netherlands
- Jahn Saito
- Department of Bioinformatics (BiGCaT) and Department of Knowledge Engineering (DKE), Maastricht University, Maastricht, The Netherlands
- Augustin Scalbert
- INRA, UMR 1019, Unité de Nutrition Humaine, Centre de Recherche de Clermont-Ferrand/Theix, 63122 Saint-Genès-Champanelle, France
- Anthony Travis
- The Rowett Institute of Nutrition and Health, University of Aberdeen, Greenburn Road, Bucksburn Aberdeen, Scotland, AB21 9SB UK
- Suzan Wopereis
- TNO Quality of Life, PO Box 360, 6700 AJ Zeist, The Netherlands
- Chris T. Evelo
- Department of Bioinformatics (BiGCaT), Maastricht University, Maastricht, The Netherlands
40
Klucar L, Stano M, Hajduk M. phiSITE: database of gene regulation in bacteriophages. Nucleic Acids Res 2010; 38:D366-70. [PMID: 19900969 PMCID: PMC2808901 DOI: 10.1093/nar/gkp911] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.1] [Received: 08/15/2009] [Accepted: 10/07/2009] [Indexed: 11/30/2022]
Abstract
We have developed phiSITE, a database of gene regulation in bacteriophages. To date it contains detailed information about more than 700 experimentally confirmed or predicted regulatory elements (promoters, operators, terminators and attachment sites) from 32 bacteriophages belonging to the Siphoviridae, Myoviridae and Podoviridae families. The database is manually curated; the data are collected mainly from scientific papers, cross-referenced with other database resources (EMBL, UniProt, NCBI taxonomy database, NCBI Genome, ICTVdb, PubMed Central) and stored in an SQL-based database system. The system provides full-text search for regulatory elements, graphical visualization of phage genomes and several export options. In addition, visualizations of gene regulatory networks for five phages (Bacillus phage GA-1, Enterobacteria phage lambda, Enterobacteria phage Mu, Enterobacteria phage P2 and Mycoplasma phage P1) have been defined and made available. The phiSITE is accessible at http://www.phisite.org/.
Affiliation(s)
- Lubos Klucar
- Institute of Molecular Biology, Slovak Academy of Sciences, Dubravska cesta 21, 84551 Bratislava, Slovakia.
41
Abstract
Next generation sequencing platforms are producing biological sequencing data in unprecedented amounts. The partners of the International Nucleotide Sequencing Database Collaboration, which includes the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), and the DNA Data Bank of Japan (DDBJ), have established the Sequence Read Archive (SRA) to provide the scientific community with an archival destination for next generation data sets. The SRA is now accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://www.ddbj.nig.ac.jp/sub/trace_sra-e.html from DDBJ. Users of these resources can obtain data sets deposited in any of the three SRA instances. Links and submission instructions are provided.
Affiliation(s)
- Martin Shumway
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
42
Kersey PJ, Lawson D, Birney E, Derwent PS, Haimel M, Herrero J, Keenan S, Kerhornou A, Koscielny G, Kähäri A, Kinsella RJ, Kulesha E, Maheswari U, Megy K, Nuhn M, Proctor G, Staines D, Valentin F, Vilella AJ, Yates A. Ensembl Genomes: extending Ensembl across the taxonomic space. Nucleic Acids Res 2009; 38:D563-9. [PMID: 19884133 PMCID: PMC2808935 DOI: 10.1093/nar/gkp871] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 11/30/2022]
Abstract
Ensembl Genomes (http://www.ensemblgenomes.org) is a new portal offering integrated access to genome-scale data from non-vertebrate species of scientific interest, developed using the Ensembl genome annotation and visualisation platform. Ensembl Genomes consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the availability of vertebrate genomes in Ensembl. Many of the databases supporting the portal have been built in close collaboration with the scientific community, which we consider as essential for maintaining the accuracy and usefulness of the resource. A common set of user interfaces (which include a graphical genome browser, FTP, BLAST search, a query optimised data warehouse, programmatic access, and a Perl API) is provided for all domains. Data types incorporated include annotation of (protein and non-protein coding) genes, cross references to external resources, and high throughput experimental data (e.g. data from large scale studies of gene expression and polymorphism visualised in their genomic context). Additionally, extensive comparative analysis has been performed, both within defined clades and across the wider taxonomy, and sequence alignments and gene trees resulting from this can be accessed through the site.
Affiliation(s)
- P J Kersey
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
43
Robinson J, Mistry K, McWilliam H, Lopez R, Marsh SGE. IPD--the Immuno Polymorphism Database. Nucleic Acids Res 2009; 38:D863-9. [PMID: 19875415 PMCID: PMC2808958 DOI: 10.1093/nar/gkp879] [Citation(s) in RCA: 155] [Impact Index Per Article: 10.3] [Indexed: 02/06/2023]
Abstract
The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, covering alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data are currently available online from the website and FTP directory.
Affiliation(s)
- James Robinson
- Anthony Nolan Research Institute, Royal Free Hospital, Hampstead, London NW3 2QG, UK
44
Aranda B, Achuthan P, Alam-Faruque Y, Armean I, Bridge A, Derow C, Feuermann M, Ghanbarian AT, Kerrien S, Khadake J, Kerssemakers J, Leroy C, Menden M, Michaut M, Montecchi-Palazzi L, Neuhauser SN, Orchard S, Perreau V, Roechert B, van Eijk K, Hermjakob H. The IntAct molecular interaction database in 2010. Nucleic Acids Res 2009; 38:D525-31. [PMID: 19850723 PMCID: PMC2808934 DOI: 10.1093/nar/gkp878] [Citation(s) in RCA: 524] [Impact Index Per Article: 34.9] [Indexed: 12/18/2022]
Abstract
IntAct is an open-source, open data molecular interaction database and toolkit. Data is abstracted from the literature or from direct data depositions by expert curators following a deep annotation model providing a high level of detail. As of September 2009, IntAct contains over 200,000 curated binary interaction evidences. In response to the growing data volume and user requests, IntAct now provides a two-tiered view of the interaction data. The search interface allows the user to iteratively develop complex queries, exploiting the detailed annotation with hierarchical controlled vocabularies. Results are provided at any stage in a simplified, tabular view. Specialized views then allow 'zooming in' on the full annotation of interactions, interactors and their properties. IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.
Affiliation(s)
- B Aranda
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD, UK
45
Abstract
The primary mission of UniProt is to support biological research by maintaining a stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase, with extensive cross-references and querying interfaces freely accessible to the scientific community. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt comprises four major components, each optimized for different uses: the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is updated and distributed every 3 weeks and can be accessed online for searches or download at http://www.uniprot.org.
Affiliation(s)
- The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
46
ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 2009. [PMID: 19736561 DOI: 10.1038/nrg2641, 10.1038/ni0709-669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a technique for genome-wide profiling of DNA-binding proteins, histone modifications or nucleosomes. Owing to the tremendous progress in next-generation sequencing technology, ChIP-seq offers higher resolution, less noise and greater coverage than its array-based predecessor ChIP-chip. With the decreasing cost of sequencing, ChIP-seq has become an indispensable tool for studying gene regulation and epigenetic mechanisms. In this Review, I describe the benefits and challenges in harnessing this technique with an emphasis on issues related to experimental design and data analysis. ChIP-seq experiments generate large quantities of data, and effective computational analysis will be crucial for uncovering biological mechanisms.
49
Abstract
BACKGROUND Increasingly, effective drug discovery involves the searching and data mining of large volumes of information from many sources covering the domains of chemistry, biology and pharmacology, amongst others. This has led to a proliferation of databases and data sources relevant to drug discovery. OBJECTIVE This paper provides a review of the publicly available large-scale databases relevant to drug discovery, describes the kinds of data mining approaches that can be applied to them and discusses recent work in integrative data mining that looks for associations that span multiple sources, including the use of Semantic Web techniques. CONCLUSION The future of mining large data sets for drug discovery requires intelligent, semantic aggregation of information from all of the data sources described in this review, along with the application of advanced methods such as intelligent agents and inference engines in client applications.
Affiliation(s)
- David J Wild
- Director of Cheminformatics Program, Assistant Professor of Informatics, Indiana University, School of Informatics and Computing, 901 E. 10th St., Bloomington, IN 47408, USA; +1 812 856 1848; +1 608 541 5402
50
Penel S, Arigon AM, Dufayard JF, Sertier AS, Daubin V, Duret L, Gouy M, Perrière G. Databases of homologous gene families for comparative genomics. BMC Bioinformatics 2009; 10 Suppl 6:S3. [PMID: 19534752 PMCID: PMC2697650 DOI: 10.1186/1471-2105-10-s6-s3] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Indexed: 12/22/2022]
Abstract
Background Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users with high quality homologous families and sequence alignments, as well as phylogenetic trees based on state-of-the-art algorithms, are becoming indispensable. Methods We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern based searches allowing, among other uses, the retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at .
Affiliation(s)
- Simon Penel
- Laboratoire de Biométrie et Biologie Evolutive, CNRS, Université Claude Bernard - Lyon 1, 43 bd du 11 Novembre 1918, 69622 Villeurbanne Cedex, France.