1
|
Role of Bioinformatics in Biological Sciences. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
2
|
Dobie C, Montgomery AP, Szabo R, Skropeta D, Yu H. Computer-aided design of human sialyltransferase inhibitors of hST8Sia III. J Mol Recognit 2017; 31. [PMID: 29119617 DOI: 10.1002/jmr.2684] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/02/2017] [Revised: 10/05/2017] [Accepted: 10/06/2017] [Indexed: 11/07/2022]
Abstract
Sialyltransferase (ST) upregulation and the resultant hypersialylation of tumour cell surfaces is an established hallmark of many cancers including lung, breast, ovarian, pancreatic and prostate cancer. The role of ST enzymes in tumour cell growth and metastasis, as well as links to multi-drug resistance, has seen ST inhibition emerge as a target for potential antimetastatic cancer treatments. The most potent of the reported inhibitors are transition-state analogues. Although there are several examples of these in the literature, many are suspected to have poor pharmacokinetic properties and are not readily synthetically accessible. A proposed solution to these problems is to use a neutral carbamate or 1,2,3-triazole linker instead of the more commonly used phosphodiester linker, and to replace the traditionally utilised cytidine nucleotide with uridine. Another issue in this area is the paucity of structural information on human ST enzymes. However, in late 2015 the structure of human ST8Sia III was reported (only the second human ST described so far), creating the opportunity for structure-based design of selective ST8 inhibitors for the first time. Herein, molecular docking and molecular dynamics simulations with the newly published crystal structure of hST8Sia III were performed for the first time with selected ST transition-state analogues. The simulations showed that these compounds could participate in many of the key interactions common to the natural donor and acceptor substrates, and revealed key insights into the synthesis of potentially selective ST inhibitors.
Affiliation(s)
- Christopher Dobie
- School of Chemistry, Faculty of Science, Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
- Andrew P Montgomery
- School of Chemistry, Faculty of Science, Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
- Rémi Szabo
- School of Chemistry, Faculty of Science, Medicine and Health, University of Wollongong, Wollongong, NSW, Australia
- Danielle Skropeta
- School of Chemistry, Faculty of Science, Medicine and Health, University of Wollongong, Wollongong, NSW, Australia; Centre for Medical and Molecular Bioscience, University of Wollongong, Wollongong, NSW, Australia; Illawarra Health and Medical Research Institute, University of Wollongong, Wollongong, NSW, 2522, Australia
- Haibo Yu
- School of Chemistry, Faculty of Science, Medicine and Health, University of Wollongong, Wollongong, NSW, Australia; Centre for Medical and Molecular Bioscience, University of Wollongong, Wollongong, NSW, Australia; Illawarra Health and Medical Research Institute, University of Wollongong, Wollongong, NSW, 2522, Australia
|
3
|
Automated extraction of potential migraine biomarkers using a semantic graph. J Biomed Inform 2017; 71:178-189. [PMID: 28579531 DOI: 10.1016/j.jbi.2017.05.018] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/29/2016] [Revised: 04/03/2017] [Accepted: 05/23/2017] [Indexed: 01/20/2023]
Abstract
PROBLEM Biomedical literature and databases contain important clues for the identification of potential disease biomarkers. However, searching these enormous knowledge reservoirs and integrating findings across heterogeneous sources is costly and difficult. Here we demonstrate how semantically integrated knowledge, extracted from biomedical literature and structured databases, can be used to automatically identify potential migraine biomarkers. METHOD We used a knowledge graph containing more than 3.5 million biomedical concepts and 68.4 million relationships. Biochemical compound concepts were filtered and ranked by their potential as biomarkers based on their connections to a subgraph of migraine-related concepts. The ranked results were evaluated against the results of a systematic literature review that was performed manually by migraine researchers. Weight points were assigned to these reference compounds to indicate their relative importance. RESULTS Ranked results automatically generated by the knowledge graph were highly consistent with results from the manual literature review. Out of 222 reference compounds, 163 (73%) ranked in the top 2000, with 547 out of the 644 (85%) weight points assigned to the reference compounds. For reference compounds that were not in the top of the list, an extensive error analysis has been performed. When evaluating the overall performance, we obtained a ROC-AUC of 0.974. DISCUSSION Semantic knowledge graphs composed of information integrated from multiple and varying sources can assist researchers in identifying potential disease biomarkers.
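The evaluation described in this abstract (the share of reference compounds, and of their weight points, that fall within the top of the ranked list) can be sketched as follows. This is a minimal illustration only: the function name, compound names and weights are hypothetical toy values, not data or code from the study.

```python
def recall_at_k(ranked, reference_weights, k):
    """Return (share of reference items, share of weight points) found in the top k.

    ranked: list of compound names, best first.
    reference_weights: dict mapping each reference compound to its weight points.
    """
    top = set(ranked[:k])
    found = [name for name in reference_weights if name in top]
    item_recall = len(found) / len(reference_weights)
    weight_recall = (
        sum(reference_weights[name] for name in found)
        / sum(reference_weights.values())
    )
    return item_recall, weight_recall


# Hypothetical toy ranking and reference set (not taken from the paper).
ranked = ["serotonin", "glutamate", "caffeine", "CGRP", "water", "dopamine"]
reference = {"serotonin": 5, "CGRP": 8, "dopamine": 2, "magnesium": 1}

item_recall, weight_recall = recall_at_k(ranked, reference, k=4)
print(f"items: {item_recall:.2f}, weight points: {weight_recall:.4f}")

# The percentages quoted in the abstract follow from the same arithmetic.
reported_items = round(163 / 222 * 100)    # 73
reported_weights = round(547 / 644 * 100)  # 85
```

Weighting by importance means that a ranking which misses only low-weight compounds scores higher on weight points than on plain item counts, which matches the 85% versus 73% pattern reported above.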
|
4
|
Fuchs JE, Schilling O, Liedl KR. Determinants of Macromolecular Specificity from Proteomics-Derived Peptide Substrate Data. Curr Protein Pept Sci 2017; 18:905-913. [PMID: 27455965 PMCID: PMC5898033 DOI: 10.2174/1389203717666160724211231] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/25/2017] [Revised: 03/30/2017] [Accepted: 04/15/2017] [Indexed: 11/22/2022]
Abstract
BACKGROUND Recent advances in proteomics methodologies allow for high-throughput profiling of proteolytic cleavage events. The resulting substrate peptide distributions provide deep insights into the underlying macromolecular recognition events, as determinants of biomolecular specificity identified by proteomics approaches may be compared to structure-based analysis of the corresponding protein-protein interfaces. METHOD Here, we present an overview of experimental and computational methodologies and tools applied in the area and provide an outlook beyond the protein class of proteases. RESULTS AND CONCLUSION We discuss the future potential, synergies and needs of the emerging overlapping disciplines of proteomics and structure-based modelling.
Affiliation(s)
- Julian E. Fuchs
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Oliver Schilling
- Institute of Molecular Medicine and Cell Research, University of Freiburg, Stefan-Meier-Str. 17, D-79104 Freiburg, Germany and BIOSS Centre for Biological Signaling Studies, University of Freiburg, D-79104 Freiburg, Germany
- Klaus R. Liedl
- Institute of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innrain 80/82, A-6020 Innsbruck, Austria
|
5
|
Abstract
The purpose of this chapter is to provide a starting point for the analysis of miRNA array data, using freely available online suites of tools. This chapter does not describe how to perform analysis of primary array data, but rather how to use the top differentially regulated miRNAs (returned from comparing one miRNA group to another) as the starting point for further practical analysis. Here we describe the methods and tools required to identify targets worthy of additional investigation, using the identified miRNAs as a starting point. Importantly, this additional information (pathways targeted, gene expression, mRNA targets, miRNA families) can be used to positively inform any project.
Affiliation(s)
- James A L Brown
- Discipline of Surgery, College of Medicine, Lambe Institute for Translational Research, National University of Ireland, University Road, Galway, Ireland
- Emer Bourke
- Discipline of Pathology, College of Medicine, Lambe Institute for Translational Research, National University of Ireland, Galway, Ireland
|
6
|
Pranavchand R, Reddy BM. Genomics era and complex disorders: Implications of GWAS with special reference to coronary artery disease, type 2 diabetes mellitus, and cancers. J Postgrad Med 2016; 62:188-98. [PMID: 27424552 PMCID: PMC4970347 DOI: 10.4103/0022-3859.186390] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/25/2022] Open
Abstract
The Human Genome Project (HGP) has identified millions of single nucleotide polymorphisms (SNPs) and their association with several diseases, apart from successfully characterizing the Mendelian/monogenic diseases. However, the dissection of the precise etiology of complex genetic disorders still poses a challenge for human geneticists. This review outlines the landmark results of genome-wide association studies (GWAS) with respect to major complex diseases: coronary artery disease (CAD), type 2 diabetes mellitus (T2DM), and predominant cancers. A brief account of the current Indian scenario is also given. All the relevant publications up to mid-2015 were accessed through web databases such as PubMed and Google. Several databases providing genetic information related to these diseases were tabulated and, in particular, a list of the most significant SNPs identified through GWAS was compiled, which may be useful for designing functional validation studies. Post-GWAS implications and emerging concepts such as epigenomics and pharmacogenomics are also discussed.
Affiliation(s)
- R Pranavchand
- Molecular Anthropology Group, Biological Anthropology Unit, Indian Statistical Institute, Hyderabad, Andhra Pradesh, India
- B M Reddy
- Molecular Anthropology Group, Biological Anthropology Unit, Indian Statistical Institute, Hyderabad, Andhra Pradesh, India
|
7
|
Savonnet M, Leclercq E, Naubourg P. eClims: An Extensible and Dynamic Integration Framework for Biomedical Information Systems. IEEE J Biomed Health Inform 2016; 20:1640-1649. [DOI: 10.1109/jbhi.2015.2464353] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
|
8
|
Abstract
Systems medicine promotes a range of approaches and strategies to study human health and disease at a systems level with the aim of improving the overall well-being of (healthy) individuals, and preventing, diagnosing, or curing disease. In this chapter we discuss how bioinformatics critically contributes to systems medicine. First, we explain the role of bioinformatics in the management and analysis of data. In particular we show the importance of publicly available biological and clinical repositories to support systems medicine studies. Second, we discuss how the integration and analysis of multiple types of omics data through integrative bioinformatics may facilitate the determination of more predictive and robust disease signatures, lead to a better understanding of (patho)physiological molecular mechanisms, and facilitate personalized medicine. Third, we focus on network analysis and discuss how gene networks can be constructed from omics data and how these networks can be decomposed into smaller modules. We discuss how the resulting modules can be used to generate experimentally testable hypotheses, provide insight into disease mechanisms, and lead to predictive models. Throughout, we provide several examples demonstrating how bioinformatics contributes to systems medicine and discuss future challenges in bioinformatics that need to be addressed to enable the advancement of systems medicine.
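The network step mentioned in this abstract (building a gene network from omics data and decomposing it into smaller modules) can be illustrated with a deliberately simple sketch. It assumes a co-expression network built by thresholding absolute Pearson correlation and treats connected components as modules; both choices, the function names and the toy expression values are assumptions made for illustration, not the methods discussed in the chapter.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_modules(expression, threshold=0.9):
    """Link genes whose |correlation| >= threshold; return connected components."""
    genes = list(expression)
    neighbours = {gene: set() for gene in genes}
    for g1, g2 in combinations(genes, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)

    modules, seen = [], set()
    for gene in genes:
        if gene in seen:
            continue
        # Depth-first traversal collects every gene reachable from this one.
        stack, module = [gene], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(neighbours[node] - module)
        seen |= module
        modules.append(module)
    return modules


# Hypothetical expression profiles across four samples.
expression = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [5, 1, 4, 2],
    "geneD": [10, 2, 8, 4],
}
modules = coexpression_modules(expression)
print(modules)
```

Connected components are the crudest possible notion of a module; dedicated community-detection or clustering methods would normally be preferred on real data, where a single threshold tends to produce one giant component.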
|
9
|
Chang J, Cho H, Chou HH. Mango: combining and analyzing heterogeneous biological networks. BioData Min 2016; 9:25. [PMID: 27489569 PMCID: PMC4971676 DOI: 10.1186/s13040-016-0105-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/18/2016] [Accepted: 06/20/2016] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Heterogeneous biological data such as sequence matches, gene expression correlations, protein-protein interactions, and biochemical pathways can be merged and analyzed via graphs, or networks. Existing software for network analysis has limited scalability to large data sets or is only accessible to software developers as libraries. In addition, the polymorphic nature of the data sets requires a more standardized method for integration and exploration. RESULTS Mango facilitates large network analyses with its Graph Exploration Language, automatic graph attribute handling, and real-time 3-dimensional visualization. On a personal computer Mango can load, merge, and analyze networks with millions of links and can connect to online databases to fetch and merge biological pathways. CONCLUSIONS Mango is written in C++ and runs on Mac OS, Windows, and Linux. The stand-alone distributions, including the Graph Exploration Language integrated development environment, are freely available for download from http://www.complex.iastate.edu/download/Mango. The Mango User Guide listing all features can be found at http://www.gitbook.com/book/j23414/mango-user-guide.
Affiliation(s)
- Jennifer Chang
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011, USA
- Hyejin Cho
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011, USA
- Hui-Hsien Chou
- Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa 50011, USA; Department of Computer Science, Iowa State University, Ames, Iowa 50011, USA
|
10
|
The nitrogen responsive transcriptome in potato (Solanum tuberosum L.) reveals significant gene regulatory motifs. Sci Rep 2016; 6:26090. [PMID: 27193058 PMCID: PMC4872257 DOI: 10.1038/srep26090] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 12/23/2015] [Accepted: 04/25/2016] [Indexed: 12/18/2022] Open
Abstract
Nitrogen (N) is the most important nutrient for the growth of potato (Solanum tuberosum L.). Foliar gene expression in potato plants with and without N supplementation at 180 kg N ha⁻¹ was compared at mid-season. Genes with consistent differences in foliar expression due to N supplementation over three cultivars and two developmental time points were examined. In total, thirty genes were found to be over-expressed and nine genes were found to be under-expressed with supplemented N. Functional relationships between over-expressed genes were found. The main metabolic pathway represented among differentially expressed genes was amino acid metabolism. The 1000 bp upstream flanking regions of the differentially expressed genes were analysed and nine overrepresented motifs were found using three motif discovery algorithms (Seeder, Weeder and MEME). These results point to coordinated gene regulation at the transcriptional level controlling steady state potato responses to N sufficiency.
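Extracting the 1000 bp upstream flanking region of a gene, the input to motif discovery in this abstract, has to take strand into account. The sketch below shows one way to do that under stated assumptions: gene coordinates are 0-based, half-open and given on the forward strand, and the function names and toy sequence are hypothetical rather than taken from the study's pipeline.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string containing only A, C, G and T."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=1000):
    """Return up to `length` bases upstream of a gene, read 5'->3' on the gene's strand.

    gene_start and gene_end are 0-based, half-open forward-strand coordinates.
    The region is truncated if it would run past either end of the sequence.
    """
    if strand == "+":
        begin = max(0, gene_start - length)
        return chrom_seq[begin:gene_start]
    if strand == "-":
        # For a minus-strand gene, "upstream" lies after gene_end on the forward strand.
        stop = min(len(chrom_seq), gene_end + length)
        return reverse_complement(chrom_seq[gene_end:stop])
    raise ValueError("strand must be '+' or '-'")


# Toy sequence with a three-base "gene" at positions 6..9 (GGG).
chrom = "AAACCCGGGTTT"
plus_upstream = upstream_region(chrom, 6, 9, "+", length=3)
minus_upstream = upstream_region(chrom, 6, 9, "-", length=3)
print(plus_upstream, minus_upstream)
```

The extracted sequences would then be written out (for example as FASTA) and passed to a motif discovery tool; that step is omitted here because it depends on the tool's own input conventions.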
|
11
|
Brown JAL. Evaluating the effectiveness of a practical inquiry-based learning bioinformatics module on undergraduate student engagement and applied skills. Biochem Mol Biol Educ 2016; 44:304-13. [PMID: 27161812 DOI: 10.1002/bmb.20954] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/25/2015] [Revised: 11/20/2015] [Accepted: 12/08/2015] [Indexed: 05/27/2023]
Abstract
A pedagogic intervention, in the form of an inquiry-based peer-assisted learning project (as a practical student-led bioinformatics module), was assessed for its ability to increase students' engagement, practical bioinformatic skills and process-specific knowledge. Elements assessed were process-specific knowledge following module completion, qualitative student-based module evaluation and the novelty, scientific validity and quality of written student reports. Bioinformatics is often the starting point for laboratory-based research projects; therefore, high importance was placed on allowing students to individually develop and apply processes and methods of scientific research. Students led a bioinformatic inquiry-based project (within a framework of inquiry), discovering, justifying and exploring individually discovered research targets. Detailed assessable reports were produced, displaying data generated and the resources used. Mimicking research settings, undergraduates were divided into small collaborative groups, with distinctive central themes. The module was evaluated by assessing the quality and originality of the students' targets through reports, reflecting students' use and understanding of concepts and tools required to generate their data. Furthermore, the bioinformatic module was assessed semi-quantitatively using pre- and post-module quizzes (a non-assessable activity, not contributing to their grade), which incorporated process- and content-specific questions (indicative of their use of the online tools). Qualitative assessment of the teaching intervention was performed using post-module surveys, exploring student satisfaction and other module-specific elements. Overall, a positive experience was found, as was a post-module increase in correct process-specific answers. In conclusion, an inquiry-based peer-assisted learning module increased students' engagement, practical bioinformatic skills and process-specific knowledge.
© 2016 by The International Union of Biochemistry and Molecular Biology, 44:304-313 2016.
Affiliation(s)
- James A L Brown
- Department of Biochemistry, School of Natural Sciences, National University of Ireland Galway, Ireland and Discipline of Surgery, School of Medicine, Lambe Institute for Translational Research, National University of Ireland Galway, Ireland
|
12
|
Arend D, Junker A, Scholz U, Schüler D, Wylie J, Lange M. PGP repository: a plant phenomics and genomics data publication infrastructure. Database (Oxford) 2016; 2016:baw033. [PMID: 27087305 PMCID: PMC4834206 DOI: 10.1093/database/baw033] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 11/05/2015] [Accepted: 02/26/2016] [Indexed: 11/22/2022]
Abstract
Plant genomics and phenomics represent the most promising tools for accelerating yield gains and overcoming emerging crop productivity bottlenecks. However, accessing this wealth of plant diversity requires the characterization of this material using state-of-the-art genomic, phenomic and molecular technologies and the release of subsequent research data via a long-term stable, open-access portal. Although several international consortia and public resource centres offer services for plant research data management, valuable digital assets remain unpublished and thus inaccessible to the scientific community. Recently, the Leibniz Institute of Plant Genetics and Crop Plant Research and the German Plant Phenotyping Network have jointly initiated the Plant Genomics and Phenomics Research Data Repository (PGP) as infrastructure to comprehensively publish plant research data. This covers in particular cross-domain datasets that are not being published in central repositories because of their volume or unsupported data scope, such as image collections from plant phenotyping and microscopy, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry, as well as software and documents. The repository is hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research using e!DAL as software infrastructure and a Hierarchical Storage Management System as data archival backend. A newly developed data submission tool, featuring a high level of automation to lower the barriers to data publication, was made available to the consortium. After an internal review process, data are published with citable digital object identifiers and a core set of technical metadata is registered at DataCite. The e!DAL-embedded Web frontend generates a landing page for each dataset and supports interactive exploration.
PGP is registered as a research data repository at BioSharing.org, re3data.org and OpenAIRE as a valid EU Horizon 2020 open data archive. The above features, the programmatic interface and the support of standard metadata formats enable PGP to fulfil the FAIR data principles: findable, accessible, interoperable, reusable. Database URL: http://edal.ipk-gatersleben.de/repos/pgp/
Affiliation(s)
- Daniel Arend
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Astrid Junker
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Uwe Scholz
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Danuta Schüler
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Juliane Wylie
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
- Matthias Lange
- Leibniz Institute for Plant Genetics and Crop Plant Research (IPK), OT Gatersleben, Corrensstraße 3, Stadt Seeland, 06466, Gatersleben, Germany
|
13
|
Sharma S, Toledo O, Hedden M, Lyon KF, Brooks SB, David RP, Limtong J, Newsome JM, Novakovic N, Rajasekaran S, Thapar V, Williams SR, Schiller MR. The Functional Human C-Terminome. PLoS One 2016; 11:e0152731. [PMID: 27050421 PMCID: PMC4822787 DOI: 10.1371/journal.pone.0152731] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/04/2015] [Accepted: 03/18/2016] [Indexed: 11/24/2022] Open
Abstract
All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new "C-terminome" database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Gene Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences within the last 10 amino acids of proteins in the human proteome, with lengths of 3 to 10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com.
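Searching for a consensus sequence anchored at the C-terminus, as described in this abstract, amounts to a pattern match that must end at the last residue. The sketch below assumes a simple consensus notation in which a lowercase `x` marks a degenerate position and square brackets list allowed residues; that notation, the function names and the toy protein sequences are illustrative assumptions, not the database's own format. The pattern `[ST]x[VIL]` is used only as a familiar example of a C-terminal motif shape.

```python
import re


def consensus_to_regex(consensus):
    """Convert a consensus such as '[ST]x[VIL]' into a regex anchored at the C-terminus."""
    # Lowercase 'x' means "any residue"; the trailing '$' pins the match to the end.
    return re.compile(consensus.replace("x", ".") + "$")


def cterminal_matches(proteins, consensus, window=10):
    """Return {protein name: matched C-terminal residues} for proteins that match."""
    pattern = consensus_to_regex(consensus)
    hits = {}
    for name, sequence in proteins.items():
        tail = sequence[-window:]  # only the last `window` residues are searched
        match = pattern.search(tail)
        if match:
            hits[name] = match.group(0)
    return hits


# Hypothetical protein sequences (uppercase one-letter amino acid codes).
proteins = {
    "protA": "MKTAYIAKQRETSV",
    "protB": "MKTAYIAKQRESVA",
    "protC": "MSTLV",
}
hits = cterminal_matches(proteins, "[ST]x[VIL]")
print(hits)
```

Counting degenerate positions (the `x` characters) before compiling the pattern would be one way to enforce a cap such as the five degenerate positions mentioned above.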
Affiliation(s)
- Surbhi Sharma
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Oniel Toledo
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Michael Hedden
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Kenneth F. Lyon
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Steven B. Brooks
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Roxanne P. David
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Justin Limtong
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Jacklyn M. Newsome
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Nemanja Novakovic
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Sanguthevar Rajasekaran
- Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut 06269–2155, United States of America
- Vishal Thapar
- Department of Pathology, Massachusetts General Hospital, Boston, Massachusetts 02114, United States of America
- Sean R. Williams
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
- Martin R. Schiller
- Nevada Institute of Personalized Medicine, and School of Life Sciences, University of Nevada, Las Vegas, Nevada, United States of America
|
14
|
Yohe SL, Carter AB, Pfeifer JD, Crawford JM, Cushman-Vokoun A, Caughron S, Leonard DGB. Standards for Clinical Grade Genomic Databases. Arch Pathol Lab Med 2016; 139:1400-12. [PMID: 26516938 DOI: 10.5858/arpa.2014-0568-cp] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/06/2022]
Abstract
CONTEXT Next-generation sequencing performed in a clinical environment must meet clinical standards, which requires reproducibility of all aspects of the testing. Clinical-grade genomic databases (CGGDs) are required to classify a variant and to assist in the professional interpretation of clinical next-generation sequencing. Applying quality laboratory standards to the reference databases used for sequence-variant interpretation presents a new challenge for validation and curation. OBJECTIVES To define CGGD and the categories of information contained in CGGDs and to frame recommendations for the structure and use of these databases in clinical patient care. DESIGN Members of the College of American Pathologists Personalized Health Care Committee reviewed the literature and existing state of genomic databases and developed a framework for guiding CGGD development in the future. RESULTS Clinical-grade genomic databases may provide different types of information. This work group defined 3 layers of information in CGGDs: clinical genomic variant repositories, genomic medical data repositories, and genomic medicine evidence databases. The layers are differentiated by the types of genomic and medical information contained and the utility in assisting with clinical interpretation of genomic variants. Clinical-grade genomic databases must meet specific standards regarding submission, curation, and retrieval of data, as well as the maintenance of privacy and security. CONCLUSION These organizing principles for CGGDs should serve as a foundation for future development of specific standards that support the use of such databases for patient care.
Affiliation(s)
- Debra G B Leonard
- From the Department of Laboratory Medicine and Pathology, University of Minnesota Medical Center, Minneapolis (Dr Yohe); the Department of Pathology and Laboratory Medicine and the Department of Biomedical Informatics, Emory University, Atlanta, Georgia (Dr Carter); the Department of Pathology, Washington University School of Medicine, St. Louis, Missouri (Dr Pfeifer); the Department of Pathology and Laboratory Medicine, Hofstra North Shore-Long Island Jewish School of Medicine, Hempstead, New York (Dr Crawford); the Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha (Dr Cushman-Vokoun); the MAWD Pathology Group, North Kansas City, Missouri (Dr Caughron); and the Department of Pathology and Laboratory Medicine, University of Vermont College of Medicine, Burlington (Dr Leonard)
|
15
|
Nagahawatte P, Willis E, Sakauye M, Jose R, Chen H, Davis RL. Featured Article: Genotation: Actionable knowledge for the scientific reader. Exp Biol Med (Maywood) 2016; 241:1202-9. [PMID: 26900164 DOI: 10.1177/1535370216633795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2015] [Accepted: 01/21/2016] [Indexed: 11/16/2022] Open
Abstract
We present an article viewer application that allows a scientific reader to easily discover and share knowledge by linking genomics-related concepts to knowledge in disparate biomedical databases. High-throughput data streams generated by technical advancements have contributed to scientific knowledge discovery at an unprecedented rate. Biomedical informaticists have created a diverse set of databases to store and retrieve the discovered knowledge. The diversity and abundance of such resources present biomedical researchers with a challenge in knowledge discovery. These challenges highlight the need for a better informatics solution. We use a text mining algorithm, Genomine, to identify gene symbols from the text of a journal article. The identified symbols are supplemented with information from the GenoDB knowledgebase. The self-updating GenoDB contains information from the NCBI Gene, Clinvar, Medgen, dbSNP, KEGG, PharmGKB, Uniprot, and Hugo Gene databases. The journal viewer is a web application accessible via a web browser. The features described herein are accessible on www.genotation.org. The Genomine algorithm identifies gene symbols with an F-score of 0.65. GenoDB currently contains information regarding 59,905 gene symbols, 5633 drug-gene relationships, 5981 gene-disease relationships, and 713 pathways. This application provides scientific readers with actionable knowledge related to the concepts of a manuscript. The reader will be able to save and share supplements to be visualized in a graphical manner. This provides convenient access to details of complex biological phenomena, enabling biomedical researchers to generate novel hypotheses to further our knowledge of human health. This manuscript presents a novel application that integrates genomic, proteomic, and pharmacogenomic information to supplement the content of a biomedical manuscript and enable readers to automatically discover actionable knowledge.
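The F-score quoted in this abstract combines precision and recall into a single number. The sketch below assumes the balanced F1 variant (the abstract does not say which F-score was used) and compares a set of predicted gene symbols with a hand-curated gold set; the symbols are toy values chosen for illustration and do not reproduce the Genomine evaluation.

```python
def f1_score(predicted, gold):
    """Balanced F-score (F1) of a predicted set against a gold-standard set."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)          # share of gold items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: "CAT" is an English word wrongly tagged as a gene symbol,
# while "KRAS" and "MYC" are real mentions that the tagger missed.
predicted = {"TP53", "BRCA1", "CAT", "EGFR"}
gold = {"TP53", "BRCA1", "EGFR", "KRAS", "MYC"}

score = f1_score(predicted, gold)
print(f"F1 = {score:.3f}")
```

Here precision is 3/4 and recall is 3/5, so the harmonic mean lands between the two and closer to the lower value, which is why a single F-score can hide whether a tagger mostly over-predicts or mostly misses symbols.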
Affiliation(s)
- Ethan Willis
  - Center of Biomedical Informatics, Memphis, TN 38103, USA
- Mark Sakauye
  - Center of Biomedical Informatics, Memphis, TN 38103, USA
- Rony Jose
  - Center of Biomedical Informatics, Memphis, TN 38103, USA
- Hao Chen
  - Department of Pharmacology, University of Tennessee Health Science Centre, Memphis, TN 38103, USA
- Robert L Davis
  - Center of Biomedical Informatics, Memphis, TN 38103, USA
16
Kerksick CM, Tsatsakis AM, Hayes AW, Kafantaris I, Kouretas D. How can bioinformatics and toxicogenomics assist the next generation of research on physical exercise and athletic performance. J Strength Cond Res 2015; 29:270-8. [PMID: 25353080 DOI: 10.1519/jsc.0000000000000730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/15/2022]
Abstract
The past 2-3 decades have seen an explosion in analytical areas related to "omic" technologies. These advancements have reached a point where their application can be and are being used as a part of exercise physiology and sport performance research. Such advancements have drastically enabled researchers to analyze extremely large groups of data that can provide amounts of information never before made available. Although these "omic" technologies offer exciting possibilities, the analytical costs and time required to complete the statistical approaches are substantial. The areas of exercise physiology and sport performance continue to witness an exponential growth of published studies using any combination of these techniques. Because more investigators within these traditionally applied science disciplines use these approaches, the need for efficient, thoughtful, and accurate extraction of information from electronic databases is paramount. As before, these disciplines can learn much from other disciplines who have already developed software and technologies to rapidly enhance the quality of results received when searching for key information. In addition, further development and interest in areas such as toxicogenomics could aid in the development and identification of more accurate testing programs for illicit drugs, performance enhancing drugs abused in sport, and better therapeutic outcomes from prescribed drug use. This review is intended to offer a discussion related to how bioinformatics approaches may assist the new generation of "omic" research in areas related to exercise physiology and toxicogenomics. Consequently, more focus will be placed on popular tools that are already available for analyzing such complex data and highlighting additional strategies and considerations that can further aid in developing new tools and data management approaches to assist future research in this field. 
It is our contention that introducing more scientists to how this type of work can complement existing experimental approaches within exercise physiology and sport performance will foster additional discussion and stimulate new research in these areas.
Affiliation(s)
- Chad M Kerksick
  - (1) Department of Exercise Science, School of Sport, Recreation and Exercise Sciences, Lindenwood University, St. Charles, Missouri; (2) Department of Forensic Sciences and Toxicology, Laboratory of Toxicology, Medical School, University of Crete, Heraklion, Greece; (3) Department of Environmental Health, Harvard School of Public Health, Boston, Massachusetts; (4) Spherix Consulting, Inc., Bethesda, Maryland; and (5) Department of Biochemistry and Biotechnology, University of Thessaly, Larissa, Greece
17
Wu C, Schwartz JM, Brabant G, Peng SL, Nenadic G. Constructing a molecular interaction network for thyroid cancer via large-scale text mining of gene and pathway events. BMC Systems Biology 2015; 9 Suppl 6:S5. [PMID: 26679379 PMCID: PMC4674859 DOI: 10.1186/1752-0509-9-s6-s5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/04/2022]
Abstract
Background: Biomedical studies need assistance from automated tools and easily accessible data to address the problem of the rapidly accumulating literature. Text-mining tools and curated databases have been developed to address such needs and they can be applied to improve the understanding of molecular pathogenesis of complex diseases like thyroid cancer. Results: We have developed a system, PWTEES, which extracts pathway interactions from the literature utilizing an existing event extraction tool (TEES) and pathway named entity recognition (PathNER). We then applied the system on a thyroid cancer corpus and systematically extracted molecular interactions involving either genes or pathways. With the extracted information, we constructed a molecular interaction network taking genes and pathways as nodes. Using curated pathway information and network topological analyses, we highlight key genes and pathways involved in thyroid carcinogenesis. Conclusions: Mining events involving genes and pathways from the literature and integrating curated pathway knowledge can help improve the understanding of molecular interactions of complex diseases. The system developed for this study can be applied in studies other than thyroid cancer. The source code is freely available online at https://github.com/chengkun-wu/PWTEES.
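The network construction step described in this abstract (genes and pathways as nodes, extracted interactions as edges, followed by topological analysis) can be sketched roughly as below. This is not the PWTEES implementation: the function names, the placeholder node labels, and the choice of node degree as the topological measure are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_network(interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (node, node) interaction pairs.

    Nodes may be genes or pathways; duplicate pairs collapse into one edge
    and self-interactions are ignored.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for source, target in interactions:
        if source == target:
            continue
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def rank_by_degree(adjacency: Dict[str, Set[str]]) -> List[str]:
    """Order nodes by number of neighbours (highest first), ties by name."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))


if __name__ == "__main__":
    # Placeholder labels only; these are not interactions from the paper.
    extracted = [
        ("GENE_A", "PATHWAY_X"),
        ("GENE_B", "PATHWAY_X"),
        ("GENE_A", "GENE_B"),
        ("GENE_C", "PATHWAY_X"),
        ("PATHWAY_X", "GENE_A"),  # repeated mention, same edge
    ]
    network = build_network(extracted)
    for node in rank_by_degree(network):
        print(node, len(network[node]))
```

Degree is the simplest topological measure; other centrality measures could be substituted in `rank_by_degree` without changing how the graph is built.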
18
Fuchs JE, Bender A, Glen RC. Cheminformatics Research at the Unilever Centre for Molecular Science Informatics Cambridge. Mol Inform 2015; 34:626-633. [PMID: 26435758 PMCID: PMC4583778 DOI: 10.1002/minf.201400166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2014] [Accepted: 12/16/2014] [Indexed: 11/12/2022]
Abstract
The Centre for Molecular Informatics, formerly Unilever Centre for Molecular Science Informatics (UCMSI), at the University of Cambridge is a world-leading driving force in the field of cheminformatics. Since its opening in 2000 more than 300 scientific articles have fundamentally changed the field of molecular informatics. The Centre has been a key player in promoting open chemical data and semantic access. Though mainly focussing on basic research, close collaborations with industrial partners ensured real world feedback and access to high quality molecular data. A variety of tools and standard protocols have been developed and are ubiquitous in the daily practice of cheminformatics. Here, we present a retrospective of cheminformatics research performed at the UCMSI, thereby highlighting historical and recent trends in the field as well as indicating future directions.
Affiliation(s)
- Julian E Fuchs
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK; phone/fax: +44 (0)1223 336472/+44 (0)1223 763076
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK; phone/fax: +44 (0)1223 336472/+44 (0)1223 763076
- Robert C Glen
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK; phone/fax: +44 (0)1223 336472/+44 (0)1223 763076
19
Moss WN, Steitz JA. In silico discovery and modeling of non-coding RNA structure in viruses. Methods 2015; 91:48-56. [PMID: 26116541 DOI: 10.1016/j.ymeth.2015.06.015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/06/2015] [Revised: 06/17/2015] [Accepted: 06/22/2015] [Indexed: 11/30/2022]
Abstract
This review covers several computational methods for discovering structured non-coding RNAs in viruses and modeling their putative secondary structures. Here we use examples from two target viruses to highlight these approaches: influenza A virus, a relatively small, segmented RNA virus; and Epstein-Barr virus, a relatively large DNA virus with a complex transcriptome. Each system has unique challenges to overcome and unique characteristics to exploit. From these particular cases, generically useful approaches can be derived for the study of additional viral targets.
Affiliation(s)
- Walter N Moss
  - Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA
- Joan A Steitz
  - Department of Molecular Biophysics and Biochemistry, Howard Hughes Medical Institute, Yale University School of Medicine, New Haven, CT 06536, USA
20
Holliday GL, Bairoch A, Bagos PG, Chatonnet A, Craik DJ, Finn RD, Henrissat B, Landsman D, Manning G, Nagano N, O’Donovan C, Pruitt KD, Rawlings ND, Saier M, Sowdhamini R, Spedding M, Srinivasan N, Vriend G, Babbitt PC, Bateman A. Key challenges for the creation and maintenance of specialist protein resources. Proteins 2015; 83:1005-13. [PMID: 25820941 PMCID: PMC4446195 DOI: 10.1002/prot.24803] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 01/12/2015] [Revised: 03/06/2015] [Accepted: 03/20/2015] [Indexed: 11/12/2022]
Abstract
As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, thus increasing the importance of good access to these data that vary from the supplemental material of individual articles, all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to their databases for a focused topic or family of proteins, using specialized approaches that are not feasible in the major reference databases. Many are labors of love, run by a single lab with little or no dedicated funding and there are many challenges to building and maintaining them. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on August 11 and 12, 2014. During this meeting some common key challenges involved in creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues impact our resources and illustrate ways in which our working together could enhance their accuracy, currency, and overall value.
Affiliation(s)
- Gemma L Holliday
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, 94158
- Amos Bairoch
  - SIB Swiss Institute of Bioinformatics, University of Geneva, Geneva, Switzerland
- Pantelis G Bagos
  - Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, 35100, Greece
- Arnaud Chatonnet
  - INRA, UMR866 Dynamique Musculaire et Métabolisme, Montpellier, F-34000, France
  - Université Montpellier, Montpellier, F-34000, France
- David J Craik
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, 4072, Australia
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Bernard Henrissat
  - Architecture et Fonction des Macromolécules Biologiques, CNRS, Aix-Marseille Université, Marseille, 13288, France
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- David Landsman
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, 20892
- Gerard Manning
  - Department of Bioinformatics & Computational Biology, Genentech, 1 DNA Way, South San Francisco, California, 98010
- Nozomi Nagano
  - Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, 135-0064, Japan
- Claire O’Donovan
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Kim D Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, 20892
- Neil D Rawlings
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Milton Saier
  - Department of Molecular Biology, University of California at San Diego, La Jolla, California, 92093
- Ramanathan Sowdhamini
  - National Centre for Biological Sciences, TIFR, GKVK Campus, Bellary Road, Bangalore, 560065, India
- Michael Spedding
  - Chair NC-IUPHAR, Spedding Research Solutions SARL, 6 Rue Ampere, Le Vesinet, 78110, France
- Gert Vriend
  - Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center, Geert Grooteplein Zuid 26-28, 6525 GA Nijmegen, The Netherlands
- Patricia C Babbitt
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, California, 94158
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
21
Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data Produced by NGS Technologies. BioMed Research International 2015; 2015:904541. [PMID: 26125026 PMCID: PMC4466500 DOI: 10.1155/2015/904541] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/31/2014] [Revised: 04/01/2015] [Accepted: 04/01/2015] [Indexed: 02/07/2023]
Abstract
Sequencing the human genome began in 1994, and 10 years of work were necessary in order to provide a nearly complete sequence. Nowadays, NGS technologies allow sequencing of a whole human genome in a few days. This deluge of data challenges scientists in many ways, as they are faced with data management issues and analysis and visualization drawbacks due to the limitations of current bioinformatics tools. In this paper, we describe how the NGS Big Data revolution changes the way of managing and analysing data. We present how biologists are confronted with an abundance of methods, tools, and data formats. To overcome these problems, we focus on Big Data Information Technology innovations from web and business intelligence. We underline the interest of NoSQL databases, which are much more efficient than relational databases. Since Big Data leads to the loss of interactivity with data during analysis due to high processing time, we describe solutions from Business Intelligence that allow one to regain interactivity whatever the volume of data is. We illustrate this point with a focus on the Amadea platform. Finally, we discuss visualization challenges posed by Big Data and present the latest innovations with JavaScript graphic libraries.
22
ImmuSort, a database on gene plasticity and electronic sorting for immune cells. Sci Rep 2015; 5:10370. [PMID: 25988315 PMCID: PMC4437374 DOI: 10.1038/srep10370] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 08/19/2014] [Accepted: 04/09/2015] [Indexed: 01/15/2023]
Abstract
Gene expression is highly dynamic and plastic. We present a new immunological database, ImmuSort. Unlike other gene expression databases, ImmuSort provides a convenient way to view global differential gene expression data across thousands of experimental conditions in immune cells. It enables electronic sorting, which is a bioinformatics process to retrieve cell states associated with specific experimental conditions that are mainly based on gene expression intensity. A comparison of gene expression profiles reveals other applications, such as the evaluation of immune cell biomarkers and cell subsets, identification of cell specific and/or disease-associated genes or transcripts, comparison of gene expression in different transcript variants and probe set quality evaluation. A plasticity score is introduced to measure gene plasticity. Average rank and marker evaluation scores are used to evaluate biomarkers. The current version includes 31 human and 17 mouse immune cell groups, comprising 10,422 and 3,929 microarrays derived from public databases, respectively. A total of 20,283 human and 20,963 mouse genes are available to query in the database. Examples show the distinct advantages of the database. The database URL is http://202.85.212.211/Account/ImmuSort.html.
23
Evans DM, Davey Smith G. Mendelian Randomization: New Applications in the Coming Age of Hypothesis-Free Causality. Annu Rev Genomics Hum Genet 2015; 16:327-50. [PMID: 25939054 DOI: 10.1146/annurev-genom-090314-050016] [Citation(s) in RCA: 230] [Impact Index Per Article: 25.6] [Indexed: 11/09/2022]
Abstract
Mendelian randomization (MR) is an approach that uses genetic variants associated with a modifiable exposure or biological intermediate to estimate the causal relationship between these variables and a medically relevant outcome. Although it was initially developed to examine the relationship between modifiable exposures/biomarkers and disease, its use has expanded to encompass applications in molecular epidemiology, systems biology, pharmacogenomics, and many other areas. The purpose of this review is to introduce MR, the principles behind the approach, and its limitations. We consider some of the new applications of the methodology, including informing drug development, and comment on some promising extensions, including two-step, two-sample, and bidirectional MR. We show how these new methods can be combined to efficiently examine causality in complex biological networks and provide a new framework to data mine high-dimensional studies as we transition into the age of hypothesis-free causality.
Affiliation(s)
- David M Evans
  - University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Queensland 4102, Australia
24
Ignatieva EV, Podkolodnaya OA, Orlov YL, Vasiliev GV, Kolchanov NA. Regulatory genomics: Combined experimental and computational approaches. Russ J Genet+ 2015. [DOI: 10.1134/s1022795415040067] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]
25
Linde J, Schulze S, Henkel SG, Guthke R. Data- and knowledge-based modeling of gene regulatory networks: an update. EXCLI Journal 2015; 14:346-78. [PMID: 27047314 PMCID: PMC4817425 DOI: 10.17179/excli2015-168] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 01/29/2015] [Accepted: 02/10/2015] [Indexed: 02/01/2023]
Abstract
Gene regulatory network inference is a systems biology approach which predicts interactions between genes with the help of high-throughput data. In this review, we present current and updated network inference methods focusing on novel techniques for data acquisition, network inference assessment, network inference for interacting species and the integration of prior knowledge. After the advance of Next-Generation-Sequencing of cDNAs derived from RNA samples (RNA-Seq) we discuss in detail its application to network inference. Furthermore, we present progress for large-scale or even full-genomic network inference as well as for small-scale condensed network inference and review advances in the evaluation of network inference methods by crowdsourcing. Finally, we reflect the current availability of data and prior knowledge sources and give an outlook for the inference of gene regulatory networks that reflect interacting species, in particular pathogen-host interactions.
Affiliation(s)
- Jörg Linde
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Sylvie Schulze
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
- Reinhard Guthke
  - Research Group Systems Biology / Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute, Beutenbergstr. 11a, 07745 Jena, Germany
26
Abstract
Behaviours of complex biomolecular systems are often irreducible to the elementary properties of their individual components. Explanatory and predictive mathematical models are therefore useful for fully understanding and precisely engineering cellular functions. The development and analyses of these models require their adaptation to the problems that need to be solved and the type and amount of available genetic or molecular data. Quantitative and logic modelling are among the main methods currently used to model molecular and gene networks. Each approach comes with inherent advantages and weaknesses. Recent developments show that hybrid approaches will become essential for further progress in synthetic biology and in the development of virtual organisms.
Affiliation(s)
- Nicolas Le Novère
  - Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, UK
27
Zou D, Ma L, Yu J, Zhang Z. Biological databases for human research. Genomics Proteomics & Bioinformatics 2015; 13:55-63. [PMID: 25712261 PMCID: PMC4411498 DOI: 10.1016/j.gpb.2015.01.006] [Citation(s) in RCA: 69] [Impact Index Per Article: 7.7] [Received: 01/01/2015] [Revised: 01/16/2015] [Accepted: 01/16/2015] [Indexed: 01/01/2023]
Abstract
The completion of the Human Genome Project lays a foundation for systematically studying the human genome from evolutionary history to precision medicine against diseases. With the explosive growth of biological data, there is an increasing number of biological databases that have been developed in aid of human-related research. Here we present a collection of human-related biological databases and provide a mini-review by classifying them into different categories according to their data types. As human-related databases continue to grow not only in count but also in volume, challenges are ahead in big data storage, processing, exchange and curation.
Affiliation(s)
- Dong Zou
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Lina Ma
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Jun Yu
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Zhang Zhang
  - CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
28
Esch M, Chen J, Colmsee C, Klapperstück M, Grafahrend-Belau E, Scholz U, Lange M. LAILAPS: the plant science search engine. Plant & Cell Physiology 2015; 56:e8. [PMID: 25480116 PMCID: PMC4301746 DOI: 10.1093/pcp/pcu185] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/02/2023]
Abstract
With the number of sequenced plant genomes growing, the number of predicted genes and functional annotations is also increasing. The association between genes and phenotypic traits is currently of great interest. Unfortunately, the information available today is widely scattered over a number of different databases. Information retrieval (IR) has become an all-encompassing bioinformatics methodology for extracting knowledge from complex, heterogeneous and distributed databases, and therefore can be a useful tool for obtaining a comprehensive view of plant genomics, from genes to traits. Here we describe LAILAPS (http://lailaps.ipk-gatersleben.de), an IR system designed to link plant genomic data in the context of phenotypic attributes for a detailed forward genetic research. LAILAPS comprises around 65 million indexed documents, encompassing >13 major life science databases with around 80 million links to plant genomic resources. The LAILAPS search engine allows fuzzy querying for candidate genes linked to specific traits over a loosely integrated system of indexed and interlinked genome databases. Query assistance and an evidence-based annotation system enable time-efficient and comprehensive information retrieval. An artificial neural network incorporating user feedback and behavior tracking allows relevance sorting of results. We fully describe LAILAPS's functionality and capabilities by comparing this system's performance with other widely used systems and by reporting both a validation in maize and a knowledge discovery use-case focusing on candidate genes in barley.
Affiliation(s)
- Maria Esch
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, D-06466 Stadt Seeland, Germany
- Jinbo Chen
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, D-06466 Stadt Seeland, Germany
- Christian Colmsee
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, D-06466 Stadt Seeland, Germany
- Matthias Klapperstück
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, D-06466 Stadt Seeland, Germany
- Eva Grafahrend-Belau
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, D-06466 Stadt Seeland, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, D-06466 Stadt Seeland, Germany
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, D-06466 Stadt Seeland, Germany
29
Johnston L, Thompson R, Turner C, Bushby K, Lochmüller H, Straub V. The impact of integrated omics technologies for patients with rare diseases. Expert Opin Orphan Drugs 2014. [DOI: 10.1517/21678707.2014.974554] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
30
Daiger SP, Bowne SJ, Sullivan LS. Genes and Mutations Causing Autosomal Dominant Retinitis Pigmentosa. Cold Spring Harb Perspect Med 2014; 5:a017129. [PMID: 25304133 PMCID: PMC4588133 DOI: 10.1101/cshperspect.a017129] [Citation(s) in RCA: 87] [Impact Index Per Article: 8.7] [Indexed: 02/03/2023]
Abstract
Retinitis pigmentosa (RP) has a prevalence of approximately one in 4000; 25%-30% of these cases are autosomal dominant retinitis pigmentosa (adRP). Like other forms of inherited retinal disease, adRP is exceptionally heterogeneous. Mutations in more than 25 genes are known to cause adRP, more than 1000 mutations have been reported in these genes, clinical findings are highly variable, and there is considerable overlap with other types of inherited disease. Currently, it is possible to detect disease-causing mutations in 50%-75% of adRP families in select populations. Genetic diagnosis of adRP has advantages over other forms of RP because segregation of disease in families is a useful tool for identifying and confirming potentially pathogenic variants, but there are disadvantages too. In addition to identifying the cause of disease in the remaining 25% of adRP families, a central challenge is reconciling clinical diagnosis, family history, and molecular findings in patients and families.
Affiliation(s)
- Stephen P Daiger
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, Texas 77030
- Sara J Bowne
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, Texas 77030
- Lori S Sullivan
  - Human Genetics Center, School of Public Health, The University of Texas Health Science Center, Houston, Texas 77030
31
Shirai H, Prades C, Vita R, Marcatili P, Popovic B, Xu J, Overington JP, Hirayama K, Soga S, Tsunoyama K, Clark D, Lefranc MP, Ikeda K. Antibody informatics for drug discovery. Biochimica et Biophysica Acta - Proteins and Proteomics 2014; 1844:2002-2015. [PMID: 25110827 DOI: 10.1016/j.bbapap.2014.07.006] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 03/27/2014] [Revised: 07/04/2014] [Accepted: 07/11/2014] [Indexed: 10/24/2022]
Abstract
More and more antibody therapeutics are being approved every year, mainly due to their high efficacy and antigen selectivity. However, it is still difficult to identify the antigen, and thereby the function, of an antibody if no other information is available. There are obstacles inherent to the antibody science in every project in antibody drug discovery. Recent experimental technologies allow for the rapid generation of large-scale data on antibody sequences, affinity, potency, structures, and biological functions; this should accelerate drug discovery research. Therefore, a robust bioinformatic infrastructure for these large data sets has become necessary. In this article, we first identify and discuss the typical obstacles faced during the antibody drug discovery process. We then summarize the current status of three sub-fields of antibody informatics as follows: (i) recent progress in technologies for antibody rational design using computational approaches to affinity and stability improvement, as well as ab-initio and homology-based antibody modeling; (ii) resources for antibody sequences, structures, and immune epitopes and open drug discovery resources for development of antibody drugs; and (iii) antibody numbering and IMGT. Here, we review "antibody informatics," which may integrate the above three fields so that bridging the gaps between industrial needs and academic solutions can be accelerated. This article is part of a Special Issue entitled: Recent advances in molecular engineering of antibody.
Affiliation(s)
- Hiroki Shirai
  - Molecular Medicine Research Laboratories, Drug Discovery Research, Astellas Pharma Inc., 21, Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Catherine Prades
  - Global Biotherapeutics, Bioinformatics, Sanofi-Aventis Recherche & Développement, Centre de recherche Vitry-sur-Seine, 13, quai Jules Guesde, BP 14, 94403 Vitry-sur-Seine Cedex, France
- Randi Vita
  - Immune Epitope Database and Analysis Project, La Jolla Institute for Allergy & Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA
- Paolo Marcatili
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Lyngby, Denmark
- Bojana Popovic
  - MedImmune Ltd, Milstein Building, Granta Park, Cambridge CB21 6GH, UK
- Jianqing Xu
  - MedImmune Ltd, Milstein Building, Granta Park, Cambridge CB21 6GH, UK
- John P Overington
  - The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kazunori Hirayama
  - Molecular Medicine Research Laboratories, Drug Discovery Research, Astellas Pharma Inc., 21, Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Shinji Soga
  - Molecular Medicine Research Laboratories, Drug Discovery Research, Astellas Pharma Inc., 21, Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Kazuhisa Tsunoyama
  - Molecular Medicine Research Laboratories, Drug Discovery Research, Astellas Pharma Inc., 21, Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
- Dominic Clark
  - The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Paule Lefranc
  - IMGT®, the international ImMunoGeneTics information system®, Laboratoire d'ImmunoGénétique Moléculaire (LIGM), Université Montpellier 2, Institut de Génétique Humaine, UPR CNRS 1142, 141 rue de la Cardonille, 34396 Montpellier Cedex 5, France
- Kazuyoshi Ikeda
  - The EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
32
Zhang P, Brusic V. Mathematical modeling for novel cancer drug discovery and development. Expert Opin Drug Discov 2014; 9:1133-50. [PMID: 25062617 DOI: 10.1517/17460441.2014.941351] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023]
Abstract
INTRODUCTION: Mathematical modeling enables the in silico classification of cancers, the prediction of disease outcomes, the optimization of therapy, the identification of promising drug targets and the prediction of resistance to anticancer drugs. In silico pre-screened drug targets can be validated by a small number of carefully selected experiments. AREAS COVERED: This review discusses the basics of mathematical modeling in cancer drug discovery and development. The topics include in silico discovery of novel molecular drug targets, optimization of immunotherapies, personalized medicine and guiding preclinical and clinical trials. Breast cancer is used to demonstrate the applications of mathematical modeling in cancer diagnostics, the identification of high-risk populations, cancer screening strategies, prediction of tumor growth and guiding cancer treatment. EXPERT OPINION: Mathematical models are key components of the toolkit used in the fight against cancer. The combinatorial complexity of new drug discovery is enormous, making systematic drug discovery by experimentation alone difficult, if not impossible. The biggest challenges include seamless integration of growing data, information and knowledge, and making them available for a multiplicity of analyses. Mathematical models are essential for bringing cancer drug discovery into the era of Omics, Big Data and personalized medicine.
Affiliation(s)
- Ping Zhang
- CSIRO Computational Informatics, Marsfield, NSW, Australia
|
33
|
Zhang Z, Zhu W, Luo J. Bringing biocuration to China. GENOMICS PROTEOMICS & BIOINFORMATICS 2014; 12:153-5. [PMID: 25042682 PMCID: PMC4411340 DOI: 10.1016/j.gpb.2014.07.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/30/2014] [Accepted: 07/10/2014] [Indexed: 11/30/2022]
Abstract
Biocuration involves adding value to biomedical data through standardization, quality control and information transfer (also known as data annotation). It enhances data interoperability and consistency, and is critical in translating biomedical data into scientific discovery. Although China is becoming a leading scientific data producer, biocuration is still very new to the Chinese biomedical data community. In fact, there is currently no acknowledged Chinese equivalent for the word “curation”. Here we propose its Chinese translation as “审编” (Pinyin: shěn biān), based on the meanings the biomedical data community attaches to the term. The 8th International Biocuration Conference, to be held in China next year (http://biocuration2015.tilsi.org), has the potential to raise general awareness in China of the significant role of biocuration in scientific discovery. However, challenges lie ahead in its implementation.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Weimin Zhu
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing 100730, China; Taicang Institute of Life Sciences Information, Taicang 215400, China
- Jingchu Luo
- College of Life Sciences and Center for Bioinformatics, Peking University, Beijing 100871, China
|
34
|
Henry VJ, Bandrowski AE, Pepin AS, Gonzalez BJ, Desfeux A. OMICtools: an informative directory for multi-omic data analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau069. [PMID: 25024350 PMCID: PMC4095679 DOI: 10.1093/database/bau069] [Citation(s) in RCA: 135] [Impact Index Per Article: 13.5] [Indexed: 01/19/2023]
Abstract
Recent advances in ‘omic’ technologies have created unprecedented opportunities for biological research, but current software and database resources are extremely fragmented. OMICtools is a manually curated metadatabase that provides an overview of more than 4400 web-accessible tools related to genomics, transcriptomics, proteomics and metabolomics. All tools have been classified by omic technologies (next-generation sequencing, microarray, mass spectrometry and nuclear magnetic resonance) associated with published evaluations of tool performance. Information about each tool is derived from a diverse set of developers, from the scientific literature or from spontaneous submissions. OMICtools is expected to serve as a useful didactic resource not only for bioinformaticians but also for experimental researchers and clinicians. Database URL: http://omictools.com/
Affiliation(s)
- Vincent J Henry
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Anita E Bandrowski
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Anne-Sophie Pepin
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Bruno J Gonzalez
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
- Arnaud Desfeux
- Haute-Normandie-INSERM ERI-28, Institute for Research and Innovation in Biomedicine of Rouen University, 76183 Rouen, France, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093, USA and STATSARRAY, 76300 Sotteville-lès-Rouen, France
|
35
|
Minie M, Chopra G, Sethi G, Horst J, White G, Roy A, Hatti K, Samudrala R. CANDO and the infinite drug discovery frontier. Drug Discov Today 2014; 19:1353-63. [PMID: 24980786 DOI: 10.1016/j.drudis.2014.06.018] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 01/10/2014] [Revised: 06/18/2014] [Accepted: 06/19/2014] [Indexed: 12/21/2022]
Abstract
The Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando) uses similarity of compound-proteome interaction signatures to infer homology of compound/drug behavior. We constructed interaction signatures for 3733 human ingestible compounds covering 48,278 protein structures mapping to 2030 indications based on basic science methodologies to predict and analyze protein structure, function, and interactions developed by us and others. Our signature comparison and ranking approach yielded benchmarking accuracies of 12-25% for 1439 indications with at least two approved compounds. We prospectively validated 49/82 'high value' predictions from nine studies covering seven indications, with comparable or better activity to existing drugs, which serve as novel repurposed therapeutics. Our approach may be generalized to compounds beyond those approved by the FDA, and can also consider mutations in protein structures to enable personalization. Our platform provides a holistic multiscale modeling framework of complex atomic, molecular, and physiological systems with broader applications in medicine and engineering.
Affiliation(s)
- Mark Minie
- University of Washington, Department of Bioengineering, Seattle, WA 98109, United States
- Gaurav Chopra
- University of Washington, Department of Microbiology, Seattle, WA 98109, United States; University of California, San Francisco, Diabetes Center, San Francisco, CA 94143, United States
- Geetika Sethi
- University of Washington, Department of Microbiology, Seattle, WA 98109, United States
- Jeremy Horst
- University of California, School of Medicine, San Francisco, CA 94143, United States
- George White
- University of Washington, Department of Microbiology, Seattle, WA 98109, United States
- Ambrish Roy
- Georgia Institute of Technology, Center for the Study of Systems Biology, Atlanta, GA 30318, United States
- Kaushik Hatti
- Molecular Biophysics Unit, Indian Institute of Science Bangalore, 560012, India
- Ram Samudrala
- University of Washington, Department of Microbiology, Seattle, WA 98109, United States
|
36
|
Fischer AHL, Mozzherin D, Eren AM, Lans KD, Wilson N, Cosentino C, Smith J. SeaBase: a multispecies transcriptomic resource and platform for gene network inference. Integr Comp Biol 2014; 54:250-63. [PMID: 24907201 DOI: 10.1093/icb/icu065] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/17/2022] Open
Abstract
Marine and aquatic animals are extraordinarily useful as models for identifying mechanisms of development and evolution, regeneration, resistance to cancer, longevity and symbiosis, among many other areas of research. This is due to the great diversity of these organisms and their wide-ranging capabilities. Genomics tools are essential for taking advantage of these "free lessons" of nature. However, genomics and transcriptomics are challenging in emerging model systems. Here, we present SeaBase, a tool for helping to meet these needs. Specifically, SeaBase provides a platform for sharing and searching transcriptome data. More importantly, SeaBase will support a growing number of tools for inferring gene network mechanisms. The first dataset available on SeaBase is a developmental transcriptomic profile of the sea anemone Nematostella vectensis (Anthozoa, Cnidaria). Additional datasets are currently being prepared and we are aiming to expand SeaBase to include user-supplied data for any number of marine and aquatic organisms, thereby supporting many potentially new models for gene network studies. SeaBase can be accessed online at: http://seabase.core.cli.mbl.edu.
Affiliation(s)
- Antje H L Fischer
- Marine Biological Laboratory, Woods Hole, MA 02543, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Systems & Control Engineering, University of Magna Graecia, 88100 Catanzaro, Italy
- Dmitry Mozzherin
- Marine Biological Laboratory, Woods Hole, MA 02543, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Systems & Control Engineering, University of Magna Graecia, 88100 Catanzaro, Italy
- A Murat Eren
- Marine Biological Laboratory, Woods Hole, MA 02543, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Systems & Control Engineering, University of Magna Graecia, 88100 Catanzaro, Italy
- Kristen D Lans
- Marine Biological Laboratory, Woods Hole, MA 02543, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Systems & Control Engineering, University of Magna Graecia, 88100 Catanzaro, Italy
- Nathan Wilson
- Marine Biological Laboratory, Woods Hole, MA 02543, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Systems & Control Engineering, University of Magna Graecia, 88100 Catanzaro, Italy
- Carlo Cosentino
- Marine Biological Laboratory, Woods Hole, MA 02543, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Systems & Control Engineering, University of Magna Graecia, 88100 Catanzaro, Italy
- Joel Smith
- Marine Biological Laboratory, Woods Hole, MA 02543, USA; Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Systems & Control Engineering, University of Magna Graecia, 88100 Catanzaro, Italy
|
37
|
Abstract
The movement to bring datasets into the scholarly record as first class research products (validated, preserved, cited, and credited) has been inching forward for some time, but now the pace is quickening. As data publication venues proliferate, significant debate continues over formats, processes, and terminology. Here, we present an overview of data publication initiatives underway and the current conversation, highlighting points of consensus and issues still in contention. Data publication implementations differ in a variety of factors, including the kind of documentation, the location of the documentation relative to the data, and how the data is validated. Publishers may present data as supplemental material to a journal article, with a descriptive "data paper," or independently. Complicating the situation, different initiatives and communities use the same terms to refer to distinct but overlapping concepts. For instance, the term published means that the data is publicly available and citable to virtually everyone, but it may or may not imply that the data has been peer-reviewed. In turn, what is meant by data peer review is far from defined; standards and processes encompass the full range employed in reviewing the literature, plus some novel variations. Basic data citation is a point of consensus, but the general agreement on the core elements of a dataset citation frays if the data is dynamic or part of a larger set. Even as data publication is being defined, some are looking past publication to other metaphors, notably "data as software," for solutions to the more stubborn problems.
Affiliation(s)
- John Kratz
- California Digital Library, University of California Office of the President, Oakland, CA, 94612, USA
- Carly Strasser
- California Digital Library, University of California Office of the President, Oakland, CA, 94612, USA
|
38
|
Sauter D. Counteraction of the multifunctional restriction factor tetherin. Front Microbiol 2014; 5:163. [PMID: 24782851 PMCID: PMC3989765 DOI: 10.3389/fmicb.2014.00163] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 02/14/2014] [Accepted: 03/26/2014] [Indexed: 01/28/2023] Open
Abstract
The interferon-inducible restriction factor tetherin (also known as CD317, BST-2 or HM1.24) has emerged as a key component of the antiviral immune response. Initially, tetherin was shown to restrict replication of various enveloped viruses by inhibiting the release of budding virions from infected cells. More recently, it has become clear that tetherin also acts as a pattern recognition receptor inducing NF-κB-dependent proinflammatory gene expression in virus infected cells. Whereas the ability to restrict virion release is highly conserved among mammalian tetherin orthologs and thus probably an ancient function of this protein, innate sensing seems to be an evolutionarily recent activity. The potent and broad antiviral activity of tetherin is reflected by the fact that many viruses evolved means to counteract this restriction factor. A continuous arms race with viruses has apparently driven the evolution of different isoforms of tetherin with different functional properties. Interestingly, tetherin has also been implicated in cellular processes that are unrelated to immunity, such as the organization of the apical actin network and membrane microdomains or stabilization of the Golgi apparatus. In this review, I summarize our current knowledge of the different functions of tetherin and describe the molecular strategies that viruses have evolved to antagonize or evade this multifunctional host restriction factor.
Affiliation(s)
- Daniel Sauter
- Institute of Molecular Virology, Ulm University Medical Center, Ulm, Germany
|
39
|
Galperin MY, Koonin EV. Comparative Genomics Approaches to Identifying Functionally Related Genes. ALGORITHMS FOR COMPUTATIONAL BIOLOGY 2014. [DOI: 10.1007/978-3-319-07953-0_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
|
40
|
Chaudhri VK, Elenius D, Goldenkranz A, Gong A, Martone ME, Webb W, Yorke-Smith N. Comparative analysis of knowledge representation and reasoning requirements across a range of life sciences textbooks. J Biomed Semantics 2014; 5:51. [PMID: 25785183 PMCID: PMC4362633 DOI: 10.1186/2041-1480-5-51] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/23/2014] [Accepted: 11/26/2014] [Indexed: 11/29/2022] Open
Abstract
Background: Using knowledge representation for biomedical projects is now commonplace. In previous work, we represented the knowledge found in a college-level biology textbook in a fashion useful for answering questions. We showed that embedding the knowledge representation and question-answering abilities in an electronic textbook helped to engage student interest and improve learning. A natural question that arises from this success, and this paper’s primary focus, is whether a similar approach is applicable across a range of life science textbooks. To answer that question, we considered four different textbooks, ranging from a below-introductory college biology text to an advanced, graduate-level neuroscience textbook. For these textbooks, we investigated the following questions: (1) To what extent is knowledge shared between the different textbooks? (2) To what extent can the same upper ontology be used to represent the knowledge found in different textbooks? (3) To what extent can the questions of interest for a range of textbooks be answered by using the same reasoning mechanisms? Results: Our existing modeling and reasoning methods apply especially well both to a textbook that is comparable in level to the text studied in our previous work (i.e., an introductory-level text) and to a textbook at a lower level, suggesting potential for a high degree of portability. Even for the overlapping knowledge found across the textbooks, the level of detail covered in each textbook was different, which requires that the representations be customized for each textbook. We also found that for advanced textbooks, representing models and scientific reasoning processes was particularly important. Conclusions: With some additional work, our representation methodology would be applicable to a range of textbooks. The requirements for knowledge representation are common across textbooks, suggesting that a shared semantic infrastructure for the life sciences is feasible. Because our representation overlaps heavily with those already being used for biomedical ontologies, this work suggests a natural pathway to include such representations as part of the life sciences curriculum at different grade levels.
Affiliation(s)
- William Webb
- Foothill Community College, Los Altos Hills, CA, USA
- Neil Yorke-Smith
- American University of Beirut, Beirut, Lebanon; University of Cambridge, Cambridge, UK
|