1
Reiser L, Bakker E, Subramaniam S, Chen X, Sawant S, Khosa K, Prithvi T, Berardini TZ. The Arabidopsis Information Resource in 2024. Genetics 2024; 227:iyae027. [PMID: 38457127 PMCID: PMC11075553 DOI: 10.1093/genetics/iyae027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 02/07/2024] [Indexed: 03/09/2024] Open
Abstract
Since 1999, The Arabidopsis Information Resource (www.arabidopsis.org) has been curating data about the Arabidopsis thaliana genome. Its primary focus is integrating experimental gene function information from the peer-reviewed literature and codifying it as controlled vocabulary annotations. Our goal is to produce a "gold standard" functional annotation set that reflects the current state of knowledge about the Arabidopsis genome. At the same time, the resource serves as a nexus for community-based collaborations aimed at improving data quality, access, and reuse. For the past decade, our work has been made possible by subscriptions from our global user base. This update covers our ongoing biocuration work, some of our modernization efforts that contribute to the first major infrastructure overhaul since 2011, the introduction of JBrowse2, and the resource's role in community activities such as organizing the structural reannotation of the genome. For gene function assessment, we used gene ontology annotations as a metric to evaluate: (1) what is currently known about Arabidopsis gene function and (2) the set of "unknown" genes. Currently, 74% of the proteome has been annotated to at least one gene ontology term. Of those loci, half have experimental support for at least one of the following aspects: molecular function, biological process, or cellular component. Our work sheds light on the genes for which we have not yet identified any published experimental data and have no functional annotation. Drawing attention to these unknown genes highlights knowledge gaps and potential sources of novel discoveries.
2
Moore LR, Caspi R, Campbell DA, Casey JR, Crevecoeur S, Lea-Smith DJ, Long B, Omar NM, Paley SM, Schmelling NM, Torrado A, Zehr JP, Karp PD. CyanoCyc cyanobacterial web portal. Front Microbiol 2024; 15:1340413. [PMID: 38357349 PMCID: PMC10864581 DOI: 10.3389/fmicb.2024.1340413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 01/11/2024] [Indexed: 02/16/2024] Open
Abstract
CyanoCyc is a web portal that integrates an exceptionally rich database collection of information about cyanobacterial genomes with an extensive suite of bioinformatics tools. It was developed to address the needs of the cyanobacterial research and biotechnology communities. The 277 annotated cyanobacterial genomes currently in CyanoCyc are supplemented with computational inferences including predicted metabolic pathways, operons, protein complexes, and orthologs; and with data imported from external databases, such as protein features and Gene Ontology (GO) terms imported from UniProt. Five of the genome databases have undergone manual curation with input from more than a dozen cyanobacteria experts to correct errors and integrate information from more than 1,765 published articles. CyanoCyc has bioinformatics tools that encompass genome, metabolic pathway and regulatory informatics; omics data analysis; and comparative analyses, including visualizations of multiple genomes aligned at orthologous genes, and comparisons of metabolic networks for multiple organisms. CyanoCyc is a high-quality, reliable knowledgebase that accelerates scientists' work by enabling users to quickly find accurate information using its powerful set of search tools, to understand gene function through expert mini-reviews with citations, to acquire information quickly using its interactive visualization tools, and to inform better decision-making for fundamental and applied research.
Affiliation(s)
- Ron Caspi
- SRI International, Menlo Park, CA, United States
- John R. Casey
- Lawrence Livermore National Laboratory, Physical and Life Sciences Directorate, Livermore, CA, United States
- Sophie Crevecoeur
- Watershed Hydrology and Ecology Research Division, Environment and Climate Change Canada, Burlington, ON, Canada
- David J. Lea-Smith
- School of Biological Sciences, University of East Anglia, Norwich, United Kingdom
- Bin Long
- Department of Plant Pathology and Microbiology, Texas A&M University, College Station, TX, United States
- Alejandro Torrado
- Institute of Plant Biochemistry and Photosynthesis, University of Seville and Spanish National Research Council, Sevilla, Spain
- Jonathan P. Zehr
- Ocean Sciences Department, University of California, Santa Cruz, Santa Cruz, CA, United States
3
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. PLANT PHYSIOLOGY 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas, that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
- Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
- Muskan Kapoor
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Jamie Waese
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, USA
- Peter Harrison
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Doreen Ware
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Timothy Tickle
- Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
- Tony Burdett
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine G Elsik
- Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
- Christopher K Tuggle
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Nicholas J Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
4
Reiser L, Subramaniam S, Zhang P, Berardini T. Using the Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes. Curr Protoc 2022; 2:e574. [PMID: 36200836 DOI: 10.1002/cpz1.574] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/16/2023]
Abstract
The Arabidopsis Information Resource (TAIR; http://arabidopsis.org) is a comprehensive web resource of Arabidopsis biology for plant scientists. TAIR curates and integrates information about genes, proteins, gene function, orthologs, gene expression, mutant phenotypes, biological materials such as clones and seed stocks, genetic markers, genetic and physical maps, genome organization, images of mutant plants, protein sub-cellular localizations, publications, and the research community. The various data types are extensively interconnected and can be accessed through a variety of web-based search and display tools. This article primarily focuses on some basic methods for searching, browsing, visualizing, and analyzing information about Arabidopsis genes and genomes. Additionally, we describe how members of the community can share data via JBrowse and the Generic Online Annotation Submission Tool (GOAT) in order to make their published research more accessible and visible. © 2022 Wiley Periodicals LLC.
Basic Protocol 1: TAIR homepage, sitemap, and navigation
Basic Protocol 2: Finding comprehensive information about Arabidopsis genes
Basic Protocol 3: Using the Arabidopsis genome browser: JBrowse
Basic Protocol 4: Using the Gene Ontology annotations for gene discovery and gene function analysis
Basic Protocol 5: Using gene lists to download bulk datasets
Basic Protocol 6: Using TAIR's analysis tools to find short sequences and motifs
Basic Protocol 7: Using the TAIR generic online annotation tool (GOAT) to submit functional annotations for Arabidopsis (or any other species) genes
Basic Protocol 8: Using PhyloGenes to visualize gene families and predict functions
Basic Protocol 9: Using TAIR to browse Arabidopsis literature
Basic Protocol 10: Using the synteny viewer to find and display syntenic regions from Arabidopsis and other plant species.
Affiliation(s)
- Peifen Zhang
- Phoenix Bioinformatics, Newark, California, USA
- Computercraft, Washington, District of Columbia, USA
5
Rodriguez-Esteban R. New reasons for biologists to write with a formal language. Database (Oxford) 2022; 2022:6600538. [PMID: 35657112 PMCID: PMC9216469 DOI: 10.1093/database/baac039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 05/17/2022] [Indexed: 12/03/2022]
Abstract
Current biological writing is afflicted by the use of ambiguous names, convoluted sentences, vague statements and narrative-fitted storylines. This represents a challenge for biological research in general and in particular for fields such as biological database curation and text mining, which have been tasked to cope with exponentially growing content. Improving the quality of biological writing by encouraging unambiguity and precision would foster expository discipline and machine reasoning. More specifically, the routine inclusion of formal languages in biological writing would improve our ability to describe, compile and model biology.
Affiliation(s)
- Raul Rodriguez-Esteban
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Grenzacherstrasse 124, Basel 4070, Switzerland
6
Freyre-González JA, Escorcia-Rodríguez JM, Gutiérrez-Mondragón LF, Martí-Vértiz J, Torres-Franco CN, Zorro-Aranda A. System Principles Governing the Organization, Architecture, Dynamics, and Evolution of Gene Regulatory Networks. Front Bioeng Biotechnol 2022; 10:888732. [PMID: 35646858 PMCID: PMC9135355 DOI: 10.3389/fbioe.2022.888732] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/03/2022] [Accepted: 04/27/2022] [Indexed: 11/21/2022] Open
Abstract
Synthetic biology aims to apply engineering principles for the rational, systematical design and construction of biological systems displaying functions that do not exist in nature or even building a cell from scratch. Understanding how molecular entities interconnect, work, and evolve in an organism is pivotal to this aim. Here, we summarize and discuss some historical organizing principles identified in bacterial gene regulatory networks. We propose a new layer, the concilion, which is the group of structural genes and their local regulators responsible for a single function that, organized hierarchically, coordinate a response in a way reminiscent of the deliberation and negotiation that take place in a council. We then highlight the importance that the network structure has, and discuss that the natural decomposition approach has unveiled the system-level elements shaping a common functional architecture governing bacterial regulatory networks. We discuss the incompleteness of gene regulatory networks and the need for network inference and benchmarking standardization. We point out the importance that using the network structural properties showed to improve network inference. We discuss the advances and controversies regarding the consistency between reconstructions of regulatory networks and expression data. We then discuss some perspectives on the necessity of studying regulatory networks, considering the interactions’ strength distribution, the challenges to studying these interactions’ strength, and the corresponding effects on network structure and dynamics. Finally, we explore the ability of evolutionary systems biology studies to provide insights into how evolution shapes functional architecture despite the high evolutionary plasticity of regulatory networks.
Affiliation(s)
- Julio A Freyre-González
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Juan M Escorcia-Rodríguez
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Luis F Gutiérrez-Mondragón
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Jerónimo Martí-Vértiz
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Camila N Torres-Franco
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Andrea Zorro-Aranda
- Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, México
- Department of Chemical Engineering, Universidad de Antioquia, Medellín, Colombia
7
Staton M, Cannon E, Sanderson LA, Wegrzyn J, Anderson T, Buehler S, Cobo-Simón I, Faaberg K, Grau E, Guignon V, Gunoskey J, Inderski B, Jung S, Lager K, Main D, Poelchau M, Ramnath R, Richter P, West J, Ficklin S. Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Brief Bioinform 2021; 22:6318561. [PMID: 34251419 PMCID: PMC8574961 DOI: 10.1093/bib/bbab238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/01/2022] Open
Abstract
Online, open access databases for biological knowledge serve as central repositories for research communities to store, find and analyze integrated, multi-disciplinary datasets. With increasing volumes, complexity and the need to integrate genomic, transcriptomic, metabolomic, proteomic, phenomic and environmental data, community databases face tremendous challenges in ongoing maintenance, expansion and upgrades. A common infrastructure framework using community standards shared by many databases can reduce development burden, provide interoperability, ensure use of common standards and support long-term sustainability. Tripal is a mature, open source platform built to meet this need. With ongoing improvement since its first release in 2009, Tripal provides full functionality for searching, browsing, loading and curating numerous types of data and is a primary technology powering at least 31 publicly available databases spanning plants, animals and human data, primarily storing genomics, genetics and breeding data. Tripal software development is managed by a shared, inclusive governance structure including both project management and advisory teams. Here, we report on the most important and innovative aspects of Tripal after 11 years of development, including integration of diverse types of biological data, successful collaborative projects across member databases, and support for implementing FAIR principles.
Affiliation(s)
- Ethalinda Cannon
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA, USA
- Kay Faaberg
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Emily Grau
- University of Connecticut, Storrs, CT, USA
- Sook Jung
- Washington State University, Pullman, WA, USA
- Kelly Lager
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Dorrie Main
- Washington State University, Pullman, WA, USA
- Monica Poelchau
- USDA-ARS, National Agricultural Library, Beltsville, MD, USA
- Joe West
- University of Tennessee, Knoxville, TN, USA
8
Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network. Sci Rep 2021; 11:1696. [PMID: 33462256 PMCID: PMC7813825 DOI: 10.1038/s41598-020-80441-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/14/2020] [Accepted: 12/17/2020] [Indexed: 11/17/2022] Open
Abstract
The increased diversity and scale of published biological data has led to a growing appreciation for the applications of machine learning and statistical methodologies to gain new insights. Key to achieving this aim is solving the Relationship Extraction problem which specifies the semantic interaction between two or more biological entities in a published study. Here, we employed two deep neural network natural language processing (NLP) methods, namely: the continuous bag of words (CBOW), and the bi-directional long short-term memory (bi-LSTM). These methods were employed to predict relations between entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the SUBA manually curated dataset. The system combines pre-processing of full-text articles in a machine-readable format with relevant sentence extraction for downstream NLP analysis. Using the SUBA corpus, the neural network classifier predicted interactions between protein name, subcellular localisation and experimental methodology with an average precision, recall rate, accuracy and F1 scores of 95.1%, 82.8%, 89.3% and 88.4% respectively (n = 30). Comparable scoring metrics were obtained using the CropPAL database as an independent testing dataset that stores protein subcellular localisation in crop species, demonstrating wide applicability of the prediction model. We provide a framework for extracting protein functional features from unstructured text in the literature with high accuracy, improving data dissemination and unlocking the potential of big data text analytics for generating new hypotheses.
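Note on the metrics quoted above: the F1 score is conventionally the harmonic mean of precision and recall. The following is a minimal sketch of that standard formula, not the authors' evaluation code; the function name f1_score is chosen here for illustration. The reported figures are averages over n = 30, so the average F1 (88.4%) need not equal the F1 computed from the averaged precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Convention: F1 is 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Averaged values quoted in the abstract, converted from percentages.
precision, recall = 0.951, 0.828
print(f"F1 from averaged precision and recall: {f1_score(precision, recall):.3f}")
```

Because the harmonic mean is dominated by the smaller of the two values, the lower recall pulls the F1 score well below the reported precision.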
9
do Nascimento Fernandes de Souza E, Hawkins JA. Ewé: a web-based ethnobotanical database for storing and analysing data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2020:5732429. [PMID: 32052012 PMCID: PMC7015817 DOI: 10.1093/database/baz144] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/01/2019] [Revised: 11/28/2019] [Accepted: 11/29/2019] [Indexed: 12/19/2022]
Abstract
Ethnobotanical databases serve as repositories of traditional knowledge (TK), either at international or local scales. By documenting plant species with traditional use, and most importantly, the applications and modes of use of such species, ethnobotanical databases play a role in the conservation of TK and also provide access to information that could improve hypothesis generation and testing in ethnobotanical studies. Brazil has a rich medicinal flora and a rich cultural landscape. Nevertheless, cultural change and ecological degradation can lead to loss of TK. Here, we present an online database developed with open-source tools with a capacity to include all medicinal flora of Brazil. We present test data for the Leguminosae comprising a total of 2078 records, referred to here as use reports, including data compiled from literature and herbarium sources. Unlike existing databases, Ewé provides tools for the visualization of large datasets, facilitating hypothesis generation and meta-analyses. The Ewé database is currently available at www.ewedb.com.
Affiliation(s)
- Julie A Hawkins
- School of Biological Sciences, University of Reading, Whiteknights Rd, Reading, Berkshire RG66AS, UK
10
Quality Matters: Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:91-103. [PMID: 32652120 PMCID: PMC7646089 DOI: 10.1016/j.gpb.2018.11.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/08/2017] [Revised: 10/24/2018] [Accepted: 12/14/2018] [Indexed: 11/27/2022]
11
Parry G, Provart NJ, Brady SM, Uzilday B. Current status of the multinational Arabidopsis community. PLANT DIRECT 2020; 4:e00248. [PMID: 32775952 PMCID: PMC7396448 DOI: 10.1002/pld3.248] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/11/2020] [Revised: 07/06/2020] [Accepted: 07/06/2020] [Indexed: 05/04/2023]
Abstract
The multinational Arabidopsis research community is highly collaborative and over the past thirty years these activities have been documented by the Multinational Arabidopsis Steering Committee (MASC). Here, we (a) highlight recent research advances made with the reference plant Arabidopsis thaliana; (b) provide summaries from recent reports submitted by MASC subcommittees, projects and resources associated with MASC and from MASC country representatives; and (c) initiate a call for ideas and foci for the "fourth decadal roadmap," which will advise and coordinate the global activities of the Arabidopsis research community.
Affiliation(s)
- Geraint Parry
- School of Biosciences, Cardiff University, Cardiff, United Kingdom
- Nicholas J. Provart
- Department of Cell and System Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Canada
- Siobhan M. Brady
- Department of Plant Biology and Genome Center, University of California, Davis, USA
- Baris Uzilday
- Department of Biology, Faculty of Science, Ege University, Izmir, Turkey
12
Shaw F, Etuk A, Minotto A, Gonzalez-Beltran A, Johnson D, Rocca-Serra P, Laporte MA, Arnaud E, Devare M, Kersey P, Sansone SA, Davey RP. COPO: a metadata platform for brokering FAIR data in the life sciences. F1000Res 2020. [DOI: 10.12688/f1000research.23889.1] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 01/02/2023] Open
Abstract
Scientific innovation is increasingly reliant on data and computational resources. Much of today’s life science research involves generating, processing, and reusing heterogeneous datasets that are growing exponentially in size. Demand for technical experts (data scientists and bioinformaticians) to process these data is at an all-time high, but these are not typically trained in good data management practices. That said, we have come a long way in the last decade, with funders, publishers, and researchers themselves making the case for open, interoperable data as a key component of an open science philosophy. In response, recognition of the FAIR Principles (that data should be Findable, Accessible, Interoperable and Reusable) has become commonplace. However, both technical and cultural challenges for the implementation of these principles still exist when storing, managing, analysing and disseminating both legacy and new data. COPO is a computational system that attempts to address some of these challenges by enabling scientists to describe their research objects (raw or processed data, publications, samples, images, etc.) using community-sanctioned metadata sets and vocabularies, and then use public or institutional repositories to share them with the wider scientific community. COPO encourages data generators to adhere to appropriate metadata standards when publishing research objects, using semantic terms to add meaning to them and specify relationships between them. This allows data consumers, be they people or machines, to find, aggregate, and analyse data which would otherwise be private or invisible, building upon existing standards to push the state of the art in scientific data dissemination whilst minimising the burden of data publication and sharing.
13
Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith OL, Hanspers K, Hermjakob H, Hudson TS, Hybiske K, Keating SM, Manske M, Mayers M, Mietchen D, Mitraka E, Pico AR, Putman T, Riutta A, Queralt-Rosinach N, Schriml LM, Shafee T, Slenter D, Stephan R, Thornton K, Tsueng G, Tu R, Ul-Hasan S, Willighagen E, Wu C, Su AI. Wikidata as a knowledge graph for the life sciences. eLife 2020; 9:e52614. [PMID: 32180547 PMCID: PMC7077981 DOI: 10.7554/elife.52614] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 10/09/2019] [Accepted: 02/28/2020] [Indexed: 12/22/2022] Open
Abstract
Wikidata is a community-maintained knowledge base that has been assembled from repositories in the fields of genomics, proteomics, genetic variants, pathways, chemical compounds, and diseases, and that adheres to the FAIR principles of findability, accessibility, interoperability and reusability. Here we describe the breadth and depth of the biomedical knowledge contained within Wikidata, and discuss the open-source tools we have built to add information to Wikidata and to synchronize it with source databases. We also demonstrate several use cases for Wikidata, including the crowdsourced curation of biomedical ontologies, phenotype-based diagnosis of disease, and drug repurposing.
Affiliation(s)
- Gregory Stupp
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Sebastian Burgstaller-Muehlbacher
- Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna, Austria
- Benjamin M Good
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Malachi Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, United States
- Obi L Griffith
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, United States
- Kristina Hanspers
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
- Toby S Hudson
- School of Chemistry, The University of Sydney, Sydney, Australia
- Kevin Hybiske
- Division of Allergy and Infectious Diseases, Department of Medicine, University of Washington, Seattle, United States
- Sarah M Keating
- European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
- Magnus Manske
- Wellcome Trust Sanger Institute, Cambridge, United Kingdom
- Michael Mayers
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Daniel Mietchen
- School of Data Science, University of Virginia, Charlottesville, United States
- Elvira Mitraka
- University of Maryland School of Medicine, Baltimore, United States
- Alexander R Pico
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
- Timothy Putman
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Anders Riutta
- Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, United States
- Nuria Queralt-Rosinach
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Lynn M Schriml
- University of Maryland School of Medicine, Baltimore, United States
- Thomas Shafee
- Department of Animal Plant and Soil Sciences, La Trobe University, Melbourne, Australia
- Denise Slenter
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
- Ginger Tsueng
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Roger Tu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Sabah Ul-Hasan
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Egon Willighagen
- Department of Bioinformatics-BiGCaT, NUTRIM, Maastricht University, Maastricht, Netherlands
- Chunlei Wu
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
- Andrew I Su
- Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, United States
14
Spoor S, Cheng CH, Sanderson LA, Condon B, Almsaeed A, Chen M, Bretaudeau A, Rasche H, Jung S, Main D, Bett K, Staton M, Wegrzyn JL, Feltus FA, Ficklin SP. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5532788. [PMID: 31328773 PMCID: PMC6643302 DOI: 10.1093/database/baz077] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/08/2019] [Revised: 05/12/2019] [Accepted: 05/22/2019] [Indexed: 12/20/2022]
Abstract
Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult especially with limited funding and access to technical expertise. Furthermore, online repositories are expected to promote FAIR data principles (findable, accessible, interoperable and reusable) that presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by creating both the software and an interactive community of developers for construction of online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3 that improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as a default data store, but now provides tools to support other data stores, while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to develop a basic site with little to no programming, with the ability to integrate other data types using extension modules and the Tripal application programming interface. A thorough online User’s Guide and Developer’s Handbook are available at http://tripal.info, providing download, installation and step-by-step setup instructions.
Affiliation(s)
- Shawna Spoor
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Chun-Huai Cheng
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Bradford Condon
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Abdullah Almsaeed
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Ming Chen
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Anthony Bretaudeau
- INRA, UMR IGEPP, BIPAA/GenOuest, INRIA/Irisa - Campus de Beaulieu, Rennes Cedex, France
- Helena Rasche
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA, USA
- Kirstin Bett
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Margaret Staton
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA; Computational Biology Core, Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- F Alex Feltus
- Dept. of Genetics and Biochemistry, Clemson University, Clemson, USA
- Stephen P Ficklin
- Department of Horticulture, Washington State University, Pullman, WA, USA
15
Tello-Ruiz MK, Marco CF, Hsu FM, Khangura RS, Qiao P, Sapkota S, Stitzer MC, Wasikowski R, Wu H, Zhan J, Chougule K, Barone LC, Ghiban C, Muna D, Olson AC, Wang L, Ware D, Micklos DA. Double triage to identify poorly annotated genes in maize: The missing link in community curation. PLoS One 2019; 14:e0224086. [PMID: 31658277 PMCID: PMC6816542 DOI: 10.1371/journal.pone.0224086] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/27/2019] [Accepted: 10/05/2019] [Indexed: 02/02/2023] Open
Abstract
The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors, including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100% and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists.
Affiliation(s)
- Marcela K. Tello-Ruiz
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Department of Biological Sciences, State University of New York at Old Westbury, Old Westbury, New York, United States of America
- Cristina F. Marco
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Fei-Man Hsu
- Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
- Rajdeep S. Khangura
- Department of Biochemistry, Purdue University, West Lafayette, Indiana, United States of America
- Pengfei Qiao
- Plant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York, United States of America
- Sirjan Sapkota
- Department of Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
- Michelle C. Stitzer
- Department of Plant Sciences and Center for Population Biology, University of California Davis, Davis, California, United States of America
- Rachael Wasikowski
- Department of Biological Sciences, University of Toledo, Toledo, Ohio, United States of America
- Hao Wu
- Genetics, Development & Cell Biology Department, Iowa State University, Ames, Iowa, United States of America
- Junpeng Zhan
- School of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
- Donald Danforth Plant Science Center, St. Louis, Missouri, United States of America
- Kapeel Chougule
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Lindsay C. Barone
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Cornel Ghiban
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Demitri Muna
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Andrew C. Olson
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Liya Wang
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- Doreen Ware
- Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
- USDA, Agricultural Research Service, Washington, D.C., United States of America
- David A. Micklos
- DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
16
Zielinski T, Hay J, Millar AJ. The grant is dead, long live the data - migration as a pragmatic exit strategy for research data preservation. Wellcome Open Res 2019; 4:104. [PMID: 31363499 PMCID: PMC6652102 DOI: 10.12688/wellcomeopenres.15341.2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Accepted: 09/18/2019] [Indexed: 11/24/2022] Open
Abstract
Open research, data sharing and data re-use have become a priority for publicly- and charity-funded research. Efficient data management naturally requires computational resources that assist in data description, preservation and discovery. While it is possible to fund development of data management systems, currently it is more difficult to sustain data resources beyond the original grants. That puts the safety of the data at risk and undermines the very purpose of data gathering. PlaSMo stands for ‘Plant Systems-biology Modelling’ and the PlaSMo model repository was envisioned by the plant systems biology community in 2005 with the initial funding lasting until 2010. We addressed the sustainability of the PlaSMo repository and assured preservation of these data by implementing an exit strategy. For our exit strategy we migrated data to an alternative, public repository with secured funding. We describe details of our decision process and aspects of the implementation. Our experience may serve as an example for other projects in a similar situation. We share our reflections on the sustainability of biological data management and the future outcomes of its funding. We expect it to be a useful input for funding bodies.
Affiliation(s)
- Tomasz Zielinski
- SynthSys and School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3BF, UK
- Johnny Hay
- EPCC, University of Edinburgh, Edinburgh, EH9 3FD, UK
- Andrew J Millar
- SynthSys and School of Biological Sciences, University of Edinburgh, Edinburgh, EH9 3BF, UK
17
Comparative Transcriptome Analysis of Pinus densiflora Following Inoculation with Pathogenic (Bursaphelenchus xylophilus) or Non-pathogenic Nematodes (B. thailandae). Sci Rep 2019; 9:12180. [PMID: 31434977 PMCID: PMC6704138 DOI: 10.1038/s41598-019-48660-w] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/14/2018] [Accepted: 08/06/2019] [Indexed: 12/26/2022] Open
Abstract
Pinus densiflora (Korean red pine) is a species of evergreen conifer that is distributed in Korea, Japan, and China, and of economic, scientific, and ecological importance. Korean red pines suffer from pine wilt disease (PWD) caused by Bursaphelenchus xylophilus, the pinewood nematode (PWN). To facilitate diagnosis and prevention of PWD, studies have been conducted on the PWN and its beetle vectors. However, transcriptional responses of P. densiflora to PWN have received less attention. Here, we inoculated Korean red pines with pathogenic B. xylophilus, or non-pathogenic B. thailandae, and collected cambium layers 4 weeks after inoculation for RNA sequencing analysis. We obtained 72,864 unigenes with an average length of 869 bp (N50 = 1,403) from a Trinity assembly, and identified 991 differentially expressed genes (DEGs). Biological processes related to phenylpropanoid biosynthesis, flavonoid biosynthesis, oxidation–reduction, and plant-type hypersensitive response were significantly enriched in DEGs found in trees inoculated with B. xylophilus. Several transcription factor families were found to be involved in the response to B. xylophilus inoculation. Our study provides the first evidence of transcriptomic differences in Korean red pines inoculated with B. xylophilus and B. thailandae, and might facilitate early diagnosis of PWD and selection of PWD-tolerant Korean red pines.
18
Tian T, Liu Y, Yan H, You Q, Yi X, Du Z, Xu W, Su Z. agriGO v2.0: a GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Res 2017; 45:W122-W129. [PMID: 28472432 PMCID: PMC5793732 DOI: 10.1093/nar/gkx382] [Citation(s) in RCA: 1370] [Impact Index Per Article: 274.0] [Received: 01/14/2017] [Accepted: 04/25/2017] [Indexed: 01/30/2023] Open
Abstract
The agriGO platform, which has been serving the scientific community for >10 years, specifically focuses on gene ontology (GO) enrichment analyses of plant and agricultural species. We continuously maintain and update the databases and accommodate the various requests of our global users. Here, we present our updated agriGO, which has a largely expanded number of supported species (394) and datatypes (865). In addition, a larger number of species have been classified into groups covering crops, vegetables, fish, birds and insects closely related to the agricultural community. We further improved the computational efficiency, including the batch analysis and P-value distribution (PVD), and the user-friendliness of the web pages. More visualization features were added to the platform, including SEACOMPARE (cross comparison of singular enrichment analysis), directed acyclic graph (DAG) and Scatter Plots, which can be merged by choosing any significant GO term. The updated platform agriGO v2.0 is now publicly accessible at http://systemsbiology.cau.edu.cn/agriGOv2/.
Affiliation(s)
- Tian Tian
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yue Liu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Hengyu Yan
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Qi You
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xin Yi
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhou Du
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Wenying Xu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhen Su
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
19
Da L, Liu Y, Yang J, Tian T, She J, Ma X, Xu W, Su Z. AppleMDO: A Multi-Dimensional Omics Database for Apple Co-Expression Networks and Chromatin States. Frontiers in Plant Science 2019; 10:1333. [PMID: 31695717 PMCID: PMC6817610 DOI: 10.3389/fpls.2019.01333] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/19/2019] [Accepted: 09/25/2019] [Indexed: 05/17/2023]
Abstract
As an economically important crop, apple is one of the most cultivated fruit trees in temperate regions worldwide. Recently, a large number of high-quality transcriptomic and epigenomic datasets for apple were made available to the public, which could be helpful in inferring gene regulatory relationships and thus predicting gene function at the genome level. Through integration of the available apple genomic, transcriptomic, and epigenomic datasets, we constructed co-expression networks, identified functional modules, and predicted chromatin states. A total of 112 RNA-seq datasets were integrated to construct a global network and a conditional network (tissue-preferential network). Furthermore, a total of 1,076 functional modules with closely related gene sets were identified to assess the modularity of biological networks and further subjected to functional enrichment analysis. The results showed that the function of many modules was related to development, secondary metabolism, hormone response, and transcriptional regulation. Transcriptional regulation is closely related to epigenetic marks on chromatin. A total of 20 epigenomic datasets, which included ChIP-seq, DNase-seq, and DNA methylation analysis datasets, were integrated and used to classify chromatin states. Based on the ChromHMM algorithm, the genome was divided into 620,122 fragments, which were classified into 24 states according to the combination of epigenetic marks and enriched-feature regions. Finally, through the collaborative analysis of different omics datasets, the online database AppleMDO (http://bioinformatics.cau.edu.cn/AppleMDO/) was established for cross-referencing and the exploration of possible novel functions of apple genes. In addition, gene annotation information and functional support toolkits were also provided. Our database might be convenient for researchers to develop insights into the function of genes related to important agronomic traits and might serve as a reference for other fruit trees.
20
Hong L, Zhang L, Liu M, Wang S, He L, Yang W, Li J, Yu Q, Li QQ, Zhou K. Heavy metal rich stone-processing wastewater inhibits the growth and development of plants. International Journal of Phytoremediation 2018; 21:479-486. [PMID: 30560684 DOI: 10.1080/15226514.2018.1537241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/09/2023]
Abstract
Stone processing generates large amounts of wastewater, which is toxic and poses serious environmental and health risks. To quantify the content of stone processing wastewater and estimate its effects on plant growth, we collected water samples from the sewage outfalls of four stone processing factories and nearby water bodies. The concentrations of potentially toxic metals were much higher in the wastewater than in background controls. Wastewater inhibited plant primary root elongation, lateral root formation, and growth of aerial parts. Seedlings treated with the effluents were unhealthy, with deep purple leaves, and usually died before flowering. Chlorophyll a/b contents and chloroplast number were reduced in the abnormal mesophyll cells. Transcript levels decreased for chloroplast formation genes but increased for genes participating in chloroplast degradation and catabolism. Six of nine tested senescence-associated genes were up-regulated. Furthermore, our results show that endogenous toxic metal levels indeed increased after wastewater treatment. Altogether, these results indicate that wastewater rich in potentially toxic metals significantly inhibited plant growth and led to senescence-associated programmed cell death; these findings could help governments and enterprises understand the environmental risks and formulate reasonable wastewater emission standards for the stone processing industry.
Affiliation(s)
- Liwei Hong
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Liangjie Zhang
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Meiling Liu
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Shengjie Wang
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Linjun He
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Wanyu Yang
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Jingli Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Qiaojie Yu
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Qingshun Q Li
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
- Graduate College, Western University of Health Science, Pomona, CA, USA
- Kefu Zhou
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, China
21
Mishra B, Sun Y, Howton TC, Kumar N, Mukhtar MS. Dynamic modeling of transcriptional gene regulatory network uncovers distinct pathways during the onset of Arabidopsis leaf senescence. NPJ Syst Biol Appl 2018; 4:35. [PMID: 30181903 PMCID: PMC6119185 DOI: 10.1038/s41540-018-0071-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/30/2017] [Revised: 08/10/2018] [Accepted: 08/14/2018] [Indexed: 11/08/2022] Open
Abstract
Age-dependent senescence is a multifaceted and highly coordinated developmental phase in the life of plants that is manifested with genetic, biochemical and phenotypic continuum. Thus, elucidating the dynamic network modeling and simulation of molecular events, in particular gene regulatory network during the onset of senescence is essential. Here, we constructed a computational pipeline that integrates senescence-related co-expression networks with transcription factor (TF)-promoter relationships and microRNA (miR)-target interactions. Network structural and functional analyses revealed important nodes within each module of these co-expression networks. Subsequently, we inferred significant dynamic transcriptional regulatory models in leaf senescence using time-course gene expression datasets. Dynamic simulations and predictive network perturbation analyses followed by experimental dataset illustrated the kinetic relationships among TFs and their downstream targets. In conclusion, our network science framework discovers cohorts of TFs and their paths with previously unrecognized roles in leaf senescence and provides a comprehensive landscape of dynamic transcriptional circuitry.
Affiliation(s)
- Bharat Mishra
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294 USA
- Yali Sun
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294 USA
- TC Howton
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294 USA
- Nilesh Kumar
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294 USA
- M. Shahid Mukhtar
- Department of Biology, University of Alabama at Birmingham, 1300 University Blvd., Birmingham, AL 35294 USA
- Nutrition Obesity Research Center, University of Alabama at Birmingham, 1675 University Blvd., Birmingham, AL 35294 USA
22
Harper L, Campbell J, Cannon EKS, Jung S, Poelchau M, Walls R, Andorf C, Arnaud E, Berardini TZ, Birkett C, Cannon S, Carson J, Condon B, Cooper L, Dunn N, Elsik CG, Farmer A, Ficklin SP, Grant D, Grau E, Herndon N, Hu ZL, Humann J, Jaiswal P, Jonquet C, Laporte MA, Larmande P, Lazo G, McCarthy F, Menda N, Mungall CJ, Munoz-Torres MC, Naithani S, Nelson R, Nesdill D, Park C, Reecy J, Reiser L, Sanderson LA, Sen TZ, Staton M, Subramaniam S, Tello-Ruiz MK, Unda V, Unni D, Wang L, Ware D, Wegrzyn J, Williams J, Woodhouse M, Yu J, Main D. AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture. Database (Oxford) 2018; 2018:5096675. [PMID: 30239679 PMCID: PMC6146126 DOI: 10.1093/database/bay088] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 03/29/2018] [Revised: 07/19/2018] [Accepted: 07/30/2018] [Indexed: 01/07/2023]
Abstract
The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgebases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices.
Affiliation(s)
- Lisa Harper
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Ethalinda K S Cannon
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Computer Science, Iowa State University, Ames, IA, USA
- Sook Jung
- Horticulture, Washington State University, Pullman, WA, USA
- Monica Poelchau
- National Agricultural Library, USDA Agricultural Research Service, Beltsville, MD, USA
- Carson Andorf
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Computer Science, Iowa State University, Ames, IA, USA
- Elizabeth Arnaud
- Bioversity International, Informatics Unit, Conservation and Availability Programme, Parc Scientifique Agropolis II, Montpellier, France
- Tanya Z Berardini
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Fremont, CA, USA
- Steve Cannon
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- James Carson
- Texas Advanced Computing Center, The University of Texas at Austin, Austin, TX, USA
- Bradford Condon
- Entomology and Plant Pathology, University of Tennessee Knoxville, Knoxville, TN, USA
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Nathan Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christine G Elsik
- Division of Animal Sciences and Division of Plant Sciences, University of Missouri, Columbia, MO, USA
- Andrew Farmer
- National Center for Genome Resources, Santa Fe, NM, USA
- David Grant
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Emily Grau
- National Center for Genome Resources, Santa Fe, NM, USA
- Nic Herndon
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Zhi-Liang Hu
- Animal Science, Iowa State University, Ames, USA
- Jodi Humann
- Horticulture, Washington State University, Pullman, WA, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Clement Jonquet
- Laboratory of Informatics, Robotics, Microelectronics of Montpellier, University of Montpellier & CNRS, Montpellier, France
- Marie-Angélique Laporte
- Bioversity International, Informatics Unit, Conservation and Availability Programme, Parc Scientifique Agropolis II, Montpellier, France
- Gerard Lazo
- Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, CA, USA
- Fiona McCarthy
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Rex Nelson
- Corn Insects and Crop Genetics Research Unit, USDA-ARS, Ames, IA, USA
- Daureen Nesdill
- Marriott Library, University of Utah, Salt Lake City, UT, USA
- Carissa Park
- Animal Science, Iowa State University, Ames, USA
- James Reecy
- Animal Science, Iowa State University, Ames, USA
- Leonore Reiser
- The Arabidopsis Information Resource, Phoenix Bioinformatics, Fremont, CA, USA
- Taner Z Sen
- Crop Improvement and Genetics Research Unit, USDA-ARS, Albany, CA, USA
- Margaret Staton
- Entomology and Plant Pathology, University of Tennessee Knoxville, Knoxville, TN, USA
- Victor Unda
- Horticulture, Washington State University, Pullman, WA, USA
- Deepak Unni
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Liya Wang
- Plant Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Doreen Ware
- USDA, Plant, Soil and Nutrition Research, Ithaca, NY, USA
- Plant Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Jill Wegrzyn
- Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Jason Williams
- Cold Spring Harbor Laboratory, DNA Learning Center, Cold Spring Harbor, NY, USA
- Margaret Woodhouse
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, USA
- Jing Yu
- Horticulture, Washington State University, Pullman, WA, USA
- Doreen Main
- Horticulture, Washington State University, Pullman, WA, USA
23
Reiser L, Subramaniam S, Li D, Huala E. Using the Arabidopsis Information Resource (TAIR) to Find Information About Arabidopsis Genes. Curr Protoc Bioinformatics 2017; 60:1.11.1-1.11.45. [DOI: 10.1002/cpbi.36] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 12/19/2022]
Affiliation(s)
- Donghui Li
- Phoenix Bioinformatics, Fremont, California
- Eva Huala
- Phoenix Bioinformatics, Fremont, California
24
Gabella C, Durinx C, Appel R. Funding knowledgebases: Towards a sustainable funding model for the UniProt use case. F1000Res 2017; 6. [PMID: 29333230 PMCID: PMC5747334 DOI: 10.12688/f1000research.12989.2] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Accepted: 03/19/2018] [Indexed: 11/30/2022] Open
Abstract
Millions of life scientists across the world rely on bioinformatics data resources for their research projects. Data resources can be very expensive, especially those with a high added value such as the expert-curated knowledgebases. Despite the increasing need for such highly accurate and reliable sources of scientific information, most of them do not have secured funding over the near future and often depend on short-term grants that are much shorter than their planning horizon. Additionally, they are often evaluated as research projects rather than as research infrastructure components. In this work, twelve funding models for data resources are described and applied to the case study of the Universal Protein Resource (UniProt), a key resource for protein sequences and functional information knowledge. We show that most of the models present inconsistencies with open access or equity policies, and that while some models do not cover the total costs, they could potentially be used as a complementary income source. We propose the Infrastructure Model as a sustainable and equitable model for all core data resources in the life sciences. With this model, funding agencies would set aside a fixed percentage of their research grant volumes, which would subsequently be redistributed to core data resources according to well-defined selection criteria. This model, compatible with the principles of open science, is in agreement with several international initiatives such as the Human Frontiers Science Program Organisation (HFSPO) and the OECD Global Science Forum (GSF) project. Here, we have estimated that less than 1% of the total amount dedicated to research grants in the life sciences would be sufficient to cover the costs of the core data resources worldwide, including both knowledgebases and deposition databases.
Affiliation(s)
- Chiara Gabella
  - ELIXIR-Switzerland, SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Christine Durinx
  - ELIXIR-Switzerland, SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Ron Appel
  - ELIXIR-Switzerland, SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
25
Leonelli S, Davey RP, Arnaud E, Parry G, Bastow R. Data management and best practice for plant science. Nat Plants 2017; 3:17086. [PMID: 28585570 DOI: 10.1038/nplants.2017.86] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 05/26/2023]
Affiliation(s)
- Sabina Leonelli
  - Department of Sociology, Philosophy and Anthropology & Exeter Centre for the Study of the Life Sciences, Byrne House, Exeter University, Exeter EX4 4PJ, UK
  - School of Humanities, University of Adelaide, Adelaide 5005, Australia
- Robert P Davey
  - The Earlham Institute, Norwich Research Park, Norwich NR4 7UG, UK
- Elizabeth Arnaud
  - Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier Cedex 5, France
- Geraint Parry
  - GARNet, School of Biosciences, Cardiff University, Cardiff CF10 3AX, UK
- Ruth Bastow
  - GARNet, School of Biosciences, Cardiff University, Cardiff CF10 3AX, UK
  - Global Plant Council, 1a Bow Lane, London EC4M 9EE, UK
26
A New Source of Nonprofit Neurosurgical Funding. World Neurosurg 2017; 98:603-613. [DOI: 10.1016/j.wneu.2016.10.084] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/01/2016] [Revised: 10/14/2016] [Accepted: 10/15/2016] [Indexed: 11/23/2022]
27
Krishnakumar V, Contrino S, Cheng CY, Belyaeva I, Ferlanti ES, Miller JR, Vaughn MW, Micklem G, Town CD, Chan AP. ThaleMine: A Warehouse for Arabidopsis Data Integration and Discovery. Plant Cell Physiol 2017; 58:e4. [PMID: 28013278 DOI: 10.1093/pcp/pcw200] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/01/2016] [Accepted: 11/11/2016] [Indexed: 05/08/2023]
Abstract
ThaleMine (https://apps.araport.org/thalemine/) is a comprehensive data warehouse that integrates a wide array of genomic information of the model plant Arabidopsis thaliana. The data collection currently includes the latest structural and functional annotation from the Araport11 update, the Col-0 genome sequence, RNA-seq and array expression, co-expression, protein interactions, homologs, pathways, publications, alleles, germplasm and phenotypes. The data are collected from a wide variety of public resources. Users can browse gene-specific data through Gene Report pages, identify and create gene lists based on experiments or indexed keywords, and run GO enrichment analysis to investigate the biological significance of selected gene sets. Developed by the Arabidopsis Information Portal project (Araport, https://www.araport.org/), ThaleMine uses the InterMine software framework, which builds well-structured data, and provides powerful data query and analysis functionality. The warehoused data can be accessed by users via graphical interfaces, as well as programmatically via web-services. Here we describe recent developments in ThaleMine including new features and extensions, and discuss future improvements. InterMine has been broadly adopted by the model organism research community including nematode, rat, mouse, zebrafish, budding yeast, the modENCODE project, as well as being used for human data. ThaleMine is the first InterMine developed for a plant model. As additional new plant InterMines are developed by the legume and other plant research communities, the potential of cross-organism integrative data analysis will be further enabled.
Affiliation(s)
- Vivek Krishnakumar
  - Plant Genomics, J. Craig Venter Institute, Medical Center Dr, Rockville, MD, USA
- Sergio Contrino
  - Department of Genetics, Cambridge Systems Biology Centre, Tennis Court Road, Cambridge, UK
- Chia-Yi Cheng
  - Plant Genomics, J. Craig Venter Institute, Medical Center Dr, Rockville, MD, USA
- Irina Belyaeva
  - Plant Genomics, J. Craig Venter Institute, Medical Center Dr, Rockville, MD, USA
- Erik S Ferlanti
  - Life Sciences Computing, Texas Advanced Computing Center, 10100 Burnet Rd, Austin, TX, USA
- Jason R Miller
  - Plant Genomics, J. Craig Venter Institute, Medical Center Dr, Rockville, MD, USA
- Matthew W Vaughn
  - Life Sciences Computing, Texas Advanced Computing Center, 10100 Burnet Rd, Austin, TX, USA
- Gos Micklem
  - Department of Genetics, Cambridge Systems Biology Centre, Tennis Court Road, Cambridge, UK
- Christopher D Town
  - Plant Genomics, J. Craig Venter Institute, Medical Center Dr, Rockville, MD, USA
- Agnes P Chan
  - Plant Genomics, J. Craig Venter Institute, Medical Center Dr, Rockville, MD, USA
28
Karp PD. Crowd-sourcing and author submission as alternatives to professional curation. Database (Oxford) 2016; 2016:baw149. [PMID: 28025340 PMCID: PMC5199147 DOI: 10.1093/database/baw149] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 08/23/2016] [Revised: 10/18/2016] [Accepted: 10/19/2016] [Indexed: 11/28/2022]
Abstract
Can we decrease the costs of database curation by crowd-sourcing curation work or by offloading curation to publication authors? This perspective considers the significant experience accumulated by the bioinformatics community with these two alternatives to professional curation in the last 20 years; that experience should be carefully considered when formulating new strategies for biological databases. The vast weight of empirical evidence to date suggests that crowd-sourced curation is not a successful model for biological databases. Multiple approaches to crowd-sourced curation have been attempted by multiple groups, and extremely low participation rates by ‘the crowd’ are the overwhelming outcome. The author-curation model shows more promise for boosting curator efficiency. However, its limitations include that the quality of author-submitted annotations is uncertain, the response rate is low (but significant), and to date author curation has involved relatively simple forms of annotation involving one or a few types of data. Furthermore, shifting curation to authors may simply redistribute costs rather than decreasing costs; author curation may in fact increase costs because of the overhead involved in having every curating author learn what professional curators know: curation conventions, curation software and curation procedures.
Affiliation(s)
- Peter D Karp
  - Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, CA 94025, USA. Tel: 650-859-4358; Fax: 650-859-3735
29
Lv Q, Lan Y, Shi Y, Wang H, Pan X, Li P, Shi T. AtPID: a genome-scale resource for genotype-phenotype associations in Arabidopsis. Nucleic Acids Res 2016; 45:D1060-D1063. [PMID: 27899679 PMCID: PMC5210528 DOI: 10.1093/nar/gkw1029] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 09/15/2016] [Revised: 10/16/2016] [Accepted: 11/08/2016] [Indexed: 01/01/2023] Open
Abstract
AtPID (Arabidopsis thaliana Protein Interactome Database, available at http://www.megabionet.org/atpid) is an integrated database resource for protein interaction network and functional annotation. In the past few years, we collected 5564 mutants with significant morphological alterations and manually curated them to 167 plant ontology (PO) morphology categories. These single/multiple-gene mutants were indexed and linked to 3919 genes. After integrating these genotype–phenotype associations with the comprehensive protein interaction network in AtPID, we developed a Naïve Bayes method and predicted 4457 novel high confidence gene-PO pairs with 1369 genes as the complement. Along with the accumulated novel data for protein interaction and functional annotation, and the updated visualization toolkits, we present a genome-scale resource for genotype–phenotype associations for Arabidopsis in AtPID 5.0. In our updated website, all the new genotype–phenotype associations from mutants, protein network, and the protein annotation information can be vividly displayed in a comprehensive network view, which will greatly enhance plant protein function and genotype–phenotype association studies in a systematic way.
Affiliation(s)
- Qi Lv
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China
  - School of Finance and Statistics, East China Normal University, Shanghai 200241, China
- Yiheng Lan
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China
- Yan Shi
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China
- Huan Wang
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China
- Xia Pan
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China
- Peng Li
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China
- Tieliu Shi
  - Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai 200241, China
30
Chang JW, Zhou YQ, Ul Qamar MT, Chen LL, Ding YD. Prediction of Protein-Protein Interactions by Evidence Combining Methods. Int J Mol Sci 2016; 17:ijms17111946. [PMID: 27879651 PMCID: PMC5133940 DOI: 10.3390/ijms17111946] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/30/2016] [Revised: 11/15/2016] [Accepted: 11/15/2016] [Indexed: 12/27/2022] Open
Abstract
Most cellular functions involve proteins' features based on their physical interactions with other partner proteins. Sketching a map of protein-protein interactions (PPIs) is therefore an important first step towards understanding the basics of cell functions. Several experimental techniques operating in vivo or in vitro have made significant contributions to screening a large number of protein interaction partners, especially high-throughput experimental methods. However, computational approaches for PPI prediction supported by rapid accumulation of data generated from experimental techniques, 3D structure definitions, and genome sequencing have boosted the map sketching of PPIs. In this review, we shed light on in silico PPI prediction methods that integrate evidence from multiple sources, including evolutionary relationship, function annotation, sequence/structure features, network topology and text mining. These methods are developed for integration of multi-dimensional evidence, for designing the strategies to predict novel interactions, and for making the results consistent with the increase of prediction coverage and accuracy.
Affiliation(s)
- Ji-Wei Chang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yan-Qing Zhou
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Muhammad Tahir Ul Qamar
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ling-Ling Chen
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yu-Duan Ding
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
31
You Q, Xu W, Zhang K, Zhang L, Yi X, Yao D, Wang C, Zhang X, Zhao X, Provart NJ, Li F, Su Z. ccNET: Database of co-expression networks with functional modules for diploid and polyploid Gossypium. Nucleic Acids Res 2016; 45:D1090-D1099. [PMID: 28053168 PMCID: PMC5210623 DOI: 10.1093/nar/gkw910] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 08/07/2016] [Revised: 09/28/2016] [Accepted: 09/30/2016] [Indexed: 12/28/2022] Open
Abstract
Plant genera with both diploid and polyploid species are a common evolutionary occurrence. Polyploids, especially allopolyploids such as cotton and wheat, are a great model system for heterosis research. Here, we have integrated genome sequences and transcriptome data of Gossypium species to construct co-expression networks and identified functional modules from different cotton species, including 1155 and 1884 modules in G. arboreum and G. hirsutum, respectively. We overlaid the gene expression results onto the co-expression network. We further provided network comparison analysis for orthologous genes across the diploid and allotetraploid Gossypium. We also constructed miRNA-target networks and predicted PPI networks for both cotton species. Furthermore, we integrated in-house ChIP-seq data of histone modification (H3K4me3) together with cis-element analysis and gene set enrichment analysis tools for studying possible gene regulatory mechanisms in Gossypium species. Finally, we have constructed an online ccNET database (http://structuralbiology.cau.edu.cn/gossypium) for comparative gene functional analyses at a multi-dimensional network and epigenomic level across diploid and polyploid Gossypium species. The ccNET database will help the community gain novel insights into gene/module functions during cotton development and stress response, and might be useful for studying conservation and diversity in other polyploid plants, such as T. aestivum and Brassica napus.
Affiliation(s)
- Qi You
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Wenying Xu
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Kang Zhang
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Liwei Zhang
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xin Yi
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Dongxia Yao
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Chunchao Wang
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xueyan Zhang
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agriculture Sciences (CAAS), Anyang, Henan 455000, China
- Xinhua Zhao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agriculture Sciences (CAAS), Anyang, Henan 455000, China
- Nicholas J Provart
  - Department of Cell & Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, 25 Willcocks St, Toronto, ON M5S 3B2, Canada
- Fuguang Li
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agriculture Sciences (CAAS), Anyang, Henan 455000, China
- Zhen Su
  - State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China