1
Burns JJR, Shealy BT, Greer MS, Hadish JA, McGowan MT, Biggs T, Smith MC, Feltus FA, Ficklin SP. Addressing noise in co-expression network construction. Brief Bioinform 2021; 23:6446269. [PMID: 34850822 PMCID: PMC8769892 DOI: 10.1093/bib/bbab495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 11/13/2022] Open
Abstract
Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly larger studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context for each edge of the network resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction, and the creation of csGCNs, we provide a review of the workflow.
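To make the confounding problem described in this abstract concrete, here is a minimal Python sketch. It is not taken from the paper or from the Knowledge Independent Network Construction toolkit; the two genes, the two conditions and all expression values are hypothetical. Within each condition the two genes are unrelated, yet pooling the samples produces a strong Pearson correlation that is driven only by the shift between conditions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression values for two genes measured in two conditions.
control = {"gene_a": [0, 0, 1, 1], "gene_b": [0, 1, 0, 1]}
treated = {"gene_a": [5, 5, 6, 6], "gene_b": [5, 6, 5, 6]}

# Correlation computed separately within each condition.
r_control = pearson(control["gene_a"], control["gene_b"])
r_treated = pearson(treated["gene_a"], treated["gene_b"])

# Correlation computed on the pooled samples, ignoring the condition label.
r_pooled = pearson(control["gene_a"] + treated["gene_a"],
                   control["gene_b"] + treated["gene_b"])

print(f"control: {r_control:.2f}, treated: {r_treated:.2f}, pooled: {r_pooled:.2f}")
```

In this toy case the within-condition correlations are zero while the pooled correlation is about 0.96, the kind of edge that a single test applied to all samples would report even though neither condition supports it.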
Affiliation(s)
- Joshua J R Burns
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Benjamin T Shealy
  - Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- Mitchell S Greer
  - School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA
- John A Hadish
  - Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Matthew T McGowan
  - Molecular Plant Sciences Program, French Ad 324g, Washington State University, Pullman, WA 99164, USA
- Tyler Biggs
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
- Melissa C Smith
  - Department of Electrical & Computer Engineering, 105 Riggs Hall, Clemson University, Clemson, SC 29631, USA
- F Alex Feltus
  - Department of Genetics and Biochemistry, 130 McGinty Court, Clemson University, Clemson, SC 29634, USA
  - Biomedical Data Science & Informatics Program, 100 McAdams Hall, Clemson University, Clemson, SC 29634, USA
  - Clemson Center for Human Genetics, 114 Gregor Mendel Circle, Greenwood, SC 29646, USA
- Stephen P Ficklin
  - Department of Horticulture, 149 Johnson Hall, Washington State University, Pullman, WA 99164, USA
  - School of Electrical Engineering and Computer Science, EME 102, Washington State University, Pullman, WA 99164, USA
2
Staton M, Cannon E, Sanderson LA, Wegrzyn J, Anderson T, Buehler S, Cobo-Simón I, Faaberg K, Grau E, Guignon V, Gunoskey J, Inderski B, Jung S, Lager K, Main D, Poelchau M, Ramnath R, Richter P, West J, Ficklin S. Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Brief Bioinform 2021; 22:6318561. [PMID: 34251419 PMCID: PMC8574961 DOI: 10.1093/bib/bbab238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/01/2022] Open
Abstract
Online, open access databases for biological knowledge serve as central repositories for research communities to store, find and analyze integrated, multi-disciplinary datasets. With increasing volumes, complexity and the need to integrate genomic, transcriptomic, metabolomic, proteomic, phenomic and environmental data, community databases face tremendous challenges in ongoing maintenance, expansion and upgrades. A common infrastructure framework using community standards shared by many databases can reduce development burden, provide interoperability, ensure use of common standards and support long-term sustainability. Tripal is a mature, open source platform built to meet this need. With ongoing improvement since its first release in 2009, Tripal provides full functionality for searching, browsing, loading and curating numerous types of data and is a primary technology powering at least 31 publicly available databases spanning plants, animals and human data, primarily storing genomics, genetics and breeding data. Tripal software development is managed by a shared, inclusive governance structure including both project management and advisory teams. Here, we report on the most important and innovative aspects of Tripal after 11 years of development, including integration of diverse types of biological data, successful collaborative projects across member databases, and support for implementing FAIR principles.
Affiliation(s)
- Ethalinda Cannon
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA, USA
- Kay Faaberg
  - USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Emily Grau
  - University of Connecticut, Storrs, CT, USA
- Sook Jung
  - Washington State University, Pullman, WA, USA
- Kelly Lager
  - USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Dorrie Main
  - Washington State University, Pullman, WA, USA
- Monica Poelchau
  - USDA-ARS, National Agricultural Library, Beltsville, MD, USA
- Joe West
  - University of Tennessee, Knoxville, TN, USA
3
Agro-Physiologic Responses and Stress-Related Gene Expression of Four Doubled Haploid Wheat Lines under Salinity Stress Conditions. Biology 2021; 10:biology10010056. [PMID: 33466713 PMCID: PMC7828821 DOI: 10.3390/biology10010056] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/17/2020] [Revised: 01/06/2021] [Accepted: 01/08/2021] [Indexed: 12/18/2022]
Abstract
Simple Summary: Productivity of wheat can be enhanced using salt-tolerant genotypes. Assessing salt tolerance potential in wheat through agro-physiological traits and stress-related gene expression analysis could minimize the cost of breeding programs and be a powerful way to select the most salt-tolerant genotype. The study evaluated the salt tolerance potential of four doubled haploid lines of wheat and compared them with the check cultivar Sakha93 using an extensive set of agro-physiologic parameters and salt-stress-related gene expressions. The results indicated that the five genotypes tested displayed a reduction in all traits evaluated except canopy temperature and electrical conductivity, with the greatest decline occurring in the check cultivar and the least decline in DHL2. The genotypes DHL21 and DHL5 exhibited an increased expression rate of salt-stress-related genes under salt stress conditions. The multiple linear regression model and path coefficient analysis showed a coefficient of determination of 0.93. In conclusion, the number of spikelets and/or the number of kernels were identified as unbiased traits for assessing wheat DHLs under salinity conditions, given their contribution to and direct impact on grain yield. Moreover, the two most salt-tolerant genotypes, DHL2 and DHL21, can be useful as genetic resources for future breeding programs.

Abstract: Salinity severely hinders horizontal and vertical expansion in worldwide wheat production. Productivity can be enhanced using salt-tolerant wheat genotypes. Assessing salt tolerance potential in bread wheat doubled haploid lines (DHLs) through agro-physiological traits and stress-related gene expression analysis could minimize the cost of breeding programs and be a powerful way to select the most salt-tolerant genotype. We used an extensive set of agro-physiologic parameters and salt-stress-related gene expressions. Multivariate analysis was used to detect phenotypic and genetic variations of wheat genotypes more closely under salinity stress, and we analyzed how these strategies effectively balance each other. Four DHLs and the check cultivar (Sakha93) were evaluated at two salinity levels (without NaCl and with 150 mM NaCl) until harvest. The five genotypes showed reduced growth under 150 mM NaCl; however, the check cultivar (Sakha93) died at the beginning of the flowering stage. Salt stress reduced all traits except canopy temperature and initial electrical conductivity in each of the five genotypes, with the greatest decline occurring in the check cultivar (Sakha93) and the least decline in DHL2. The genotypes DHL21 and DHL5 exhibited an increased expression rate of salt-stress-related genes (TaNHX1, TaHKT1, and TaCAT1) compared with DHL2 and Sakha93 under salt stress conditions. In the principal component analysis, the first two components explained 70.78% of the overall variation of all traits (28 out of 32 traits). A multiple linear regression model and path coefficient analysis showed a coefficient of determination (R²) of 0.93. The models identified two explanatory variables, number of spikelets and/or number of kernels, which can serve as unbiased traits for assessing wheat DHLs under salinity stress conditions, given their contribution to and direct impact on grain yield.
4
Exploration into biomarker potential of region-specific brain gene co-expression networks. Sci Rep 2020; 10:17089. [PMID: 33051491 PMCID: PMC7553962 DOI: 10.1038/s41598-020-73611-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2020] [Accepted: 08/04/2020] [Indexed: 11/08/2022] Open
Abstract
The human brain is a complex organ that consists of several regions each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain's structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested these gene sets on their relevance to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and identification of biomarker genes for brain related phenotypes.
5
Medina S, Vicente R, Nieto-Taladriz MT, Aparicio N, Chairi F, Vergara-Diaz O, Araus JL. The Plant-Transpiration Response to Vapor Pressure Deficit (VPD) in Durum Wheat Is Associated With Differential Yield Performance and Specific Expression of Genes Involved in Primary Metabolism and Water Transport. Frontiers in Plant Science 2019; 9:1994. [PMID: 30697225 PMCID: PMC6341309 DOI: 10.3389/fpls.2018.01994] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/30/2018] [Accepted: 12/21/2018] [Indexed: 05/23/2023]
Abstract
The regulation of plant transpiration was proposed as a key factor affecting transpiration efficiency and agronomical adaptation of wheat to water-limited Mediterranean environments. However, to date no studies have related this trait to crop performance in the field. In this study, the transpiration response to increasing vapor pressure deficit (VPD) of modern Spanish semi-dwarf durum wheat lines was evaluated under controlled conditions at the vegetative stage, and the agronomical performance of the same set of lines was assessed at grain filling, as well as grain yield at maturity, in Mediterranean environments ranging from water stressed to good agronomical conditions. A group of linear-transpiration response (LTR) lines exhibited better performance in grain yield and biomass compared to segmented-transpiration response (STR) lines, particularly in the wetter environments, whereas the reverse occurred only in the most stressed trial. LTR lines generally exhibited better water status (stomatal conductance) and larger green biomass (vegetation indices) during the reproductive stage than STR lines. In both groups, the responses to growing conditions were associated with the expression levels of dehydration-responsive transcription factors (DREB) leading to different performances of primary metabolism-related enzymes. Thus, the response of LTR lines under fair to good conditions was associated with higher transcription levels of genes involved in nitrogen (GS1 and GOGAT) and carbon (RCBL) metabolism, as well as water transport (TIP1.1). In conclusion, modern durum wheat lines differed in their response to water loss; the linear transpiration response seemed to favor uptake and transport of water and nutrients, and photosynthetic metabolism, leading to higher grain yield except under very harsh drought conditions. The transpiration response to VPD may be a trait to explore further when selecting for adaptation to specific water conditions.
Affiliation(s)
- Susan Medina
  - Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona (UB), Barcelona, Spain
  - Facultad de Ciencias Ambientales, Universidad Científica del Sur, Lima, Peru
- Rubén Vicente
  - Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona (UB), Barcelona, Spain
- Nieves Aparicio
  - Agricultural Technology Institute of Castilla and León (ITACYL), Valladolid, Spain
- Fadia Chairi
  - Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona (UB), Barcelona, Spain
- Omar Vergara-Diaz
  - Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona (UB), Barcelona, Spain
- José Luis Araus
  - Integrative Crop Ecophysiology Group, Plant Physiology Section, Faculty of Biology, University of Barcelona (UB), Barcelona, Spain
6
Zhao K, Lin F, Romero-Gamboa SP, Saha P, Goh HJ, An G, Jung KH, Hazen SP, Bartley LE. Rice Genome-Scale Network Integration Reveals Transcriptional Regulators of Grass Cell Wall Synthesis. Frontiers in Plant Science 2019; 10:1275. [PMID: 31681374 PMCID: PMC6813959 DOI: 10.3389/fpls.2019.01275] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/22/2019] [Accepted: 09/12/2019] [Indexed: 05/07/2023]
Abstract
Grasses have evolved distinct cell wall composition and patterning relative to dicotyledonous plants. However, despite the importance of this plant family, transcriptional regulation of its cell wall biosynthesis is poorly understood. To identify grass cell wall-associated transcription factors, we constructed the Rice Combined mutual Ranked Network (RCRN). The RCRN covers >90% of annotated rice (Oryza sativa) genes, is high quality, and includes most grass-specific cell wall genes, such as mixed-linkage glucan synthases and hydroxycinnamoyl acyltransferases. Comparing the RCRN and an equivalent Arabidopsis network suggests that grass orthologs of most genetically verified eudicot cell wall regulators also control this process in grasses, but some transcription factors vary significantly in network connectivity between these divergent species. Reverse genetics, yeast-one-hybrid, and protoplast-based assays reveal that OsMYB61a activates a grass-specific acyltransferase promoter, which confirms network predictions and supports grass-specific cell wall synthesis genes being incorporated into conserved regulatory circuits. In addition, 10 of 15 tested transcription factors, including six novel Wall-Associated regulators (WAP1, WACH1, WAHL1, WADH1, OsMYB13a, and OsMYB13b), alter abundance of cell wall-related transcripts when transiently expressed. The results highlight the quality of the RCRN for examining rice biology, provide insight into the evolution of cell wall regulation, and identify network nodes and edges that are possible leads for improving cell wall composition.
Affiliation(s)
- Kangmei Zhao
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, United States
- Fan Lin
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, United States
- Prasenjit Saha
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, United States
- Hyung-Jung Goh
  - Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, South Korea
- Gynheung An
  - Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, South Korea
- Ki-Hong Jung
  - Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, South Korea
- Samuel P. Hazen
  - Department of Biology, University of Massachusetts, Amherst, MA, United States
- Laura E. Bartley
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, United States
  - *Correspondence: Laura E. Bartley,
7
Gonzalez-Dominguez J, Martin MJ. MPIGeneNet: Parallel Calculation of Gene Co-Expression Networks on Multicore Clusters. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:1732-1737. [PMID: 29028205 DOI: 10.1109/tcbb.2017.2761340] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 06/07/2023]
Abstract
In this work, we present MPIGeneNet, a parallel tool that applies Pearson's correlation and Random Matrix Theory to construct gene co-expression networks. It is based on the state-of-the-art sequential tool RMTGeneNet, which provides networks with high robustness and sensitivity at the expense of relatively long runtimes for large-scale input datasets. MPIGeneNet returns the same results as RMTGeneNet but improves the memory management, reduces the I/O cost, and accelerates the two most computationally demanding steps of co-expression network construction by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on two different systems using three typical input datasets shows that MPIGeneNet is significantly faster than RMTGeneNet. As an example, our tool is up to 175.41 times faster on a cluster with eight nodes, each one containing two 12-core Intel Haswell processors. The source code of MPIGeneNet, as well as a reference manual, is available at https://sourceforge.net/projects/mpigenenet/.
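The first step named in this abstract, pairwise Pearson correlation followed by a significance cutoff, can be sketched in a few lines of sequential Python. This is an illustration only and not MPIGeneNet's code: the gene names and expression values are hypothetical, nothing is parallelized, and the fixed cutoff `THRESHOLD = 0.9` is an assumed stand-in for the threshold that the tool derives with Random Matrix Theory, which is not implemented here.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical expression matrix: one row per gene, one column per sample.
expression = {
    "g1": [1, 2, 3, 4, 5],
    "g2": [2, 4, 6, 8, 10.5],
    "g3": [5, 4, 3, 2, 1],
    "g4": [1, 3, 2, 3, 1],
}

# Assumed fixed cutoff; a stand-in for an RMT-derived threshold.
THRESHOLD = 0.9

# Keep a gene pair as a network edge when |r| meets the cutoff.
edges = []
for a, b in combinations(expression, 2):
    r = pearson(expression[a], expression[b])
    if abs(r) >= THRESHOLD:
        edges.append((a, b, r))

for a, b, r in edges:
    print(f"{a} -- {b}: r = {r:.3f}")
```

The number of gene pairs grows quadratically with the number of genes, which is why the correlation step is one of the natural targets for the parallelization the abstract describes.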
8
Sorting Five Human Tumor Types Reveals Specific Biomarkers and Background Classification Genes. Sci Rep 2018; 8:8180. [PMID: 29802335 PMCID: PMC5970138 DOI: 10.1038/s41598-018-26310-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 11/01/2017] [Accepted: 05/10/2018] [Indexed: 12/16/2022] Open
Abstract
We applied two state-of-the-art, knowledge independent data-mining methods - Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE) - to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that wasn't detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.
9
Jung S, Lee T, Cheng CH, Ficklin S, Yu J, Humann J, Main D. Extension modules for storage, visualization and querying of genomic, genetic and breeding data in Tripal databases. Database: The Journal of Biological Databases and Curation 2017; 2017:4718480. [PMID: 31725859 PMCID: PMC5727400 DOI: 10.1093/database/bax092] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/05/2017] [Revised: 11/11/2017] [Accepted: 11/16/2017] [Indexed: 01/15/2023]
Abstract
Tripal is an open-source database platform primarily used for development of genomic, genetic and breeding databases. We report here on the release of the Chado Loader, Chado Data Display and Chado Search modules to extend the functionality of the core Tripal modules. These new extension modules provide additional tools for (1) data loading, (2) customized visualization and (3) advanced search functions for supported data types such as organism, marker, QTL/Mendelian Trait Loci, germplasm, map, project, phenotype, genotype and their respective metadata. The Chado Loader module provides data collection templates in Excel with defined metadata and data loaders with front end forms. The Chado Data Display module contains tools to visualize each data type and the metadata which can be used as is or customized as desired. The Chado Search module provides search and download functionality for the supported data types. Also included are the tools to visualize map and species summary. The use of materialized views in the Chado Search module enables better performance as well as flexibility of data modeling in Chado, allowing existing Tripal databases with different metadata types to utilize the module. These Tripal Extension modules are implemented in the Genome Database for Rosaceae (rosaceae.org), CottonGen (cottongen.org), Citrus Genome Database (citrusgenomedb.org), Genome Database for Vaccinium (vaccinium.org) and the Cool Season Food Legume Database (coolseasonfoodlegume.org). Database URL: https://www.citrusgenomedb.org/, https://www.coolseasonfoodlegume.org/, https://www.cottongen.org/, https://www.rosaceae.org/, https://www.vaccinium.org/.
Affiliation(s)
- Sook Jung
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Taein Lee
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Chun-Huai Cheng
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Stephen Ficklin
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Jing Yu
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Jodi Humann
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Dorrie Main
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
10
Ficklin SP, Dunwoodie LJ, Poehlman WL, Watson C, Roche KE, Feltus FA. Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study. Sci Rep 2017; 7:8617. [PMID: 28819158 PMCID: PMC5561081 DOI: 10.1038/s41598-017-09094-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 04/12/2017] [Accepted: 07/21/2017] [Indexed: 01/10/2023] Open
Abstract
A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovery of genetic correlations from a variety of biologically interesting conditions. However, even if gene correlations are present, their discovery can be masked by noise. Noise is introduced from natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by selection of data processing tools. A variety of published studies, approaches and methods attempt to address each of these contributions of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor subtype specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.
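The general idea in this abstract, separating mixed input conditions with a Gaussian mixture before measuring correlation, can be illustrated with a small Python sketch. It is not the authors' implementation: the two-gene data set is hypothetical, the number of components is fixed at two, the starting means are chosen by hand, and the expectation-maximization loop has no safeguards against empty components. Each point is a sample, with the expression of two genes as its coordinates.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_gauss2d(p, mean, cov):
    """Log density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2 * math.pi)

def fit_gmm(points, means, n_iter=20, reg=1e-6):
    """Fit a 2-D Gaussian mixture by EM; return per-point responsibilities."""
    k, n = len(means), len(points)
    means = list(means)
    covs = [((1.0, 0.0), (0.0, 1.0))] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior component probabilities, computed in log space.
        resp = []
        for p in points:
            logs = [math.log(weights[j]) + log_gauss2d(p, means[j], covs[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate weight, mean and covariance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my)
                      for r, p in zip(resp, points)) / nj
            means[j] = (mx, my)
            covs[j] = ((sxx + reg, sxy), (sxy, syy + reg))  # reg keeps det > 0
            weights[j] = nj / n
    return resp

# Hypothetical samples: the two genes rise together in one condition and
# move in opposite directions in the other.
group_1 = [(0.1 * i, 0.1 * i + 0.05 * (-1) ** i) for i in range(20)]
group_2 = [(10 + 0.1 * i, 10 - 0.1 * i + 0.05 * (-1) ** i) for i in range(20)]
points = group_1 + group_2

resp = fit_gmm(points, means=[points[0], points[-1]])
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]

r_pooled = pearson([p[0] for p in points], [p[1] for p in points])
r_by_cluster = {}
for c in sorted(set(labels)):
    members = [p for p, lab in zip(points, labels) if lab == c]
    r_by_cluster[c] = pearson([p[0] for p in members], [p[1] for p in members])

print(f"pooled r = {r_pooled:.2f}")
for c, r in r_by_cluster.items():
    print(f"cluster {c}: r = {r:.2f}")
```

In this toy data the pooled correlation is strongly positive, while the two recovered clusters show opposite-signed relationships. Annotating each cluster with the sample metadata it is enriched for is what would turn such a result into a condition-specific edge.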
Affiliation(s)
- Stephen P Ficklin
  - Department of Horticulture, Washington State University, Pullman, WA, 99164, USA
- Leland J Dunwoodie
  - Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- William L Poehlman
  - Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- Christopher Watson
  - Molecular Plant Sciences Program, Washington State University, Pullman, WA, 99164, USA
- Kimberly E Roche
  - Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
- F Alex Feltus
  - Department of Genetics & Biochemistry, Clemson University, Clemson, SC, 29631, USA
11
Gonzalez S, Clavijo B, Rivarola M, Moreno P, Fernandez P, Dopazo J, Paniego N. ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data. BMC Bioinformatics 2017; 18:121. [PMID: 28222698 PMCID: PMC5320735 DOI: 10.1186/s12859-017-1494-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 10/21/2016] [Accepted: 01/21/2017] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: In recent years, applications based on massively parallelized RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., those without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species.

RESULTS: We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with next-generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results.

CONCLUSIONS: ATGC transcriptomics gives non-expert computer users and small research groups access to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of the GNU public license at http://atgcinta.sourceforge.net.
Affiliation(s)
- Sergio Gonzalez
  - Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA) INTA, Hurlingham, Buenos Aires, Argentina
- Máximo Rivarola
  - Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA) INTA, Hurlingham, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Godoy Cruz 2290, Buenos Aires, C1425FQB, Argentina
- Patricio Moreno
  - Instituto de Ingeniería Biomédica, Facultad de Ingeniería, Universidad de Buenos Aires, Buenos Aires, Argentina
- Paula Fernandez
  - Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA) INTA, Hurlingham, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Godoy Cruz 2290, Buenos Aires, C1425FQB, Argentina
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, San Martín, Buenos Aires, Argentina
- Joaquín Dopazo
  - Computational Genomics Department, Centro de Investigación Príncipe Felipe, Valencia, Spain
- Norma Paniego
  - Instituto de Biotecnología, Centro Investigación en Ciencias Veterinarias y Agronómicas (CICVyA) INTA, Hurlingham, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Godoy Cruz 2290, Buenos Aires, C1425FQB, Argentina
12
Wytko C, Soto B, Ficklin SP. blend4php: a PHP API for galaxy. Database: The Journal of Biological Databases and Curation 2017; 2017:baw154. [PMID: 28077564 PMCID: PMC5225400 DOI: 10.1093/database/baw154] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/21/2016] [Revised: 10/12/2016] [Accepted: 11/01/2016] [Indexed: 01/17/2023]
Abstract
Galaxy is a popular framework for execution of complex analytical pipelines, typically for large data sets, and is commonly used for (but not limited to) genomic, genetic and related biological analysis. It provides a web front-end and integrates with high performance computing resources. Here we report the development of the blend4php library that wraps Galaxy’s RESTful API into a PHP-based library. PHP-based web applications can use blend4php to automate execution, monitoring and management of a remote Galaxy server, including its users, workflows, jobs and more. The blend4php library was specifically developed for the integration of Galaxy with Tripal, the open-source toolkit for the creation of online genomic and genetic web sites. However, it was designed as an independent library for use by any application, and is freely available under version 3 of the GNU Lesser General Public License (LGPL v3.0) at https://github.com/galaxyproject/blend4php. Database URL: https://github.com/galaxyproject/blend4php
Affiliation(s)
- Connor Wytko
- Department of Horticulture and School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA
- Brian Soto
- Department of Horticulture and School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164, USA
|
13
|
Sreenivasulu N, Butardo VM, Misra G, Cuevas RP, Anacleto R, Kavi Kishor PB. Designing climate-resilient rice with ideal grain quality suited for high-temperature stress. Journal of Experimental Botany 2015; 66:1737-48. [PMID: 25662847 PMCID: PMC4669556 DOI: 10.1093/jxb/eru544] [Citation(s) in RCA: 89] [Impact Index Per Article: 9.9] [Received: 09/23/2014] [Revised: 12/15/2014] [Accepted: 12/17/2014] [Indexed: 05/18/2023]
Abstract
To ensure rice food security, the target outputs of future rice breeding programmes should focus on developing climate-resilient rice varieties with emphasis on increased head rice yield coupled with superior grain quality. This challenge is made greater by a world that is increasingly becoming warmer. Such environmental changes dramatically impact head rice and milling yield as well as increasing chalkiness because of impairment in starch accumulation and other storage biosynthetic pathways in the grain. This review highlights the knowledge gained through gene discovery via quantitative trait locus (QTL) cloning and structural-functional genomic strategies to reduce chalk, increase head rice yield, and develop stable lines with optimum grain quality in challenging environments. The newly discovered genes and the knowledge gained on the influence of specific alleles related to stability of grain quality attributes provide a robust platform for marker-assisted selection in breeding to design heat-tolerant rice varieties with superior grain quality. Using the chalkiness trait in rice as a case study, we demonstrate here that the emerging field of systems genetics can help fast-track the identification of novel alleles and gene targets that can be pyramided for the development of environmentally robust rice varieties that possess improved grain quality.
Affiliation(s)
- Nese Sreenivasulu
- Grain Quality and Nutrition Center, International Rice Research Institute, DAPO Box 7777, Metro Manila 1301, Philippines
- Vito M Butardo
- Grain Quality and Nutrition Center, International Rice Research Institute, DAPO Box 7777, Metro Manila 1301, Philippines
- Gopal Misra
- Grain Quality and Nutrition Center, International Rice Research Institute, DAPO Box 7777, Metro Manila 1301, Philippines
- Rosa Paula Cuevas
- Grain Quality and Nutrition Center, International Rice Research Institute, DAPO Box 7777, Metro Manila 1301, Philippines
- Roslen Anacleto
- Grain Quality and Nutrition Center, International Rice Research Institute, DAPO Box 7777, Metro Manila 1301, Philippines
|
14
|
Feltus FA. Systems genetics: a paradigm to improve discovery of candidate genes and mechanisms underlying complex traits. Plant Science: An International Journal of Experimental Plant Biology 2014; 223:45-8. [PMID: 24767114 DOI: 10.1016/j.plantsci.2014.03.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/15/2013] [Revised: 02/18/2014] [Accepted: 03/02/2014] [Indexed: 05/02/2023]
Abstract
Understanding the control of any trait optimally requires the detection of causal genes, gene interaction, and mechanism of action to discover and model the biochemical pathways underlying the expressed phenotype. Functional genomics techniques, including RNA expression profiling via microarray and high-throughput DNA sequencing, allow for the precise genome localization of biological information. Powerful genetic approaches, including quantitative trait locus (QTL) and genome-wide association study mapping, link phenotype with genome positions, yet genetics is less precise in localizing the relevant mechanistic information encoded in DNA. The coupling of salient functional genomic signals with genetically mapped positions is an appealing approach to discover meaningful gene-phenotype relationships. Techniques used to define this genetic-genomic convergence comprise the field of systems genetics. This short review will address an application of systems genetics where RNA profiles are associated with genetically mapped genome positions of individual genes (eQTL mapping) or as gene sets (co-expression network modules). Both approaches can be applied for knowledge independent selection of candidate genes (and possible control mechanisms) underlying complex traits where multiple, likely unlinked, genomic regions might control specific complex traits.
Affiliation(s)
- F Alex Feltus
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA.
|
15
|
Sanderson LA, Ficklin SP, Cheng CH, Jung S, Feltus FA, Bett KE, Main D. Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases. Database: The Journal of Biological Databases and Curation 2013; 2013:bat075. [PMID: 24163125 PMCID: PMC3808541 DOI: 10.1093/database/bat075] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 12/18/2022]
Abstract
Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/
Affiliation(s)
- Lacey-Anne Sanderson
- Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada; Department of Horticulture, Washington State University, Pullman, WA, USA; and Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
|