1
Bioengineering and Molecular Biology of Miscanthus. Energies 2022. [DOI: 10.3390/en15144941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023]
Abstract
Miscanthus is a perennial wild plant used in the production of paper and roofing, in horticulture, and in the development of new high-yielding crops for temperate climates. A chromosome-level assembly of the ancient tetraploid Miscanthus genome has been reported, providing resources that link its chromosomes to those of the related diploid sorghum and the complex polyploid sugarcane. Analysis of Miscanthus sinensis and Miscanthus sacchariflorus showed extensive admixture and interspecific hybridization and documented the origin of the high-yielding triploid bioenergy crop Miscanthus × giganteus. The Miscanthus genome broadens the scope of comparative genomics for understanding the main traits of Andropogoneae grasses. Miscanthus × giganteus is widely regarded as a promising lignocellulosic biomass crop because of its high biomass yield, its lack of toxic emissions into the environment, and its ability to grow on depleted land. However, the high production cost of lignocellulosic bioethanol limits its commercialization. The main inhibitors of the enzymatic reactions in saccharification and fermentation are lignin in the cell wall and the by-products it releases during the pre-treatment stage. One way to overcome this barrier could be to modify the genes involved in lignin biosynthesis, thereby manipulating the lignin content and composition of miscanthus.
2
Zhang J, Li Y. High-Dimensional Gaussian Graphical Regression Models with Covariates. J Am Stat Assoc 2022; 118:2088-2100. [PMID: 38143787 PMCID: PMC10746132 DOI: 10.1080/01621459.2022.2034632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2021] [Accepted: 01/20/2022] [Indexed: 10/19/2022]
Abstract
Though Gaussian graphical models have been widely used in many scientific fields, relatively limited progress has been made to link graph structures to external covariates. We propose a Gaussian graphical regression model, which regresses both the mean and the precision matrix of a Gaussian graphical model on covariates. In the context of co-expression quantitative trait locus (QTL) studies, our method can determine how genetic variants and clinical conditions modulate the subject-level network structures, and recover both the population-level and subject-level gene networks. Our framework encourages sparsity of covariate effects on both the mean and the precision matrix. In particular for the precision matrix, we stipulate simultaneous sparsity, i.e., group sparsity and element-wise sparsity, on effective covariates and their effects on network edges, respectively. We establish variable selection consistency first under the case with known mean parameters and then a more challenging case with unknown means depending on external covariates, and establish in both cases the ℓ2 convergence rates and the selection consistency of the estimated precision parameters. The utility and efficacy of our proposed method are demonstrated through simulation studies and an application to a co-expression QTL study with brain cancer patients.
Affiliation(s)
- Jingfei Zhang
- Department of Management Science, University of Miami, Coral Gables, FL 33146
- Yi Li
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109
3
Yang Z, Ho YY. Modeling dynamic correlation in zero-inflated bivariate count data with applications to single-cell RNA sequencing data. Biometrics 2021; 78:766-776. [PMID: 33720414 PMCID: PMC8477913 DOI: 10.1111/biom.13457] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/28/2019] [Revised: 03/03/2021] [Accepted: 03/08/2021] [Indexed: 12/13/2022]
Abstract
Interactions between biological molecules in a cell are tightly coordinated and often highly dynamic. As a result of these varying signaling activities, changes in gene coexpression patterns can often be observed. The advancements in next‐generation sequencing technologies bring new statistical challenges for studying these dynamic changes of gene coexpression. In recent years, methods have been developed to examine genomic information from individual cells. Single‐cell RNA sequencing (scRNA‐seq) data are count‐based, and often exhibit characteristics such as overdispersion and zero inflation. To explore the dynamic dependence structure in scRNA‐seq data and other zero‐inflated count data, new approaches are needed. In this paper, we consider overdispersion and zero inflation in count outcomes and propose a ZEro‐inflated negative binomial dynamic COrrelation model (ZENCO). In ZENCO, the observed count data are modeled as a mixture of two components: successful amplifications and dropout events. A latent variable is incorporated into ZENCO to model the covariate‐dependent correlation structure. We conduct simulation studies to evaluate the performance of our proposed method and to compare it with existing approaches. We also illustrate the implementation of our proposed approach using scRNA‐seq data from a study of minimal residual disease in melanoma.
Affiliation(s)
- Zhen Yang
- Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
- Yen-Yi Ho
- Department of Statistics, University of South Carolina, Columbia, South Carolina, USA
4
Lea A, Subramaniam M, Ko A, Lehtimäki T, Raitoharju E, Kähönen M, Seppälä I, Mononen N, Raitakari OT, Ala-Korpela M, Pajukanta P, Zaitlen N, Ayroles JF. Genetic and environmental perturbations lead to regulatory decoherence. eLife 2019; 8:e40538. [PMID: 30834892 PMCID: PMC6400502 DOI: 10.7554/elife.40538] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 07/29/2018] [Accepted: 02/14/2019] [Indexed: 01/24/2023]
Abstract
Correlation among traits is a fundamental feature of biological systems that remains difficult to study. To address this problem, we developed a flexible approach that allows us to identify factors associated with inter-individual variation in correlation. We use data from three human cohorts to study the effects of genetic and environmental variation on correlations among mRNA transcripts and among NMR metabolites. We first show that environmental exposures (infection and disease) lead to a systematic loss of correlation, which we define as 'decoherence'. Using longitudinal data, we show that decoherent metabolites are better predictors of whether someone will develop metabolic syndrome than metabolites commonly used as biomarkers of this disease. Finally, we demonstrate that correlation itself is under genetic control by mapping hundreds of 'correlation quantitative trait loci (QTLs)'. Together, this work furthers our understanding of how and why coordinated biological processes break down, and points to a potential role for decoherence in disease.
Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).
Affiliation(s)
- Amanda Lea
- Department of Ecology and Evolution, Princeton University, Princeton, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, United States
- Meena Subramaniam
- Department of Medicine, Lung Biology Center, University of California, San Francisco, San Francisco, United States
- Arthur Ko
- Department of Medicine, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, United States
- Terho Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Finnish Cardiovascular Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Emma Raitoharju
- Finnish Cardiovascular Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Mika Kähönen
- Finnish Cardiovascular Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Department of Clinical Physiology, Tampere University, Tampere University Hospital, Tampere, Finland
- Ilkka Seppälä
- Finnish Cardiovascular Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Nina Mononen
- Finnish Cardiovascular Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Olli T Raitakari
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
- Mika Ala-Korpela
- Systems Epidemiology, Baker Heart and Diabetes Institute, Melbourne, Australia
- Computational Medicine, Faculty of Medicine, Biocenter Oulu, University of Oulu, Oulu, Finland
- NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland
- Population Health Science, Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom
- Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing and Health Sciences, The Alfred Hospital, Monash University, Melbourne, Australia
- Päivi Pajukanta
- Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles, Los Angeles, United States
- Noah Zaitlen
- Department of Medicine, Lung Biology Center, University of California, San Francisco, San Francisco, United States
- Julien F Ayroles
- Department of Ecology and Evolution, Princeton University, Princeton, United States
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, United States
5
Kinzy TG, Starr TK, Tseng GC, Ho YY. Meta-analytic framework for modeling genetic coexpression dynamics. Stat Appl Genet Mol Biol 2019; 18. [PMID: 30735484 DOI: 10.1515/sagmb-2017-0052] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/16/2024]
Abstract
Methods for exploring genetic interactions have been developed in an attempt to move beyond single gene analyses. Because biological molecules frequently participate in different processes under various cellular conditions, investigating the changes in gene coexpression patterns under various biological conditions could reveal important regulatory mechanisms. One of the methods for capturing gene coexpression dynamics, named liquid association (LA), quantifies the relationship where the coexpression between two genes is modulated by a third "coordinator" gene. This LA measure offers a natural framework for studying gene coexpression changes and has been applied increasingly to study regulatory networks among genes. With a wealth of publicly available gene expression data, there is a need to develop a meta-analytic framework for LA analysis. In this paper, we incorporated mixed effects when modeling correlation to account for between-studies heterogeneity. For statistical inference about LA, we developed a Markov chain Monte Carlo (MCMC) estimation procedure through a Bayesian hierarchical framework. We evaluated the proposed methods in a set of simulations and illustrated their use in two collections of experimental data sets. The first data set combined 10 pancreatic ductal adenocarcinoma gene expression studies to determine the role of possible coordinator gene USP9X in the Hippo pathway. The second experimental data set consisted of 907 gene expression microarray Escherichia coli experiments from multiple studies publicly available through the Many Microbe Microarray Database website (http://m3d.bu.edu/) and examined genes that coexpress with serA in the presence of coordinator gene Lrp.
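The liquid association (LA) measure mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used three-product-moment estimator (standardize the two genes, normal-score transform the coordinator, then average the triple product); it is a plain single-study estimate, not the meta-analytic Bayesian procedure the paper develops. The function names and the toy data are hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev


def standardize(values):
    """Center to mean 0 and scale to (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def normal_scores(values):
    """Replace each value by the standard normal quantile of its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def liquid_association(x, y, z):
    """Assumed estimator of LA(X, Y | Z): mean of x*y*z after the transforms."""
    xs, ys, zs = standardize(x), standardize(y), normal_scores(z)
    return mean(a * b * c for a, b, c in zip(xs, ys, zs))


# Toy data: X and Y are positively correlated when the coordinator Z is
# high and negatively correlated when Z is low.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [(1 if zi > 0 else -1) * xi + 0.5 * rng.gauss(0, 1) for xi, zi in zip(x, z)]
la = liquid_association(x, y, z)
```

A clearly positive `la` indicates that the coexpression of X and Y increases with Z; a value near zero is expected when Z does not modulate their correlation.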
Affiliation(s)
- Yen-Yi Ho
- Department of Statistics, University of South Carolina, Columbia, SC 29209, USA
6
Kanonidis EI, Roy MM, Deighton RF, Le Bihan T. Protein Co-Expression Analysis as a Strategy to Complement a Standard Quantitative Proteomics Approach: Case of a Glioblastoma Multiforme Study. PLoS One 2016; 11:e0161828. [PMID: 27571357 PMCID: PMC5003355 DOI: 10.1371/journal.pone.0161828] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 03/03/2016] [Accepted: 08/14/2016] [Indexed: 12/21/2022]
Abstract
Although correlation network studies from co-expression analysis are increasingly popular, they are rarely applied to proteomics datasets. Protein co-expression analysis provides a complementary view of underlying trends, which can be overlooked by conventional data analysis. The core of the present study is based on Weighted Gene Co-expression Network Analysis applied to a glioblastoma multiforme proteomic dataset. Using this method, we have identified three main modules which are associated with three different membrane-associated groups: mitochondrial, endoplasmic reticulum, and a vesicle fraction. The three networks based on protein co-expression were assessed against a publicly available database (STRING) and show a statistically significant overlap. Each of the three main modules was de-clustered into smaller networks using different strategies based on the identification of highly connected networks, hierarchical clustering and enrichment of Gene Ontology functional terms. Most of the highly connected proteins found in the endoplasmic reticulum module were associated with redox activity, while a core of the unfolded protein response was identified in addition to proteins involved in oxidative stress pathways. The proteins composing the electron transfer chain were found to be differently affected, with proteins from mitochondrial Complex I being more down-regulated than proteins from Complex III. Finally, the two pyruvate kinase isoforms show major differences in their co-expressed protein networks, suggesting roles in different cellular locations.
Affiliation(s)
- Evangelos I. Kanonidis
- SynthSys and School of Biological Sciences, Waddington building, University of Edinburgh, Edinburgh, United Kingdom, EH9 3BF
- Marcia M. Roy
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom, EH16 4SB
- Ruth F. Deighton
- Edinburgh Medical School: Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, United Kingdom, EH8 9AG
- Thierry Le Bihan
- SynthSys and School of Biological Sciences, Waddington building, University of Edinburgh, Edinburgh, United Kingdom, EH9 3BF
7
Hou J, Acharya L, Zhu D, Cheng J. An overview of bioinformatics methods for modeling biological pathways in yeast. Brief Funct Genomics 2016; 15:95-108. [PMID: 26476430 PMCID: PMC5065356 DOI: 10.1093/bfgp/elv040] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
The advent of high-throughput genomics techniques, along with the completion of genome sequencing projects, identification of protein-protein interactions and reconstruction of genome-scale pathways, has accelerated the development of systems biology research in the yeast organism Saccharomyces cerevisiae. In particular, discovery of biological pathways in yeast has become an important forefront in systems biology, which aims to understand the interactions among molecules within a cell leading to certain cellular processes in response to a specific environment. While the existing theoretical and experimental approaches enable the investigation of well-known pathways involved in metabolism, gene regulation and signal transduction, bioinformatics methods offer new insights into computational modeling of biological pathways. A wide range of computational approaches has been proposed in the past for reconstructing biological pathways from high-throughput datasets. Here we review selected bioinformatics approaches for modeling biological pathways in S. cerevisiae, including metabolic pathways, gene-regulatory pathways and signaling pathways. We start with reviewing the research on biological pathways followed by discussing key biological databases. In addition, several representative computational approaches for modeling biological pathways in yeast are discussed.
8
Abstract
Transcriptional control of gene expression requires interactions between the cis-regulatory elements (CREs) controlling gene promoters. We developed a sensitive computational method to identify CRE combinations with conserved spacing that does not require genome alignments. When applied to seven sensu stricto and sensu lato Saccharomyces species, 80% of the predicted interactions displayed some evidence of combinatorial transcriptional behavior in several existing datasets including: (1) chromatin immunoprecipitation data for colocalization of transcription factors, (2) gene expression data for coexpression of predicted regulatory targets, and (3) gene ontology databases for common pathway membership of predicted regulatory targets. We tested several predicted CRE interactions with chromatin immunoprecipitation experiments in a wild-type strain and strains in which a predicted cofactor was deleted. Our experiments confirmed that transcription factor (TF) occupancy at the promoters of the CRE combination target genes depends on the predicted cofactor while occupancy of other promoters is independent of the predicted cofactor. Our method has the additional advantage of identifying regulatory differences between species. By analyzing the S. cerevisiae and S. bayanus genomes, we identified differences in combinatorial cis-regulation between the species and showed that the predicted changes in gene regulation explain several of the species-specific differences seen in gene expression datasets. In some instances, the same CRE combinations appear to regulate genes involved in distinct biological processes in the two different species. The results of this research demonstrate that (1) combinatorial cis-regulation can be inferred by multi-genome analysis and (2) combinatorial cis-regulation can explain differences in gene expression between species.
9
Yuan H, Li Z, Tang NLS, Deng M. A network based covariance test for detecting multivariate eQTL in Saccharomyces cerevisiae. BMC Systems Biology 2016; 10 Suppl 1:8. [PMID: 26818242 PMCID: PMC4895706 DOI: 10.1186/s12918-015-0245-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
Abstract
Background: Expression quantitative trait locus (eQTL) analysis has been widely used to understand how genetic variation affects gene expression in biological systems. Traditionally, eQTLs are investigated in a pairwise manner, in which one SNP affects the expression of one gene. In this way, some associated markers found in GWAS have been related to disease mechanisms by eQTL studies. In practice, however, a biological process is usually carried out by a group of genes. Although some methods have been proposed to identify a group of SNPs that affect the mean expression of genes in a network, changes in the co-expression pattern have not been considered. We therefore propose a procedure and algorithm to identify markers that affect the co-expression pattern of a pathway. Because two genes may have different correlations under different isoforms, which is hard to detect with a linear test, we also consider a nonlinear test. Results: When we applied our method to a yeast eQTL dataset profiled under both glucose and ethanol conditions, we identified a total of 166 modules, each consisting of a group of genes and one eQTL that regulates the co-expression pattern of that group. We found that many of these modules have biological significance. Conclusions: We propose a network-based covariance test to identify SNPs that affect the structure of a pathway. We also consider a nonlinear test, because two genes may have different correlations under different isoforms, which is hard to detect with a linear test.
Affiliation(s)
- Huili Yuan
- LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Zhenye Li
- LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Nelson L S Tang
- Department of Chemical Pathology, Prince of Wales Hospital, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong, China.
- Minghua Deng
- LMAM, School of Mathematical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Center for Quantitative Biology, Peking University, Yiheyuan Road, Beijing, 100871, China.
- Center for Statistical Sciences, Peking University, Yiheyuan Road, Beijing, 100871, China.
10
A forest-based feature screening approach for large-scale genome data with complex structures. BMC Genet 2015; 16:148. [PMID: 26698561 PMCID: PMC4690313 DOI: 10.1186/s12863-015-0294-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/06/2015] [Accepted: 11/13/2015] [Indexed: 01/06/2023]
Abstract
Background: Genome-wide association studies (GWAS) interrogate the whole genome at large scale to characterize the complex genetic architecture of biomedical traits. When the number of SNPs increases dramatically to half a million but the sample size is still limited to thousands, traditional p-value-based statistical approaches suffer from unprecedented limitations. Feature screening has proved to be an effective and powerful approach for handling ultrahigh-dimensional data statistically, yet it has not received much attention in GWAS. Feature screening reduces the feature space from millions to hundreds by removing non-informative noise. However, the univariate measures used to rank features are mainly based on individual effects and do not consider mutual interactions with other features. In this article, we explore the performance of a random forest (RF) based feature screening procedure that emphasizes SNPs with complex effects on a continuous phenotype. Results: Both simulation and real data analysis are conducted to examine the power of forest-based feature screening. We compare it with five other popular feature screening approaches via simulation and conclude that RF can serve as a decent feature screening tool that accommodates complex genetic effects such as nonlinear, interactive, correlative, and joint effects. Unlike the traditional p-value-based Manhattan plot, we use the Permutation Variable Importance Measure (PVIM) to display relative significance and believe that it provides as much useful information as the traditional plot. Conclusion: Most complex traits are found to be regulated by epistatic and polygenic variants. Forest-based feature screening is shown to be an efficient, easily implemented, and accurate approach for coping with whole-genome data with complex structures. Our explorations should add to the growing body of work on feature screening and help it better serve the demands of contemporary genome data.
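The permutation variable importance idea named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: a fixed, hypothetical prediction function (`toy_model`) stands in for a fitted random forest, and importance is measured on the full toy data rather than on out-of-bag samples. All names and the simulated genotypes are invented for the example.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=20, seed=0):
    """Average increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and y
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            error = mean_squared_error(y, [predict(r) for r in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy genotypes coded 0/1/2; the phenotype depends on SNP 0 and on an
# interaction between SNP 1 and SNP 2, while SNP 3 is pure noise.
rng = random.Random(42)
rows = [[rng.choice([0, 1, 2]) for _ in range(4)] for _ in range(500)]


def toy_model(row):
    """Hypothetical stand-in for a fitted random forest's predict function."""
    return 2.0 * row[0] + row[1] * row[2]


y = [toy_model(r) + rng.gauss(0, 0.5) for r in rows]
pvim = permutation_importance(toy_model, rows, y)
```

In this toy setting the interacting SNPs 1 and 2 receive positive importance even though neither has a purely additive effect, while the noise SNP scores zero because the stand-in model never uses it.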
11
Jin C, Kim SK, Willis SD, Cooper KF. The MAPKKKs Ste11 and Bck1 jointly transduce the high oxidative stress signal through the cell wall integrity MAP kinase pathway. Microbial Cell 2015; 2:329-342. [PMID: 27135035 PMCID: PMC4850913 DOI: 10.15698/mic2015.09.226] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
Abstract
Oxidative stress stimulates the Rho1 GTPase, which in turn induces the cell wall integrity (CWI) MAP kinase cascade. CWI activation promotes stress-responsive gene expression through activation of transcription factors (Rlm1, SBF) and nuclear release and subsequent destruction of the repressor cyclin C. This study reports that, in response to high hydrogen peroxide exposure, or in the presence of constitutively active Rho1, cyclin C still translocates to the cytoplasm and is degraded in cells lacking Bck1, the MAPKKK of the CWI pathway. However, in mutants defective for both Bck1 and Ste11, the MAPKKK from the high osmolarity, pseudohyphal and mating MAPK pathways, cyclin C nuclear to cytoplasmic relocalization and destruction is prevented. Further analysis revealed that cyclin C goes from a diffuse nuclear signal to a terminal nucleolar localization in this double mutant. Live cell imaging confirmed that cyclin C transiently passes through the nucleolus prior to cytoplasmic entry in wild-type cells. Taken together with previous studies, these results indicate that under low levels of oxidative stress, Bck1 activation is sufficient to induce cyclin C translocation and degradation. However, higher stress conditions also stimulate Ste11, which reinforces the stress signal to cyclin C and other transcription factors. This model would provide a mechanism by which different stress levels can be sensed and interpreted by the cell.
Affiliation(s)
- Chunyan Jin
- Department of Molecular Biology, Rowan University School of Osteopathic Medicine, Stratford, NJ, 08055 USA
- Stephen K Kim
- Department of Molecular Biology, Rowan University School of Osteopathic Medicine, Stratford, NJ, 08055 USA
- Stephen D Willis
- Department of Molecular Biology, Rowan University School of Osteopathic Medicine, Stratford, NJ, 08055 USA
- Katrina F Cooper
- Department of Molecular Biology, Rowan University School of Osteopathic Medicine, Stratford, NJ, 08055 USA
12
Liang M, Zhang F, Jin G, Zhu J. FastGCN: a GPU accelerated tool for fast gene co-expression networks. PLoS One 2015; 10:e0116776. [PMID: 25602758 PMCID: PMC4300192 DOI: 10.1371/journal.pone.0116776] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 08/26/2014] [Accepted: 12/08/2014] [Indexed: 01/31/2023]
Abstract
Gene co-expression networks are one valuable type of biological network. Many methods and tools have been published to construct gene co-expression networks; however, most of these tools and methods are inconvenient and time consuming for large datasets. We have developed a user-friendly, accelerated and optimized tool for constructing gene co-expression networks that can fully harness the parallel nature of GPU (Graphics Processing Unit) architectures. Genetic entropies were exploited to filter out genes with no or small expression changes in the raw data preprocessing step. Pearson correlation coefficients were then calculated. After that, we normalized these coefficients and employed the False Discovery Rate to control for multiple testing. Finally, module identification was conducted to construct the co-expression networks. All of these calculations were implemented on a GPU. We also compressed the coefficient matrix to save space. We compared the performance of the GPU implementation with those of a multi-core CPU implementation with 16 CPU threads, a single-thread C/C++ implementation and a single-thread R implementation. Our results show that the GPU implementation largely outperforms the single-thread C/C++ and single-thread R implementations, and that it outperforms the multi-core CPU implementation when the number of genes increases. With the test dataset containing 16,000 genes and 590 individuals, we can achieve greater than 63 times the speed using a GPU implementation compared with a single-thread R implementation when 50 percent of genes were filtered out and about 80 times the speed when no genes were filtered out.
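The middle of the pipeline described in this abstract (correlation, normalization, multiple-testing control) can be sketched on a CPU as follows. Several choices here are assumptions rather than details taken from the paper: the Fisher z-transform is used as the normalization step, the Benjamini-Hochberg procedure is used for False Discovery Rate control, and the entropy filter, module identification and GPU kernels are omitted. The function names and toy expression values are hypothetical.

```python
import math
from itertools import combinations
from statistics import NormalDist


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fisher_p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform (assumed step)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking the tests rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank  # largest rank passing the step-up threshold
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def coexpression_edges(expr, alpha=0.05):
    """expr maps gene name -> expression values across individuals."""
    pairs = list(combinations(sorted(expr), 2))
    n = len(next(iter(expr.values())))
    rs = [pearson(expr[g], expr[h]) for g, h in pairs]
    keep = benjamini_hochberg([fisher_p_value(r, n) for r in rs], alpha)
    return [(g, h, r) for (g, h), r, k in zip(pairs, rs, keep) if k]


expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "geneB": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1, 7.3, 7.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0, 1.5, 4.5],
}
edges = coexpression_edges(expr)
```

The pairwise loop is quadratic in the number of genes, which is the part a GPU implementation would parallelize; the sketch only shows the order of the statistical steps.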
Affiliation(s)
- Meimei Liang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, PR China, 310058
- Futao Zhang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, PR China, 310058
- Gulei Jin
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, PR China, 310058
- Jun Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, PR China, 310058
13
Ramayo-Caldas Y, Ballester M, Fortes MRS, Esteve-Codina A, Castelló A, Noguera JL, Fernández AI, Pérez-Enciso M, Reverter A, Folch JM. From SNP co-association to RNA co-expression: novel insights into gene networks for intramuscular fatty acid composition in porcine. BMC Genomics 2014; 15:232. [PMID: 24666776 PMCID: PMC3987146 DOI: 10.1186/1471-2164-15-232] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 07/19/2013] [Accepted: 03/21/2014] [Indexed: 12/19/2022]
Abstract
Background: Fatty acids (FA) play a critical role in energy homeostasis and metabolic diseases; in the context of livestock species, their profile also impacts on meat quality for healthy human consumption. Molecular pathways controlling lipid metabolism are highly interconnected and are not fully understood. Elucidating these molecular processes will aid technological development towards improvement of pork meat quality and increased knowledge of FA metabolism, underpinning metabolic diseases in humans. Results: The results from genome-wide association studies (GWAS) across 15 phenotypes were subjected to an Association Weight Matrix (AWM) approach to predict a network of 1,096 genes related to intramuscular FA composition in pigs. To identify the key regulators of FA metabolism, we focused on the minimal set of transcription factors (TF) that explored the majority of the network topology. Pathway and network analyses pointed towards a trio of TF as key regulators of FA metabolism: NCOA2, FHL2 and EP300. Promoter sequence analyses confirmed that these TF have binding sites for some well-known regulators of lipid and carbohydrate metabolism. For the first time in a non-model species, some of the co-associations observed at the genetic level were validated through co-expression at the transcriptomic level based on real-time PCR of 40 genes in adipose tissue, and a further 55 genes in liver. In particular, liver expression of NCOA2 and EP300 differed between pig breeds (Iberian and Landrace) extreme in terms of fat deposition. Highly clustered co-expression networks in both liver and adipose tissues were observed. EP300 and NCOA2 showed centrality parameters above average in both networks. Across all genes, co-expression analyses confirmed 28.9% of the AWM predicted gene-gene interactions in liver and 33.0% in adipose tissue. The magnitude of this validation varied across genes, with up to 60.8% of the connections of NCOA2 in adipose tissue being validated via co-expression. Conclusions: Our results recapitulate the known transcriptional regulation of FA metabolism, predict gene interactions that can be experimentally validated, and suggest that genetic variants mapped to EP300, FHL2, and NCOA2 modulate lipid metabolism and control energy homeostasis in pigs.
Affiliation(s)
- Yuliaxis Ramayo-Caldas
- Centre de Recerca en Agrigenòmica (CRAG), Consorci CSIC-IRTA-UAB-UB, Campus UAB, Bellaterra 08193, Spain.
14
Zhang S, Wang XJ. Promote Connections of Young Computational Biologists in China. Genomics, Proteomics & Bioinformatics 2013; 11:253-6. [PMID: 23835348 PMCID: PMC4357815 DOI: 10.1016/j.gpb.2013.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2013] [Revised: 06/23/2013] [Accepted: 07/01/2013] [Indexed: 02/01/2023]
Affiliation(s)
- Shihua Zhang
- National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Corresponding authors.
- Xiu-Jie Wang
- Center for Molecular Systems Biology, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- Corresponding authors.