1
Azevedo LM, de Oliveira RR, Dos Reis GL, de Campos Rume G, Alvarenga JP, Gutiérrez RM, de Carvalho Costa J, Chalfun-Junior A. Hormonal crosstalk during the reproductive stage of Coffea arabica: interactions among gibberellin, abscisic acid, and ethylene. Planta 2025; 261:110. [PMID: 40223003 DOI: 10.1007/s00425-025-04679-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2025] [Accepted: 03/27/2025] [Indexed: 04/15/2025]
Abstract
Main conclusion: The application of gibberellin and abscisic acid in coffee plants increased floral bud formation and fruit production by regulating key genes involved in flowering and hormonal biosynthesis pathways. Despite ongoing efforts, understanding of hormonal regulation in perennial and woody species with complex phenological cycles, such as Coffea arabica L., remains limited. Given the global importance of coffee, identifying the main regulators of reproductive development is crucial to guarantee production, especially in the face of climate change. This study investigated the effects of gibberellin (GA) and abscisic acid (ABA) at different concentrations (5, 25 and 100 ppm) on the reproductive development of C. arabica. Phenological analyses, molecular identification of genes involved in GA and ABA biosynthesis, degradation, and signaling, as well as gene expression profiling in leaves and floral buds during floral induction and development, were conducted. Promoter analysis of CaFT and quantification of 1-aminocyclopropane-1-carboxylate (ACC), ACC oxidase (ACO) enzymatic activity, and ethylene content were also performed. Results showed that GA, irrespective of concentration, and ABA at 25 ppm, applied during the main period of floral induction (March), significantly increased the number of floral buds, with ABA also accelerating their development. Similarly, applying these regulators to plants with floral buds at more advanced stages (August) increased the number of floral buds and fruit production in the GA (5 and 100 ppm) and ABA (25 and 100 ppm) treatments. Phylogenetic and molecular analyses identified genes related to GA and ABA biosynthesis, degradation, and signaling in coffee plants. GA and ABA treatments affected the expression of genes related to floral induction and organ formation, such as CaDELLA in March, which may relate to the increased number of floral buds.
Moreover, in August, plants treated with 5 and 100 ppm GA and 100 ppm ABA showed up-regulation of CaFT1 expression, likely due to the down-regulation of CaCO during this period. In addition to GA-ABA interactions, our results suggest that GA promotes ACC accumulation in leaves in August, which may act as a mobile signal transported to floral buds, where its conversion to ethylene could regulate anthesis, highlighting a GA-ACC-ethylene interaction in coffee flowering. However, no significant differences in ethylene biosynthesis were observed in March with the application of these hormones, underscoring the incipient role of ethylene during floral induction in coffee. These results suggest reciprocal regulation of floral development by GA-ABA pathways in a dose-dependent manner and interacting with other hormonal pathways such as the ethylene biosynthesis in leaves and floral buds. These findings provide new insights into the hormonal regulation of coffee flowering, guiding field practices and breeding programs to maximize coffee production.
Affiliation(s)
- Lillian Magalhães Azevedo
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil
- Raphael Ricon de Oliveira
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil
- Department of Biological Sciences, State University of Santa Cruz (UESC), Ilhéus, Bahia, Brazil
- Gabriel Lasmar Dos Reis
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil
- Gabriel de Campos Rume
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil
- Joyce Pereira Alvarenga
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil
- Robert Márquez Gutiérrez
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil
- Júlia de Carvalho Costa
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil
- Antonio Chalfun-Junior
- Laboratory of Plant Molecular Physiology, Plant Physiology Sector, Institute of Biology, Federal University of Lavras (UFLA), Lavras, Minas Gerais, Brazil.
2
Hecker D, Lauber M, Behjati Ardakani F, Ashrafiyan S, Manz Q, Kersting J, Hoffmann M, Schulz MH, List M. Computational tools for inferring transcription factor activity. Proteomics 2023; 23:e2200462. [PMID: 37706624 DOI: 10.1002/pmic.202200462] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/17/2023] [Revised: 08/11/2023] [Accepted: 08/22/2023] [Indexed: 09/15/2023]
Abstract
Transcription factors (TFs) are essential players in orchestrating the regulatory landscape in cells. Still, their exact modes of action and dependencies on other regulatory aspects remain elusive. Since TFs act in a cell type-specific manner and each TF has its own characteristics, untangling their regulatory interactions experimentally is laborious and convoluted. Thus, computational tools are continually being developed to estimate transcription factor activity (TFA) from a variety of data modalities, either based on a mapping of TFs to their putative target genes or in a genome-wide, gene-unspecific fashion. These tools can help to gain insights into TF regulation and to prioritize candidates for experimental validation. We give an overview of available computational tools that estimate TFA, illustrate examples of their application, review common result validation strategies, and discuss assumptions and concomitant limitations.
Affiliation(s)
- Dennis Hecker
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Michael Lauber
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Fatemeh Behjati Ardakani
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Shamim Ashrafiyan
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Quirin Manz
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Johannes Kersting
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- GeneSurge GmbH, München, Germany
- Markus Hoffmann
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Institute for Advanced Study, Technical University of Munich, Garching, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, USA
- Marcel H Schulz
- Goethe University Frankfurt, Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner site Rhein-Main, Frankfurt am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, Frankfurt am Main, Germany
- Markus List
- Big Data in BioMedicine Group, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
3
Liu C, Wang Z, Wang J, Liu C, Wang M, Ngo V, Wang W. Predicting regional somatic mutation rates using DNA motifs. PLoS Comput Biol 2023; 19:e1011536. [PMID: 37782656 PMCID: PMC10569533 DOI: 10.1371/journal.pcbi.1011536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 10/12/2023] [Accepted: 09/20/2023] [Indexed: 10/04/2023] [Open Access]
Abstract
How the locus-specificity of epigenetic modifications is regulated remains an unanswered question. A contributing mechanism is that epigenetic enzymes are recruited to specific loci by DNA binding factors recognizing particular sequence motifs (referred to as epi-motifs). Using these motifs to predict biological outputs depending on local epigenetic state such as somatic mutation rates would confirm their functionality. Here, we used DNA motifs including known TF motifs and epi-motifs as a surrogate of epigenetic signals to predict somatic mutation rates in 13 cancers at an average 23kbp resolution. We implemented an interpretable neural network model, called contextual regression, to successfully learn the universal relationship between mutations and DNA motifs, and uncovered motifs that are most impactful on the regional mutation rates such as TP53 and epi-motifs associated with H3K9me3. Furthermore, we identified genomic regions with significantly higher mutation rates than the expected values in each individual tumor and demonstrated that such cancer-related regions can accurately predict cancer types. Interestingly, we found that the same mutation signatures often have different contributions to cancer-related and cancer-independent regions, and we also identified the motifs with the most contribution to each mutation signature.
Affiliation(s)
- Cong Liu
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
- Zengmiao Wang
- State Key Laboratory of Remote Sensing Science, Center for Global Change and Public Health, Faculty of Geographical Science, Beijing Normal University, Beijing, China
- Jun Wang
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
- Chengyu Liu
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
- Mengchi Wang
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, California, United States of America
- Vu Ngo
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, California, United States of America
- Wei Wang
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, United States of America
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, California, United States of America
- Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, California, United States of America
4
Li X, Lappalainen T, Bussemaker HJ. Identifying genetic regulatory variants that affect transcription factor activity. Cell Genomics 2023; 3:100382. [PMID: 37719147 PMCID: PMC10504674 DOI: 10.1016/j.xgen.2023.100382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 05/19/2023] [Accepted: 07/21/2023] [Indexed: 09/19/2023]
Abstract
Genetic variants affecting gene expression levels in humans have been mapped in the Genotype-Tissue Expression (GTEx) project. Trans-acting variants impacting many genes simultaneously through a shared transcription factor (TF) are of particular interest. Here, we developed a generalized linear model (GLM) to estimate protein-level TF activity levels in an individual-specific manner from GTEx RNA sequencing (RNA-seq) profiles. It uses observed differential gene expression after TF perturbation as a predictor and, by analyzing differential expression within pairs of neighboring genes, controls for the confounding effect of variation in chromatin state along the genome. We inferred genotype-specific activities for 55 TFs across 49 tissues. Subsequently performing genome-wide association analysis on this virtual trait revealed TF activity quantitative trait loci (aQTLs) that, as a set, are enriched for functional features. Altogether, the set of tools we introduce here highlights the potential of genetic association studies for cellular endophenotypes based on a network-based multi-omics approach. The transparent peer review record is available.
Affiliation(s)
- Xiaoting Li
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Tuuli Lappalainen
- New York Genome Center, New York, NY 10013, USA
- Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Harmen J. Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
5
Wang Z, Zhu S, Jia Y, Wang Y, Kubota N, Fujiwara N, Gordillo R, Lewis C, Zhu M, Sharma T, Li L, Zeng Q, Lin YH, Hsieh MH, Gopal P, Wang T, Hoare M, Campbell P, Hoshida Y, Zhu H. Positive selection of somatically mutated clones identifies adaptive pathways in metabolic liver disease. Cell 2023; 186:1968-1984.e20. [PMID: 37040760 PMCID: PMC10321862 DOI: 10.1016/j.cell.2023.03.014] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 07/21/2022] [Revised: 12/08/2022] [Accepted: 03/14/2023] [Indexed: 04/13/2023]
Abstract
Somatic mutations in nonmalignant tissues accumulate with age and injury, but whether these mutations are adaptive on the cellular or organismal levels is unclear. To interrogate genes in human metabolic disease, we performed lineage tracing in mice harboring somatic mosaicism subjected to nonalcoholic steatohepatitis (NASH). Proof-of-concept studies with mosaic loss of Mboat7, a membrane lipid acyltransferase, showed that increased steatosis accelerated clonal disappearance. Next, we induced pooled mosaicism in 63 known NASH genes, allowing us to trace mutant clones side by side. This in vivo tracing platform, which we coined MOSAICS, selected for mutations that ameliorate lipotoxicity, including mutant genes identified in human NASH. To prioritize new genes, additional screening of 472 candidates identified 23 somatic perturbations that promoted clonal expansion. In validation studies, liver-wide deletion of Tbx3, Bcl6, or Smyd2 resulted in protection against hepatic steatosis. Selection for clonal fitness in mouse and human livers identifies pathways that regulate metabolic disease.
Affiliation(s)
- Zixi Wang
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Shijia Zhu
- Liver Tumor Translational Research Program, Simmons Comprehensive Cancer Center, Division of Digestive and Liver Diseases, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Yuemeng Jia
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Yunguan Wang
- Quantitative Biomedical Research Center, Department of Population and Data Sciences, Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Naoto Kubota
- Liver Tumor Translational Research Program, Simmons Comprehensive Cancer Center, Division of Digestive and Liver Diseases, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Naoto Fujiwara
- Liver Tumor Translational Research Program, Simmons Comprehensive Cancer Center, Division of Digestive and Liver Diseases, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Ruth Gordillo
- Touchstone Diabetes Center, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Cheryl Lewis
- Tissue Management Shared Resource, Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Min Zhu
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Tripti Sharma
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Lin Li
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Qiyu Zeng
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Yu-Hsuan Lin
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Meng-Hsiung Hsieh
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Purva Gopal
- Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Tao Wang
- Quantitative Biomedical Research Center, Department of Population and Data Sciences, Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Matt Hoare
- University of Cambridge Department of Medicine, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK; University of Cambridge Early Cancer Institute, Hutchison Research Centre, Cambridge Biomedical Campus, Cambridge CB2 0XZ, UK
- Peter Campbell
- Cancer Genome Project, Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Yujin Hoshida
- Liver Tumor Translational Research Program, Simmons Comprehensive Cancer Center, Division of Digestive and Liver Diseases, Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Hao Zhu
- Children's Research Institute, Departments of Pediatrics and Internal Medicine, Center for Regenerative Science and Medicine, Simmons Comprehensive Cancer Center, Children's Research Institute Mouse Genome Engineering Core, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
6
Wang Z, Zhu S, Jia Y, Wang Y, Kubota N, Fujiwara N, Gordillo R, Lewis C, Zhu M, Sharma T, Li L, Zeng Q, Lin YH, Hsieh MH, Gopal P, Wang T, Hoare M, Campbell P, Hoshida Y, Zhu H. Positive selection of somatically mutated clones identifies adaptive pathways in metabolic liver disease. bioRxiv: the preprint server for biology 2023:2023.03.20.533505. [PMID: 36993727 PMCID: PMC10055219 DOI: 10.1101/2023.03.20.533505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
Abstract
Somatic mutations in non-malignant tissues accumulate with age and insult, but whether these mutations are adaptive on the cellular or organismal levels is unclear. To interrogate mutations found in human metabolic disease, we performed lineage tracing in mice harboring somatic mosaicism subjected to non-alcoholic steatohepatitis (NASH). Proof-of-concept studies with mosaic loss of Mboat7, a membrane lipid acyltransferase, showed that increased steatosis accelerated clonal disappearance. Next, we induced pooled mosaicism in 63 known NASH genes, allowing us to trace mutant clones side-by-side. This in vivo tracing platform, which we coined MOSAICS, selected for mutations that ameliorate lipotoxicity, including mutant genes identified in human NASH. To prioritize new genes, additional screening of 472 candidates identified 23 somatic perturbations that promoted clonal expansion. In validation studies, liver-wide deletion of Bcl6, Tbx3, or Smyd2 resulted in protection against NASH. Selection for clonal fitness in mouse and human livers identifies pathways that regulate metabolic disease.
Highlights
- Mosaic Mboat7 mutations that increase lipotoxicity lead to clonal disappearance in NASH.
- In vivo screening can identify genes that alter hepatocyte fitness in NASH.
- Mosaic Gpam mutations are positively selected due to reduced lipogenesis.
- In vivo screening of transcription factors and epifactors identified new therapeutic targets in NASH.
7
Ahmad N, Muhammad J, Khan K, Ali W, Fazal H, Ali M, Rahman LU, Khan H, Uddin MN, Abbasi BH, Hano C. Silver and gold nanoparticles induced differential antimicrobial potential in calli cultures of Prunella vulgaris. BMC Chem 2022; 16:20. [PMID: 35337384 PMCID: PMC8957128 DOI: 10.1186/s13065-022-00816-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/26/2021] [Accepted: 03/14/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
Background: Prunella vulgaris (self-heal), a member of the family Lamiaceae, is a medicinally important plant containing high-value chemical metabolites such as prunellin. In this research, calli cultures were exposed to different ratios of gold (Au) and silver (Ag) nanoparticles (1:1, 1:2, 1:3, 2:1 and 3:1) along with naphthalene acetic acid (2.0 mg NAA) to investigate their antimicrobial potential. A well diffusion method was used to assess antimicrobial properties.
Results: Two concentrations (1 and 2 mg/6 µl) of all treated calli cultures and wild plants were tested against Escherichia coli, Pseudomonas aeruginosa, Salmonella typhi, Bacillus atrophaeus, Bacillus subtilis, Agrobacterium tumefaciens, Erwinia carotovora and Candida albicans. Dimethyl sulfoxide (DMSO) and antibiotics were used as negative and positive controls, respectively. The calli exposed to gold (Au) nanoparticles (NPs) and 2.0 mg naphthalene acetic acid (NAA) displayed the highest activity (25.7 mm) against Salmonella typhi, which was considered the most susceptible species, while Agrobacterium tumefaciens and Candida albicans were the most resistant species. Cytoplasmic leakage was also investigated as a possible mechanism of action.
Conclusion: These data indicate that Prunella vulgaris is a medicinally important plant for the development of antimicrobial drugs using nanotechnology, with applications in pharmaceutical research.
Affiliation(s)
- Nisar Ahmad
- Centre for Biotechnology and Microbiology, University of Swat, Swat, 19200, Pakistan.
- Jan Muhammad
- Centre for Biotechnology and Microbiology, University of Swat, Swat, 19200, Pakistan
- Khalil Khan
- Centre for Biotechnology and Microbiology, University of Swat, Swat, 19200, Pakistan
- Wajid Ali
- Centre for Biotechnology and Microbiology, University of Swat, Swat, 19200, Pakistan
- Hina Fazal
- Pakistan Council of Scientific and Industrial Research (PCSIR) Laboratories Complex, Peshawar, 25120, Pakistan
- Mohammad Ali
- Centre for Biotechnology and Microbiology, University of Swat, Swat, 19200, Pakistan
- Latif-Ur Rahman
- Institute of Chemical Sciences, University of Peshawar, Peshawar, 25120, Pakistan
- Hayat Khan
- Centre for Biotechnology and Microbiology, University of Swat, Swat, 19200, Pakistan
- Muhammad Nazir Uddin
- Centre for Biotechnology and Microbiology, University of Swat, Swat, 19200, Pakistan
- Bilal Haider Abbasi
- Department of Biotechnology, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, 45320, Pakistan.
- Christophe Hano
- Université d'Orléans, Laboratoire de Biologie des Ligneux et des Grandes Cultures (LBLGC), INRA USC1328, 28000, Chartres, France
8
Benner P, Vingron M. Quantifying the tissue-specific regulatory information within enhancer DNA sequences. NAR Genom Bioinform 2021; 3:lqab095. [PMID: 34729474 PMCID: PMC8557370 DOI: 10.1093/nargab/lqab095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 09/23/2021] [Accepted: 09/28/2021] [Indexed: 12/04/2022] [Open Access]
Abstract
Recent efforts to measure epigenetic marks across a wide variety of different cell types and tissues provide insights into the cell type-specific regulatory landscape. We use these data to study whether there exists a correlate of epigenetic signals in the DNA sequence of enhancers and explore with computational methods to what degree such sequence patterns can be used to predict cell type-specific regulatory activity. By constructing classifiers that predict in which tissues enhancers are active, we are able to identify sequence features that might be recognized by the cell in order to regulate gene expression. While classification performances vary greatly between tissues, we show examples where our classifiers correctly predict tissue-specific regulation from sequence alone. We also show that many of the informative patterns indeed harbor transcription factor footprints.
Affiliation(s)
- Philipp Benner
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 73, 14195 Berlin, Germany
9
Ma CZ, Brent MR. Inferring TF activities and activity regulators from gene expression data with constraints from TF perturbation data. Bioinformatics 2021; 37:1234-1245. [PMID: 33135076 PMCID: PMC8189679 DOI: 10.1093/bioinformatics/btaa947] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/03/2020] [Revised: 09/26/2020] [Accepted: 10/27/2020] [Indexed: 12/20/2022] [Open Access]
Abstract
Motivation: The activity of a transcription factor (TF) in a sample of cells is the extent to which it is exerting its regulatory potential. Many methods of inferring TF activity from gene expression data have been described, but due to the lack of appropriate large-scale datasets, systematic and objective validation has not been possible until now.
Results: We systematically evaluate and optimize the approach to TF activity inference in which a gene expression matrix is factored into a condition-independent matrix of control strengths and a condition-dependent matrix of TF activity levels. We find that expression data in which the activities of individual TFs have been perturbed are both necessary and sufficient for obtaining good performance. To a considerable extent, control strengths inferred using expression data from one growth condition carry over to other conditions, so the control strength matrices derived here can be used by others. Finally, we apply these methods to gain insight into the upstream factors that regulate the activities of yeast TFs Gcr2, Gln3, Gcn4 and Msn2.
Availability and implementation: Evaluation code and data are available at https://doi.org/10.5281/zenodo.4050573.
Supplementary information: Supplementary data are available at Bioinformatics online.
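The factorization described in this abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the control-strength matrix C (genes x TFs) is already known, restricts the model to two TFs, and recovers the activity matrix A (TFs x conditions) by ordinary least squares on E ≈ C·A. All names and numbers below (`infer_activities`, `C`, `A_true`, `E`) are hypothetical and chosen only for illustration.

```python
# Toy sketch (assumption: C is known and there are exactly two TFs).
# Model: E (genes x conditions) ≈ C (genes x TFs) @ A (TFs x conditions).

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve_2x2(m, v):
    """Solve the 2x2 linear system m @ x = v with Cramer's rule."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("control-strength matrix is rank deficient")
    return [(v[0] * d - b * v[1]) / det, (a * v[1] - c * v[0]) / det]

def infer_activities(control, expression):
    """Least-squares estimate of A via the normal equations (C^T C) A = C^T E."""
    ct = transpose(control)        # TFs x genes
    gram = matmul(ct, control)     # 2 x 2 matrix C^T C
    rhs = matmul(ct, expression)   # 2 x conditions matrix C^T E
    columns = [solve_2x2(gram, [rhs[0][j], rhs[1][j]])
               for j in range(len(rhs[0]))]
    return transpose(columns)      # TFs x conditions

# Hypothetical data: 4 genes, 2 TFs, 3 conditions.
C = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [-1.0, 1.0]]
A_true = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]]
E = matmul(C, A_true)              # noise-free expression matrix
A_est = infer_activities(C, E)     # should match A_true up to rounding
```

With noise-free data the estimate reproduces the activities used to generate E. In the setting the abstract describes, both matrices are inferred from data, with TF perturbation experiments supplying the constraints.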
Affiliation(s)
- Cynthia Z Ma
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Michael R Brent
- Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Department of Computer Science and Engineering, Washington University, St. Louis, MO 63130, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
10
Zhang Q, Chen F, Wu S, Liang H. A simple yet powerful test for assessing goodness-of-fit of high-dimensional linear models. Stat Med 2021; 40:3153-3166. [PMID: 33792070 DOI: 10.1002/sim.8968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2020] [Revised: 01/16/2021] [Accepted: 03/13/2021] [Indexed: 11/06/2022]
Abstract
We evaluate the validity of a projection-based test checking linear models when the number of covariates tends to infinity, and analyze two gene expression datasets. We show that the test is still consistent and derive the asymptotic distributions under the null and alternative hypotheses. The asymptotic properties are almost the same as those when the number of covariates is fixed as long as p/n → 0 with additional mild assumptions. The test dramatically gains dimension reduction, and its numerical performance is remarkable.
Affiliation(s)
- Qi Zhang
- School of Mathematics and Statistics, Qingdao University, Shandong, China
| | - Feifei Chen
- Center for Statistics and Data Science, Beijing Normal University, Zhuhai, China
| | - Shunyao Wu
- College of Computer Science and Technology, Qingdao University, Shandong, China
| | - Hua Liang
- Department of Statistics, George Washington University, Washington, District of Columbia, USA
| |
|
11
|
Identification of Cis-Regulatory Sequences Controlling Pollen-Specific Expression of Hydroxyproline-Rich Glycoprotein Genes in Arabidopsis thaliana. PLANTS 2020; 9:plants9121751. [PMID: 33322028 PMCID: PMC7763877 DOI: 10.3390/plants9121751] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2020] [Revised: 11/26/2020] [Accepted: 12/07/2020] [Indexed: 02/06/2023]
Abstract
Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized protein sequence signatures for three family members in the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). However, the mechanism of pollen-specific HRGP gene expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences to identify cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined the public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery identified 15 conserved promoter elements between A. thaliana and A. lyrata. Motif scanning revealed two pollen-related transcription factors: GATA12 and brassinosteroid (BR) signaling pathway regulator BZR1. Finally, we performed a regression analysis and demonstrated that the 15 motifs provided a good model of HRGP gene expression in pollen (R = 0.61). In conclusion, we performed the first integrative analysis of cis-regulatory motifs in pollen-specific HRGP genes, revealing important insights into transcriptional regulation in pollen tissue.
|
12
|
Wylie DC, Hofmann HA, Zemelman BV. SArKS: de novo discovery of gene expression regulatory motif sites and domains by suffix array kernel smoothing. Bioinformatics 2020; 35:3944-3952. [PMID: 30903136 DOI: 10.1093/bioinformatics/btz198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2018] [Revised: 03/04/2019] [Accepted: 03/20/2019] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: We set out to develop an algorithm that can mine differential gene expression data to identify candidate cell type-specific DNA regulatory sequences. Differential expression is usually quantified as a continuous score (fold-change, test statistic, P-value) comparing biological classes. Unlike existing approaches, our de novo strategy, termed SArKS, applies non-parametric kernel smoothing to uncover promoter motif sites that correlate with elevated differential expression scores. SArKS detects motif k-mers by smoothing sequence scores over sequence similarity. A second round of smoothing over spatial proximity reveals multi-motif domains (MMDs). Discovered motif sites can then be merged or extended based on adjacency within MMDs. False positive rates are estimated and controlled by permutation testing. RESULTS: We applied SArKS to published gene expression data representing distinct neocortical neuron classes in Mus musculus and interneuron developmental states in Homo sapiens. When benchmarked against several existing algorithms using a cross-validation procedure, SArKS identified larger motif sets that formed the basis for regression models with higher correlative power. AVAILABILITY AND IMPLEMENTATION: https://github.com/denniscwylie/sarks. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dennis C Wylie
- Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, TX, USA
| | - Hans A Hofmann
- Center for Computational Biology and Bioinformatics, University of Texas at Austin, Austin, TX, USA.,Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA.,Institute for Neuroscience, University of Texas at Austin, Austin, TX, USA
| | - Boris V Zemelman
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, TX, USA.,Institute for Neuroscience, University of Texas at Austin, Austin, TX, USA.,Department of Neuroscience, University of Texas at Austin, Austin, TX, USA.,Center for Learning and Memory, University of Texas at Austin, Austin, TX, USA
| |
|
13
|
Grote A, Li Y, Liu C, Voronin D, Geber A, Lustigman S, Unnasch TR, Welch L, Ghedin E. Prediction pipeline for discovery of regulatory motifs associated with Brugia malayi molting. PLoS Negl Trop Dis 2020; 14:e0008275. [PMID: 32574217 PMCID: PMC7337397 DOI: 10.1371/journal.pntd.0008275] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2019] [Revised: 07/06/2020] [Accepted: 04/07/2020] [Indexed: 11/19/2022] Open
Abstract
Filarial nematodes can cause debilitating diseases in humans. They have complicated life cycles involving an insect vector and mammalian hosts, and they go through a number of developmental molts. While whole genome sequences of parasitic worms are now available, very little is known about transcription factor (TF) binding sites and their cognate transcription factors that play a role in regulating development. To address this gap, we developed a novel motif prediction pipeline, Emotif Alpha, that integrates ten different motif discovery algorithms, multiple statistical tests, and a comparative analysis of conserved elements between the filarial worms Brugia malayi and Onchocerca volvulus, and the free-living nematode Caenorhabditis elegans. We identified stage-specific TF binding motifs in B. malayi, with a particular focus on those potentially involved in the L3-L4 molt, a stage important for the establishment of infection in the mammalian host. Using an in vitro molting system, we tested and validated three of these motifs demonstrating the accuracy of the motif prediction pipeline. Diseases caused by parasitic worms such as the filariae are among the leading causes of morbidity in the developing world. Very little is known about how development is regulated in these vector-transmitted parasites. We have developed a computational method to identify motifs that correspond to transcription factor binding sites in the genome of the parasitic worm, Brugia malayi, one of the causative agents of lymphatic filariasis. Using this approach, we were able to predict stage-specific transcription factor binding sites involved in a stage of the molting process important for the establishment of the infection. We validated the role of these motifs using an in vitro molting system.
Affiliation(s)
- Alexandra Grote
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
| | - Yichao Li
- School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
| | - Canhui Liu
- Center for Global Infectious Disease Research, University of South Florida, Tampa, FL, Florida, United States of America
| | - Denis Voronin
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
| | - Adam Geber
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
| | - Sara Lustigman
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
| | - Thomas R. Unnasch
- Center for Global Infectious Disease Research, University of South Florida, Tampa, FL, Florida, United States of America
| | - Lonnie Welch
- School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
| | - Elodie Ghedin
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Department of Epidemiology, School of Global Public Health, New York University, New York, New York, United States of America
| |
|
14
|
Foster JM, Grote A, Mattick J, Tracey A, Tsai YC, Chung M, Cotton JA, Clark TA, Geber A, Holroyd N, Korlach J, Li Y, Libro S, Lustigman S, Michalski ML, Paulini M, Rogers MB, Teigen L, Twaddle A, Welch L, Berriman M, Dunning Hotopp JC, Ghedin E. Sex chromosome evolution in parasitic nematodes of humans. Nat Commun 2020; 11:1964. [PMID: 32327641 PMCID: PMC7181701 DOI: 10.1038/s41467-020-15654-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2019] [Accepted: 03/20/2020] [Indexed: 11/09/2022] Open
Abstract
Sex determination mechanisms often differ even between related species yet the evolution of sex chromosomes remains poorly understood in all but a few model organisms. Some nematodes such as Caenorhabditis elegans have an XO sex determination system while others, such as the filarial parasite Brugia malayi, have an XY mechanism. We present a complete B. malayi genome assembly and define Nigon elements shared with C. elegans, which we then map to the genomes of other filarial species and more distantly related nematodes. We find a remarkable plasticity in sex chromosome evolution with several distinct cases of neo-X and neo-Y formation, X-added regions, and conversion of autosomes to sex chromosomes from which we propose a model of chromosome evolution across different nematode clades. The phylum Nematoda offers a new and innovative system for gaining a deeper understanding of sex chromosome evolution.
Affiliation(s)
- Jeremy M Foster
- Division of Protein Expression & Modification, New England Biolabs, Ipswich, MA, 01938, USA
| | - Alexandra Grote
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - John Mattick
- Institute for Genome Science, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Alan Tracey
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | | | - Matthew Chung
- Institute for Genome Science, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - James A Cotton
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | | | - Adam Geber
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Nancy Holroyd
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | | | - Yichao Li
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, 45701, USA
| | - Silvia Libro
- Division of Protein Expression & Modification, New England Biolabs, Ipswich, MA, 01938, USA
| | - Sara Lustigman
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, NY, 10065, USA
| | - Michelle L Michalski
- Department of Biology and Microbiology, University of Wisconsin Oshkosh, Oshkosh, WI, 54901, USA
| | - Michael Paulini
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Matthew B Rogers
- Department of Surgery, UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA, 15224, USA
| | - Laura Teigen
- Department of Biology and Microbiology, University of Wisconsin Oshkosh, Oshkosh, WI, 54901, USA
| | - Alan Twaddle
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
| | - Lonnie Welch
- School of Electrical Engineering and Computer Science, Ohio University, Athens, OH, 45701, USA
| | - Matthew Berriman
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
| | - Julie C Dunning Hotopp
- Institute for Genome Science, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
| | - Elodie Ghedin
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
- Department of Epidemiology, School of Global Public Health, New York University, New York, NY, 10003, USA.
| |
|
15
|
Silverman EK, Schmidt HHHW, Anastasiadou E, Altucci L, Angelini M, Badimon L, Balligand JL, Benincasa G, Capasso G, Conte F, Di Costanzo A, Farina L, Fiscon G, Gatto L, Gentili M, Loscalzo J, Marchese C, Napoli C, Paci P, Petti M, Quackenbush J, Tieri P, Viggiano D, Vilahur G, Glass K, Baumbach J. Molecular networks in Network Medicine: Development and applications. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2020; 12:e1489. [PMID: 32307915 DOI: 10.1002/wsbm.1489] [Citation(s) in RCA: 128] [Impact Index Per Article: 25.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/23/2019] [Revised: 02/29/2020] [Accepted: 03/20/2020] [Indexed: 12/14/2022]
Abstract
Network Medicine applies network science approaches to investigate disease pathogenesis. Many different analytical methods have been used to infer relevant molecular networks, including protein-protein interaction networks, correlation-based networks, gene regulatory networks, and Bayesian networks. Network Medicine applies these integrated approaches to Omics Big Data (including genetics, epigenetics, transcriptomics, metabolomics, and proteomics) using computational biology tools and, thereby, has the potential to provide improvements in the diagnosis, prognosis, and treatment of complex diseases. We discuss briefly the types of molecular data that are used in molecular network analyses, survey the analytical methods for inferring molecular networks, and review efforts to validate and visualize molecular networks. Successful applications of molecular network analysis have been reported in pulmonary arterial hypertension, coronary heart disease, diabetes mellitus, chronic lung diseases, and drug development. Important knowledge gaps in Network Medicine include incompleteness of the molecular interactome, challenges in identifying key genes within genetic association regions, and limited applications to human diseases. This article is categorized under: Models of Systems Properties and Processes > Mechanistic Models Translational, Genomic, and Systems Medicine > Translational Medicine Analytical and Computational Methods > Analytical Methods Analytical and Computational Methods > Computational Methods.
Affiliation(s)
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
| | - Harald H H W Schmidt
- Department of Pharmacology and Personalized Medicine, School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Science, Maastricht University, Maastricht, The Netherlands
| | - Eleni Anastasiadou
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Lucia Altucci
- Department of Precision Medicine, University of Campania 'Luigi Vanvitelli', Naples, Italy
| | - Marco Angelini
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Lina Badimon
- Cardiovascular Program-ICCC, IR-Hospital de la Santa Creu i Sant Pau, CiberCV, IIB-Sant Pau, Autonomous University of Barcelona, Barcelona, Spain
| | - Jean-Luc Balligand
- Pole of Pharmacology and Therapeutics (FATH), Institute for Clinical and Experimental Research (IREC), UCLouvain, Brussels, Belgium
| | - Giuditta Benincasa
- Department of Advanced Clinical and Surgical Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
| | - Giovambattista Capasso
- Department of Translational Medical Sciences, University of Campania "L. Vanvitelli", Naples, Italy.,BIOGEM, Ariano Irpino, Italy
| | - Federica Conte
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
| | - Antonella Di Costanzo
- Department of Precision Medicine, University of Campania 'Luigi Vanvitelli', Naples, Italy
| | - Lorenzo Farina
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Giulia Fiscon
- Institute for Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Rome, Italy
| | - Laurent Gatto
- de Duve Institute, Brussels, Belgium.,Institute for Experimental and Clinical Research (IREC), UCLouvain, Brussels, Belgium
| | - Michele Gentili
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Joseph Loscalzo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.,Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, Massachusetts, USA
| | - Cinzia Marchese
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Claudio Napoli
- Department of Advanced Clinical and Surgical Sciences, University of Campania "Luigi Vanvitelli", Naples, Italy
| | - Paola Paci
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Manuela Petti
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - John Quackenbush
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Paolo Tieri
- CNR National Research Council of Italy, IAC Institute for Applied Computing, Rome, Italy
| | - Davide Viggiano
- BIOGEM, Ariano Irpino, Italy.,Department of Medicine and Health Sciences, University of Molise, Campobasso, Italy
| | - Gemma Vilahur
- Cardiovascular Program-ICCC, IR-Hospital de la Santa Creu i Sant Pau, CiberCV, IIB-Sant Pau, Autonomous University of Barcelona, Barcelona, Spain
| | - Kimberly Glass
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
| | - Jan Baumbach
- Department of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Maximus-von-Imhof-Forum 3, Freising, Germany.,Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| |
|
16
|
Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat Biotechnol 2019; 38:56-65. [PMID: 31792407 PMCID: PMC6954276 DOI: 10.1038/s41587-019-0315-8] [Citation(s) in RCA: 161] [Impact Index Per Article: 26.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Accepted: 10/16/2019] [Indexed: 11/26/2022]
Abstract
How transcription factors (TFs) interpret cis-regulatory DNA sequence to control gene expression remains unclear, largely because past studies using native and engineered sequences had insufficient scale. Here, we measure the expression output of >100 million synthetic yeast promoter sequences that are fully random. These sequences yield diverse, reproducible expression levels that can be explained by their chance inclusion of functional TF binding sites. We use machine learning to build interpretable models of transcriptional regulation that predict ~94% of the expression driven from independent test promoters and ~89% of the expression driven from native yeast promoter fragments. These models allow us to characterize each TF’s specificity, activity, and interactions with chromatin. TF activity depends on binding-site strand, position, DNA helical face and chromatin context. Notably, expression level is influenced by weak regulatory interactions, which confound designed-sequence studies. Our analyses show that massive-throughput assays of fully random DNA can provide the big data necessary to develop complex, predictive models of gene regulation. Gene expression levels in yeast are predicted using a massive dataset on promoters with random sequences.
|
17
|
Read DF, Cook K, Lu YY, Le Roch KG, Noble WS. Predicting gene expression in the human malaria parasite Plasmodium falciparum using histone modification, nucleosome positioning, and 3D localization features. PLoS Comput Biol 2019; 15:e1007329. [PMID: 31509524 PMCID: PMC6756558 DOI: 10.1371/journal.pcbi.1007329] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2019] [Revised: 09/23/2019] [Accepted: 08/12/2019] [Indexed: 12/02/2022] Open
Abstract
Empirical evidence suggests that the malaria parasite Plasmodium falciparum employs a broad range of mechanisms to regulate gene transcription throughout the organism's complex life cycle. To better understand this regulatory machinery, we assembled a rich collection of genomic and epigenomic data sets, including information about transcription factor (TF) binding motifs, patterns of covalent histone modifications, nucleosome occupancy, GC content, and global 3D genome architecture. We used these data to train machine learning models to discriminate between high-expression and low-expression genes, focusing on three distinct stages of the red blood cell phase of the Plasmodium life cycle. Our results highlight the importance of histone modifications and 3D chromatin architecture in Plasmodium transcriptional regulation and suggest that AP2 transcription factors may play a limited regulatory role, perhaps operating in conjunction with epigenetic factors.
Affiliation(s)
- David F. Read
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Kate Cook
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Yang Y. Lu
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| | - Karine G. Le Roch
- Department of Molecular, Cell and Systems Biology, University of California, Riverside, California, United States of America
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
| |
|
18
|
Tonnessen BW, Bossa-Castro AM, Mauleon R, Alexandrov N, Leach JE. Shared cis-regulatory architecture identified across defense response genes is associated with broad-spectrum quantitative resistance in rice. Sci Rep 2019; 9:1536. [PMID: 30733489 PMCID: PMC6367480 DOI: 10.1038/s41598-018-38195-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2018] [Accepted: 12/18/2018] [Indexed: 12/30/2022] Open
Abstract
Plant disease resistance that is durable and effective against diverse pathogens (broad-spectrum) is essential to stabilize crop production. Such resistance is frequently controlled by Quantitative Trait Loci (QTL), and often involves differential regulation of Defense Response (DR) genes. In this study, we sought to understand how expression of DR genes is orchestrated, with the long-term goal of enabling genome-wide breeding for more effective and durable resistance. We identified short sequence motifs in rice promoters that are shared across Broad-Spectrum DR (BS-DR) genes co-expressed after challenge with three major rice pathogens (Magnaporthe oryzae, Rhizoctonia solani, and Xanthomonas oryzae pv. oryzae) and several chemical elicitors. Specific groupings of these BS-DR-associated motifs, called cis-Regulatory Modules (CRMs), are enriched in DR gene promoters, and the CRMs include cis-elements known to be involved in disease resistance. Polymorphisms in CRMs occur in promoters of genes in resistant relative to susceptible BS-DR haplotypes providing evidence that these CRMs have a predictive role in the contribution of other BS-DR genes to resistance. Therefore, we predict that a CRM signature within BS-DR gene promoters can be used as a marker for future breeding practices to enrich for the most responsive and effective BS-DR genes across the genome.
Affiliation(s)
| | | | - Ramil Mauleon
- International Rice Research Institute, Manila, Philippines
| | | | - Jan E Leach
- Colorado State University, Fort Collins, CO, USA.
| |
|
19
|
Hoff P, Yu C. Exact adaptive confidence intervals for linear regression coefficients. Electron J Stat 2019. [DOI: 10.1214/18-ejs1517] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
20
|
Khalili A, Vidyashankar AN. Hypothesis testing in finite mixture of regressions: Sparsity and model selection uncertainty. CAN J STAT 2018. [DOI: 10.1002/cjs.11467] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Affiliation(s)
- Abbas Khalili
- Department of Mathematics and Statistics; McGill University, 805 Sherbrooke Street, West Montreal; Quebec H3A 0B9 Canada
| | - Anand N. Vidyashankar
- Department of Statistics, Volgenau School of Engineering; George Mason University, 4400 University Drive; MS 4A7, Fairfax Virginia 22030 U.S.A
| |
|
21
|
Gao Z, Ruan J. Computational modeling of in vivo and in vitro protein-DNA interactions by multiple instance learning. Bioinformatics 2018; 33:2097-2105. [PMID: 28334224 DOI: 10.1093/bioinformatics/btx115] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2016] [Accepted: 02/27/2017] [Indexed: 12/25/2022] Open
Abstract
Motivation: The study of transcriptional regulation is still difficult yet fundamental in molecular biology research. While the development of both in vivo and in vitro profiling techniques has significantly enhanced our knowledge of transcription factor (TF)-DNA interactions, computational models of TF-DNA interactions are relatively simple and may not reveal sufficient biological insight. In particular, supervised-learning-based models for TF-DNA interactions attempt to map sequence-level features (k-mers) to binding events but usually ignore the location of k-mers, which can cause data fragmentation and consequently inferior model performance. Results: Here, we propose a novel algorithm based on the so-called multiple-instance learning (MIL) paradigm. MIL breaks each DNA sequence into multiple overlapping subsequences and models each subsequence separately, and therefore implicitly takes binding site locations into consideration, resulting in both higher accuracy and better interpretability of the models. The results from both in vivo and in vitro TF-DNA interaction data show that our approach significantly outperforms conventional single-instance-learning-based algorithms. Importantly, the models learned from in vitro data using our approach can predict in vivo binding with very good accuracy. In addition, the location information obtained by our method provides additional insight for motif finding results from ChIP-Seq data. Finally, our approach can be easily combined with other state-of-the-art TF-DNA interaction modeling methods. Availability and Implementation: http://www.cs.utsa.edu/~jruan/MIL/. Contact: jianhua.ruan@utsa.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
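The idea of splitting a sequence into overlapping subsequences and scoring each one separately can be sketched as follows. This is a toy illustration only. The max-over-instances rule is a common multiple-instance learning assumption rather than a confirmed detail of the authors' algorithm, and `toy_instance_score`, the motif `GATA`, and the example sequence are hypothetical.

```python
def overlapping_windows(sequence, width, step=1):
    """Return (start, subsequence) pairs for every window of the given width."""
    return [
        (start, sequence[start:start + width])
        for start in range(0, len(sequence) - width + 1, step)
    ]

def toy_instance_score(subsequence, motif="GATA"):
    """Hypothetical instance scorer: number of positions matching a toy motif."""
    return sum(a == b for a, b in zip(subsequence, motif))

def bag_score(sequence, width, instance_score):
    """Score a sequence (the 'bag') by its best-scoring window (the 'instance').

    Returns (best_score, start_of_best_window), so the location of the
    putative binding site comes out alongside the prediction.
    """
    scored = [(instance_score(sub), start)
              for start, sub in overlapping_windows(sequence, width)]
    return max(scored)

best_score, best_start = bag_score("CCGATACC", 4, toy_instance_score)
```

In this example the window starting at position 2 (`GATA`) scores highest, which shows how instance-level modeling exposes location information that a whole-sequence k-mer count would discard.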
Affiliation(s)
- Zhen Gao
- Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA
| | - Jianhua Ruan
- Department of Computer Science, University of Texas at San Antonio, San Antonio, TX, USA
| |
|
22
|
Dumont S, Le Pennec S, Donnart A, Teusan R, Steenman M, Chevalier C, Houlgatte R, Savagner F. Transcriptional orchestration of mitochondrial homeostasis in a cellular model of PGC-1-related coactivator-dependent thyroid tumor. Oncotarget 2018; 9:15883-15894. [PMID: 29662614 PMCID: PMC5882305 DOI: 10.18632/oncotarget.24633] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/08/2017] [Accepted: 02/26/2018] [Indexed: 12/03/2022] Open
Abstract
The PGC-1 (Peroxisome proliferator-activated receptor Gamma Coactivator-1) family of coactivators (PGC-1α, PGC-1β, and PRC) plays a central role in the transcriptional control of mitochondrial biogenesis and oxidative phosphorylation (OXPHOS) processes. These coactivators integrate mitochondrial energy production into cell metabolism using complementary pathways. The XTC.UC1 cell line is a mitochondria-rich model of thyroid tumors whose biogenesis is almost exclusively dependent on PRC. Here we aim to propose an integrative view of the cellular pathways regulated by PRC through integration of cDNA and miRNA microarray data and chromatin immunoprecipitation results obtained from XTC.UC1 cells invalidated for PRC. This study shows that PRC induces a complex network of cellular functions interacting with one to five of the studied transcription factors (Estrogen Related Receptor alpha, ERR1; Nuclear-Respiratory Factors, NRF1 and NRF2; cAMP Response Element Binding, CREB; and Yin Yang 1, YY1). Our data confirm that ERR1 is a key partner of PRC in the regulation of mitochondrial functions and suggest a potential role of this complex in RNA processing. PRC is also involved in transcriptional regulatory complexes targeting 12 miRNAs, five of which are involved in the control of the OXPHOS process. Our findings demonstrate that the PRC coactivator can act in complex with several transcription factors and regulate miRNA expression to control the fine regulation of main metabolic functions in the cell. Therefore, in PGC-1α/β-associated pathologies, PRC, as a metabolic sensor, may ensure mitochondrial homeostasis.
Affiliation(s)
- Solenne Dumont
- L'institut du Thorax, INSERM, CNRS, UNIV Nantes, BP 70721, 44007 NANTES Cedex 1, France
| | | | - Audrey Donnart
- L'institut du Thorax, INSERM, CNRS, UNIV Nantes, BP 70721, 44007 NANTES Cedex 1, France
| | - Raluca Teusan
- L'institut du Thorax, INSERM, CNRS, UNIV Nantes, BP 70721, 44007 NANTES Cedex 1, France
| | - Marja Steenman
- L'institut du Thorax, INSERM, CNRS, UNIV Nantes, BP 70721, 44007 NANTES Cedex 1, France
| | | | - Rémi Houlgatte
- Inserm UMR 954, Faculté de Médecine, BP 184, 54505 VANDOEUVRE-LÈS-NANCY Cedex, France
| | | |
Collapse
|
23
|
|
24
|
Song J, Bjarnason J, Surette MG. The identification of functional motifs in temporal gene expression analysis. Evol Bioinform Online 2017. [DOI: 10.1177/117693430500100008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
The identification of transcription factor binding sites is essential to the understanding of the regulation of gene expression and the reconstruction of genetic regulatory networks. The in silico identification of cis-regulatory motifs is challenging due to sequence variability and lack of sufficient data to generate consensus motifs that are of quantitative or even qualitative predictive value. To determine functional motifs in gene expression, we propose a strategy to adopt false discovery rate (FDR) and estimate motif effects to evaluate combinatorial analysis of motif candidates and temporal gene expression data. The method decreases the number of predicted motifs, which can then be confirmed by genetic analysis. To assess the method we used simulated motif/expression data to evaluate parameters. We applied this approach to experimental data for a group of iron responsive genes in Salmonella typhimurium 14028S. The method identified known and potentially new ferric-uptake regulator (Fur) binding sites. In addition, we identified uncharacterized functional motif candidates that correlated with specific patterns of expression. SAS code for the simulation and analysis of gene expression data is available from the first author upon request.
Affiliation(s)
- Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, Maryland 20742, USA
- Jaime Bjarnason
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB, Canada, T2N 4N1
- Michael G. Surette
- Department of Microbiology and Infectious Diseases, and Department of Biochemistry and Molecular Biology, Health Sciences Centre, University of Calgary, Calgary, AB, Canada, T2N 4N1
|
25
|
Zhang LQ, Li QZ, Su WX, Jin W. Predicting gene expression level by the transcription factor binding signals in human embryonic stem cells. Biosystems 2016; 150:92-98. [DOI: 10.1016/j.biosystems.2016.08.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 06/13/2016] [Revised: 08/17/2016] [Accepted: 08/18/2016] [Indexed: 11/28/2022]
|
26
|
Brent MR. Past Roadblocks and New Opportunities in Transcription Factor Network Mapping. Trends Genet 2016; 32:736-750. [PMID: 27720190 DOI: 10.1016/j.tig.2016.08.009] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/08/2016] [Revised: 08/12/2016] [Accepted: 08/16/2016] [Indexed: 12/11/2022]
Abstract
One of the principal mechanisms by which cells differentiate and respond to changes in external signals or conditions is by changing the activity levels of transcription factors (TFs). This changes the transcription rates of target genes via the cell's TF network, which ultimately contributes to reconfiguring cellular state. Since microarrays provided our first window into global cellular state, computational biologists have eagerly attacked the problem of mapping TF networks, a key part of the cell's control circuitry. In retrospect, however, steady-state mRNA abundance levels were a poor substitute for TF activity levels and gene transcription rates. Likewise, mapping TF binding through chromatin immunoprecipitation proved less predictive of functional regulation and less amenable to systematic elucidation of complete networks than originally hoped. This review explains these roadblocks and the current, unprecedented blossoming of new experimental techniques built on second-generation sequencing, which hold out the promise of rapid progress in TF network mapping.
Affiliation(s)
- Michael R Brent
- Departments of Computer Science and Genetics and Center for Genome Sciences and Systems Biology, Washington University, Saint Louis, MO, USA.
|
27
|
Affiliation(s)
- Zhen Pang
- Department of Applied Mathematics; Hong Kong Polytechnic University; Hong Kong
- Bingqing Lin
- College of Mathematics and Computational Science; Shenzhen University; China
- Jiming Jiang
- Department of Statistics; University of California; Davis CA USA
|
28
|
Zhang J. Screening and clustering of sparse regressions with finite non-Gaussian mixtures. Biometrics 2016; 73:540-550. [PMID: 27599032 DOI: 10.1111/biom.12585] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/01/2015] [Revised: 07/01/2016] [Accepted: 08/01/2016] [Indexed: 11/29/2022]
Abstract
This article proposes a method to address the problem that can arise when covariates in a regression setting are not Gaussian, which may give rise to approximately mixture-distributed errors, or when a true mixture of regressions produced the data. The method begins with non-Gaussian mixture-based marginal variable screening, followed by fitting a full but relatively smaller mixture regression model to the selected data with the help of a new penalization scheme. Under certain regularity conditions, the new screening procedure is shown to possess a sure screening property even when the population is heterogeneous. We further prove that there exists an elbow point in the associated scree plot which results in a consistent estimator of the set of active covariates in the model. By simulations, we demonstrate that the new procedure can substantially improve the performance of the existing procedures in the context of variable screening and data clustering. By applying the proposed procedure to motif data analysis in molecular biology, we demonstrate that the new method holds promise in practice.
Affiliation(s)
- Jian Zhang
- School of Mathematics, Statistics and Actuarial Science, University of Kent, Canterbury, Kent CT2 7NF, UK
|
29
|
Mandozzi J, Bühlmann P. Hierarchical Testing in the High-Dimensional Setting With Correlated Variables. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1007209] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]
|
30
|
Topol A, Zhu S, Hartley BJ, English J, Hauberg ME, Tran N, Rittenhouse CA, Simone A, Ruderfer DM, Johnson J, Readhead B, Hadas Y, Gochman PA, Wang YC, Shah H, Cagney G, Rapoport J, Gage FH, Dudley JT, Sklar P, Mattheisen M, Cotter D, Fang G, Brennand KJ. Dysregulation of miRNA-9 in a Subset of Schizophrenia Patient-Derived Neural Progenitor Cells. Cell Rep 2016; 15:1024-1036. [PMID: 27117414 PMCID: PMC4856588 DOI: 10.1016/j.celrep.2016.03.090] [Citation(s) in RCA: 99] [Impact Index Per Article: 11.0] [Received: 12/21/2015] [Revised: 03/02/2016] [Accepted: 03/27/2016] [Indexed: 01/19/2023]
Abstract
Converging evidence indicates that microRNAs (miRNAs) may contribute to disease risk for schizophrenia (SZ). We show that microRNA-9 (miR-9) is abundantly expressed in control neural progenitor cells (NPCs) but also significantly downregulated in a subset of SZ NPCs. We observed a strong correlation between miR-9 expression and miR-9 regulatory activity in NPCs as well as between miR-9 levels/activity, neural migration, and diagnosis. Overexpression of miR-9 was sufficient to ameliorate a previously reported neural migration deficit in SZ NPCs, whereas knockdown partially phenocopied aberrant migration in control NPCs. Unexpectedly, proteomic- and RNA sequencing (RNA-seq)-based analysis revealed that these effects were mediated primarily by small changes in expression of indirect miR-9 targets rather than large changes in direct miR-9 targets; these indirect targets are enriched for migration-associated genes. Together, these data indicate that aberrant levels and activity of miR-9 may be one of the many factors that contribute to SZ risk, at least in a subset of patients.
Affiliation(s)
- Aaron Topol
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Shijia Zhu
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, 1425 Madison Avenue, New York, NY 10029, USA
- Brigham J Hartley
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Jane English
- Department of Psychiatry, Royal College of Surgeons in Ireland, Beaumont Hospital, Beaumont, Dublin 9, Ireland
- Mads E Hauberg
- Department of Biomedicine and Centre for Integrative Sequencing (iSEQ), Aarhus University, Wilhelm Meyers Allé 4, Aarhus 8000 C, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), Aarhus University, Wilhelm Meyers Allé 4, Aarhus 8000 C, Denmark
- Ngoc Tran
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Chelsea Ann Rittenhouse
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Anthony Simone
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
- Douglas M Ruderfer
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Jessica Johnson
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Ben Readhead
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, 1425 Madison Avenue, New York, NY 10029, USA
- Yoav Hadas
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Peter A Gochman
- Childhood Psychiatry Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA
- Ying-Chih Wang
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, 1425 Madison Avenue, New York, NY 10029, USA
- Hardik Shah
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, 1425 Madison Avenue, New York, NY 10029, USA
- Gerard Cagney
- School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Judith Rapoport
- Childhood Psychiatry Branch, National Institute of Mental Health, National Institutes of Health, Bethesda, MD 20892, USA
- Fred H Gage
- Laboratory of Genetics, Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
- Joel T Dudley
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, 1425 Madison Avenue, New York, NY 10029, USA
- Pamela Sklar
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA
- Manuel Mattheisen
- Department of Biomedicine and Centre for Integrative Sequencing (iSEQ), Aarhus University, Wilhelm Meyers Allé 4, Aarhus 8000 C, Denmark; The Lundbeck Foundation Initiative for Integrative Psychiatric Research (iPSYCH), Aarhus University, Wilhelm Meyers Allé 4, Aarhus 8000 C, Denmark
- David Cotter
- Department of Psychiatry, Royal College of Surgeons in Ireland, Beaumont Hospital, Beaumont, Dublin 9, Ireland
- Gang Fang
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, 1425 Madison Avenue, New York, NY 10029, USA.
- Kristen J Brennand
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA; Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, 1425 Madison Avenue, New York, NY 10029, USA.
|
31
|
Single-Cell Co-expression Analysis Reveals Distinct Functional Modules, Co-regulation Mechanisms and Clinical Outcomes. PLoS Comput Biol 2016; 12:e1004892. [PMID: 27100869 PMCID: PMC4839722 DOI: 10.1371/journal.pcbi.1004892] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/26/2016] [Accepted: 03/31/2016] [Indexed: 12/12/2022]
Abstract
Co-expression analysis has been employed to predict gene function, identify functional modules, and determine tumor subtypes. Previous co-expression analysis was mainly conducted at bulk tissue level. It is unclear whether co-expression analysis at the single-cell level will provide novel insights into transcriptional regulation. Here we developed a computational approach to compare glioblastoma expression profiles at the single-cell level with those obtained from bulk tumors. We found that the co-expressed genes observed in single cells and bulk tumors have little overlap and show distinct characteristics. The co-expressed genes identified in bulk tumors tend to have similar biological functions, and are enriched for intrachromosomal interactions with synchronized promoter activity. In contrast, single-cell co-expressed genes are enriched for known protein-protein interactions, and are regulated through interchromosomal interactions. Moreover, gene members of some protein complexes are co-expressed only at the bulk level, while those of other complexes are co-expressed at both single-cell and bulk levels. Finally, we identified a set of co-expressed genes that can predict the survival of glioblastoma patients. Our study highlights that comparative analyses of single-cell and bulk gene expression profiles enable us to identify functional modules that are regulated at different levels and hold great translational potential.
|
32
|
Tuncbag N, Gosline SJC, Kedaigle A, Soltis AR, Gitter A, Fraenkel E. Network-Based Interpretation of Diverse High-Throughput Datasets through the Omics Integrator Software Package. PLoS Comput Biol 2016; 12:e1004879. [PMID: 27096930 PMCID: PMC4838263 DOI: 10.1371/journal.pcbi.1004879] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Received: 07/17/2015] [Accepted: 03/23/2016] [Indexed: 02/07/2023]
Abstract
High-throughput, ‘omic’ methods provide sensitive measures of biological responses to perturbations. However, inherent biases in high-throughput assays make it difficult to interpret experiments in which more than one type of data is collected. In this work, we introduce Omics Integrator, a software package that takes a variety of ‘omic’ data as input and identifies putative underlying molecular pathways. The approach applies advanced network optimization algorithms to a network of thousands of molecular interactions to find high-confidence, interpretable subnetworks that best explain the data. These subnetworks connect changes observed in gene expression, protein abundance or other global assays to proteins that may not have been measured in the screens due to inherent bias or noise in measurement. This approach reveals unannotated molecular pathways that would not be detectable by searching pathway databases. Omics Integrator also provides an elegant framework to incorporate not only positive data, but also negative evidence. Incorporating negative evidence allows Omics Integrator to avoid unexpressed genes and avoid being biased toward highly-studied hub proteins, except when they are strongly implicated by the data. The software is comprised of two individual tools, Garnet and Forest, that can be run together or independently to allow a user to perform advanced integration of multiple types of high-throughput data as well as create condition-specific subnetworks of protein interactions that best connect the observed changes in various datasets. It is available at http://fraenkel.mit.edu/omicsintegrator and on GitHub at https://github.com/fraenkel-lab/OmicsIntegrator.
Affiliation(s)
- Nurcan Tuncbag
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Sara J. C. Gosline
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Amanda Kedaigle
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Anthony R. Soltis
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Anthony Gitter
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Ernest Fraenkel
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
|
33
|
Maynou J, Pairó E, Marco S, Perera A. Sequence information gain based motif analysis. BMC Bioinformatics 2015; 16:377. [PMID: 26553056 PMCID: PMC4640167 DOI: 10.1186/s12859-015-0811-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/22/2014] [Accepted: 10/30/2015] [Indexed: 11/23/2022]
Abstract
BACKGROUND The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. RESULTS This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. CONCLUSIONS Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
Affiliation(s)
- Joan Maynou
- Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, Pau Gargallo, 5, Barcelona, 08028, Spain.
- CIBER de Bioingeniería, Biomateriales y Biomedicina, Spain.
- Erola Pairó
- Institute for BioEngineering of Catalonia, balidiri Reixach 4-6, Barcelona, 08028, Spain.
- Electronics Department in the University of Barcelona (UB), Martí i Franquès, 1, Barcelona, 08028, Spain.
- Santiago Marco
- Institute for BioEngineering of Catalonia, balidiri Reixach 4-6, Barcelona, 08028, Spain.
- Electronics Department in the University of Barcelona (UB), Martí i Franquès, 1, Barcelona, 08028, Spain.
- Alexandre Perera
- Departament d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, Pau Gargallo, 5, Barcelona, 08028, Spain.
- CIBER de Bioingeniería, Biomateriales y Biomedicina, Spain.
|
34
|
Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites. BIOMED RESEARCH INTERNATIONAL 2015; 2015:757530. [PMID: 26425553 PMCID: PMC4573618 DOI: 10.1155/2015/757530] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/05/2014] [Accepted: 04/16/2015] [Indexed: 12/20/2022]
Abstract
Transcription factors are proteins that bind to DNA sequences to regulate gene transcription. The transcription factor binding sites are short DNA sequences (5-20 bp long) specifically bound by one or more transcription factors. The identification of transcription factor binding sites and prediction of their function continue to be challenging problems in computational biology. In this study, by integrating the DNase I hypersensitive sites with known position weight matrices in the TRANSFAC database, the transcription factor binding sites in gene regulatory regions are identified. Based on the global gene expression patterns in cervical cancer HeLaS3 cells and HelaS3-ifnα4h cells (interferon treatment of HeLaS3 cells for 4 hours), we present a model-based computational approach to predict a set of transcription factors that potentially cause such differential gene expression. Significantly, 6 out of 10 predicted functional factors, including IRF, IRF-2, IRF-9, IRF-1 and IRF-3, ICSBP, belong to the interferon regulatory factor family and upregulate the gene expression levels in response to the interferon treatment. Another factor, ISGF-3, is also a transcriptional activator induced by interferon alpha. Under different selection criteria for transcription factor binding sites, the prediction results of our model are consistent. Our model demonstrated the potential to computationally identify the functional transcription factors in gene regulation.
|
35
|
Xiao N, Xu QS. Multi-step adaptive elastic-net: reducing false positives in high-dimensional variable selection. J STAT COMPUT SIM 2015. [DOI: 10.1080/00949655.2015.1016944] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 10/23/2022]
|
36
|
Repunte-Canonigo V, Shin W, Vendruscolo LF, Lefebvre C, van der Stap L, Kawamura T, Schlosburg JE, Alvarez M, Koob GF, Califano A, Sanna PP. Identifying candidate drivers of alcohol dependence-induced excessive drinking by assembly and interrogation of brain-specific regulatory networks. Genome Biol 2015; 16:68. [PMID: 25886852 PMCID: PMC4410476 DOI: 10.1186/s13059-015-0593-5] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 03/29/2014] [Accepted: 01/21/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND A systems biology approach based on the assembly and interrogation of gene regulatory networks, or interactomes, was used to study neuroadaptation processes associated with the transition to alcohol dependence at the molecular level. RESULTS Using a rat model of dependent and non-dependent alcohol self-administration, we reverse engineered a global transcriptional regulatory network during protracted abstinence, a period when relapse rates are highest. We then interrogated the network to identify master regulator genes that mechanistically regulate brain region-specific signatures associated with dependent and non-dependent alcohol self-administration. Among these, the gene coding for the glucocorticoid receptor was independently identified as a master regulator in multiple brain regions, including the medial prefrontal cortex, nucleus accumbens, central nucleus of the amygdala, and ventral tegmental area, consistent with the view that brain reward and stress systems are dysregulated during protracted abstinence. Administration of the glucocorticoid antagonist mifepristone in either the nucleus accumbens or ventral tegmental area selectively decreased dependent, excessive, alcohol self-administration in rats but had no effect on non-dependent, moderate, alcohol self-administration. CONCLUSIONS Our study suggests that assembly and analysis of regulatory networks is an effective strategy for the identification of key regulators of long-term neuroplastic changes within specific brain regions that play a functional role in alcohol dependence. More specifically, our results support a key role for regulatory networks downstream of the glucocorticoid receptor in excessive alcohol drinking during protracted alcohol abstinence.
Affiliation(s)
- Vez Repunte-Canonigo
- Molecular and Integrative Neuroscience Department, The Scripps Research Institute, La Jolla, CA, USA.
- William Shin
- Department of Biological Sciences, Columbia University, New York, NY, 10027, USA; Department of Systems Biology, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA.
- Leandro F Vendruscolo
- Committee for the Neurobiology of Addictive Disorders, The Scripps Research Institute, La Jolla, CA, USA; Current affiliation: Intramural Research Program, NIDA-NIH, Baltimore, MD, 21224, USA.
- Celine Lefebvre
- Department of Systems Biology, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA; Current affiliation: Inserm Unit U981, Gustave Roussy Institute, Villejuif, France.
- Lena van der Stap
- Molecular and Integrative Neuroscience Department, The Scripps Research Institute, La Jolla, CA, USA.
- Tomoya Kawamura
- Molecular and Integrative Neuroscience Department, The Scripps Research Institute, La Jolla, CA, USA.
- Joel E Schlosburg
- Committee for the Neurobiology of Addictive Disorders, The Scripps Research Institute, La Jolla, CA, USA.
- Mariano Alvarez
- Department of Systems Biology, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA.
- George F Koob
- Committee for the Neurobiology of Addictive Disorders, The Scripps Research Institute, La Jolla, CA, USA; Current affiliation: National Institute on Alcohol Abuse and Alcoholism, Rockville, MD, 20852, USA.
- Andrea Califano
- Department of Systems Biology, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA; Department of Biomedical Informatics, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA; Institute for Cancer Genetics, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA; Department of Biochemistry and Molecular Biophysics, Hammer Health Sciences Center, Columbia University, New York, NY, 10032, USA; Cancer Regulatory Network Program, Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, 10032, USA; The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037, USA.
- Pietro Paolo Sanna
- Molecular and Integrative Neuroscience Department, The Scripps Research Institute, La Jolla, CA, USA.
|
37
|
Abstract
Genome-wide transcription factor (TF) binding profiles differ dramatically between cell types. However, not much is known about the relationship between cell-type-specific binding patterns and gene expression. A recent study demonstrated how the same TFs can have functional roles when binding to largely non-overlapping genomic regions in hematopoietic progenitor and mast cells. Cell-type specific binding profiles of shared TFs are therefore not merely the consequence of opportunistic and functionally irrelevant binding to accessible chromatin, but instead have the potential to make meaningful contributions to cell-type specific transcriptional programs.
Affiliation(s)
- Felicia S L Ng
- Department of Haematology; Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research; Cambridge University; Cambridge, UK
|
38
|
Budden DM, Hurley DG, Crampin EJ. Predictive modelling of gene expression from transcriptional regulatory elements. Brief Bioinform 2014; 16:616-28. [PMID: 25231769 DOI: 10.1093/bib/bbu034] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 07/02/2014] [Accepted: 08/20/2014] [Indexed: 12/15/2022]
Abstract
Predictive modelling of gene expression provides a powerful framework for exploring the regulatory logic underpinning transcriptional regulation. Recent studies have demonstrated the utility of such models in identifying dysregulation of gene and miRNA expression associated with abnormal patterns of transcription factor (TF) binding or nucleosomal histone modifications (HMs). Despite the growing popularity of such approaches, a comparative review of the various modelling algorithms and feature extraction methods is lacking. We define and compare three methods of quantifying pairwise gene-TF/HM interactions and discuss their suitability for integrating the heterogeneous chromatin immunoprecipitation (ChIP)-seq binding patterns exhibited by TFs and HMs. We then construct log-linear and ϵ-support vector regression models from various mouse embryonic stem cell (mESC) and human lymphoblastoid (GM12878) data sets, considering both ChIP-seq- and position weight matrix- (PWM)-derived in silico TF-binding. The two algorithms are evaluated both in terms of their modelling prediction accuracy and ability to identify the established regulatory roles of individual TFs and HMs. Our results demonstrate that TF-binding and HMs are highly predictive of gene expression as measured by mRNA transcript abundance, irrespective of algorithm or cell type selection and considering both ChIP-seq and PWM-derived TF-binding. As we encourage other researchers to explore and develop these results, our framework is implemented using open-source software and made available as a preconfigured bootable virtual environment.
|
39
|
Ho YY, Baechler EC, Ortmann W, Behrens TW, Graham RR, Bhangale TR, Pan W. Using gene expression to improve the power of genome-wide association analysis. Hum Hered 2014; 78:94-103. [PMID: 25096029 PMCID: PMC4152945 DOI: 10.1159/000362837] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 11/22/2013] [Accepted: 04/14/2014] [Indexed: 12/20/2022]
Abstract
BACKGROUND/AIMS Genome-wide association (GWA) studies have reported susceptible regions in the human genome for many common diseases and traits; however, these loci only explain a minority of trait heritability. To boost the power of a GWA study, substantial research endeavors have been focused on integrating other available genomic information in the analysis. Advances in high-throughput technologies have generated a wealth of genomic data and made combining SNP and gene expression data feasible. RESULTS In this paper, we propose a novel procedure to incorporate gene expression information into GWA analysis. This procedure utilizes weights constructed from gene expression measurements to adjust p values from a GWA analysis. Results from simulation analyses indicate that the proposed procedures may achieve substantial power gains, while controlling family-wise type I error rates at the nominal level. To demonstrate the implementation of our proposed approach, we apply the weight adjustment procedure to a GWA study on serum interferon-regulated chemokine levels in systemic lupus erythematosus patients. The study results can provide valuable insights for the functional interpretation of GWA signals. AVAILABILITY The R source code for implementing the proposed weighting procedure is available at http://www.biostat.umn.edu/~yho/research.html.
Affiliation(s)
- Yen-Yi Ho
- Division of Biostatistics, University of Minnesota
- Wei Pan
- Division of Biostatistics, University of Minnesota
|
40
|
Zhang D, Wang G, Wang Y. Transcriptional regulation prediction of antiestrogen resistance in breast cancer based on RNA polymerase II binding data. BMC Bioinformatics 2014; 15 Suppl 2:S10. [PMID: 24564526 PMCID: PMC4015922 DOI: 10.1186/1471-2105-15-s2-s10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
Background Although endocrine therapy impedes the estrogen-ER signaling pathway and thus reduces breast cancer mortality, patients remain at continued risk of relapse after tamoxifen or other endocrine therapies. Understanding the mechanisms of endocrine resistance, particularly the role of transcriptional regulation, is therefore important. Methods We propose a two-step workflow based on a linear model to investigate the significant differences between MCF7 and OHT cells stimulated by 17β-estradiol (E2) with respect to regulatory transcription factors (TFs) and their interactions. We additionally compared predicted regulatory TFs based on RNA polymerase II (PolII) binding quantity data and gene expression data, which were taken from MCF7/MCF7+E2 and OHT/OHT+E2 cell lines following the same analysis workflow. Enrichment analysis concerning diseases and cell functions and regulatory pattern analysis of different motifs of the same TF were also performed. Results The results showed that PolII data could provide more information and predict more recognizably important regulatory TFs. Large differences in TF regulatory mode were found between the two cell lines. Through verification by GO annotation, enrichment analysis and related literature regarding these TFs, we found that some regulatory TFs, such as AP-1, C/EBP, FoxA1, GATA1, Oct-1 and NF-κB, maintained OHT cells through molecular interactions or signaling pathways that were different from those of the surviving MCF7 cells. From the TF regulatory interaction network, we identified E2F, E2F-1 and AP-2 as hub-TFs in MCF7 cells, whereas, in addition to E2F and E2F-1, we identified C/EBP and Oct-1 as hub-TFs in OHT cells. Notably, we found that the regulatory patterns of different motifs of the same TF were sometimes very different from one another. Conclusions We inferred that some regulatory TFs, such as AP-1 and NF-κB, cooperated with ER through both genomic and non-genomic action. The TFs that were involved in both protein-protein interactions and signaling pathways could be one of the key resistance mechanisms of endocrine therapy and thus could also be new treatment targets for endocrine resistance. Our flexible workflow could be integrated into an existing analytical framework and guide biologists to further determine underlying mechanisms in human diseases.
|
41
|
Discovering transcription and splicing networks in myelodysplastic syndromes. PLoS One 2013; 8:e79118. [PMID: 24244432 PMCID: PMC3828332 DOI: 10.1371/journal.pone.0079118] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2013] [Accepted: 09/17/2013] [Indexed: 11/19/2022] Open
Abstract
More and more transcription factors and their motifs have been reported and linked to specific gene expression levels. However, focusing only on transcription is not sufficient for mechanism research. Most genes, especially in eukaryotes, are alternatively spliced into different isoforms. Some of these isoforms increase the diversity of proteins. From this viewpoint, transcription and splicing are two important mechanisms that modulate the expression levels of isoforms. To integrate these two kinds of regulation, we built a linear regression model to select a subset of transcription factors and splicing factors for each set of co-expressed isoforms using the least-angle regression approach. Then, we applied this method to investigate the mechanism of myelodysplastic syndromes (MDS), a precursor lesion of acute myeloid leukemia. Results suggested that expression levels of most isoforms were regulated by a set of selected regulatory factors. Some of the detected factors, such as EGR1 and the STAT family, are highly correlated with progression of MDS. We discovered that the splicing factor SRSF11 underwent an alternative splicing switch, which in turn produced different amino acid sequences between MDS and controls. This splicing switch causes two different splicing mechanisms. Polymerase chain reaction experiments also confirmed that one of its isoforms was over-expressed in MDS. We analyzed the regulatory networks constructed from the co-expressed isoforms and their regulatory factors in MDS. Many of these networks were enriched in the herpes simplex infection pathway, which involves many splicing factors, and in pathways in cancers and acute or chronic myeloid leukemia.
|
42
|
Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing messages between biological networks to refine predicted interactions. PLoS One 2013; 8:e64832. [PMID: 23741402 PMCID: PMC3669401 DOI: 10.1371/journal.pone.0064832] [Citation(s) in RCA: 142] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2012] [Accepted: 04/17/2013] [Indexed: 01/10/2023] Open
Abstract
Regulatory network reconstruction is a fundamental problem in computational biology. There are significant limitations to such reconstruction using individual datasets, and increasingly people attempt to construct networks using multiple, independent datasets obtained from complementary sources, but methods for this integration are lacking. We developed PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing model using multiple sources of information to predict regulatory relationships, and used it to integrate protein-protein interaction, gene expression, and sequence motif data to reconstruct genome-wide, condition-specific regulatory networks in yeast as a model. The resulting networks were not only more accurate than those produced using individual data sets and other existing methods, but they also captured information regarding specific biological mechanisms and pathways that were missed using other methodologies. PANDA is scalable to higher eukaryotes, applicable to specific tissue or cell type data and conceptually generalizable to include a variety of regulatory, interaction, expression, and other genome-scale data. An implementation of the PANDA algorithm is available at www.sourceforge.net/projects/panda-net.
Affiliation(s)
- Kimberly Glass
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- John Quackenbush
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Guo-Cheng Yuan
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
|
43
|
Rowe HM, Kapopoulou A, Corsinotti A, Fasching L, Macfarlan TS, Tarabay Y, Viville S, Jakobsson J, Pfaff SL, Trono D. TRIM28 repression of retrotransposon-based enhancers is necessary to preserve transcriptional dynamics in embryonic stem cells. Genome Res 2013; 23:452-61. [PMID: 23233547 PMCID: PMC3589534 DOI: 10.1101/gr.147678.112] [Citation(s) in RCA: 122] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2012] [Accepted: 12/06/2012] [Indexed: 02/03/2023]
Abstract
TRIM28 is critical for the silencing of endogenous retroviruses (ERVs) in embryonic stem (ES) cells. Here, we reveal that an essential impact of this process is the protection of cellular gene expression in early embryos from perturbation by cis-acting activators contained within these retroelements. In TRIM28-depleted ES cells, repressive chromatin marks at ERVs are replaced by histone modifications typical of active enhancers, stimulating transcription of nearby cellular genes, notably those harboring bivalent promoters. Correspondingly, ERV-derived sequences can repress or enhance expression from an adjacent promoter in transgenic embryos depending on their TRIM28 sensitivity in ES cells. TRIM28-mediated control of ERVs is therefore crucial not just to prevent retrotransposition, but more broadly to safeguard the transcriptional dynamics of early embryos.
Affiliation(s)
- Helen M. Rowe
- School of Life Sciences and Frontiers in Genetics Program, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Adamandia Kapopoulou
- School of Life Sciences and Frontiers in Genetics Program, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Swiss Bioinformatics Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Andrea Corsinotti
- School of Life Sciences and Frontiers in Genetics Program, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Liana Fasching
- Wallenberg Neuroscience Center, Lund University, BMC A11, 221 84 Lund, Sweden
- Todd S. Macfarlan
- Gene Expression Laboratory and the Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, California 92037, USA
- Yara Tarabay
- Institute of Genetics and Molecular and Cellular Biology (IGBMC), University of Strasbourg, BP10142, Illkirch Cedex, France
- Stéphane Viville
- Institute of Genetics and Molecular and Cellular Biology (IGBMC), University of Strasbourg, BP10142, Illkirch Cedex, France
- Johan Jakobsson
- Wallenberg Neuroscience Center, Lund University, BMC A11, 221 84 Lund, Sweden
- Samuel L. Pfaff
- Gene Expression Laboratory and the Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, California 92037, USA
- Didier Trono
- School of Life Sciences and Frontiers in Genetics Program, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
|
44
|
Vangala RK, Ravindran V, Ghatge M, Shanker J, Arvind P, Bindu H, Shekar M, Rao VS. Integrative bioinformatics analysis of genomic and proteomic approaches to understand the transcriptional regulatory program in coronary artery disease pathways. PLoS One 2013; 8:e57193. [PMID: 23468932 PMCID: PMC3585295 DOI: 10.1371/journal.pone.0057193] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2012] [Accepted: 01/18/2013] [Indexed: 11/19/2022] Open
Abstract
Patients with cardiovascular disease show a panel of differentially regulated serum biomarkers indicative of modulation of several pathways from disease onset to progression. A few of these biomarkers have been proposed for multimarker risk prediction methods. However, the underlying mechanism of the expression changes and modulation of the pathways has not yet been addressed in its entirety. Our present work focuses on understanding the regulatory mechanisms at the transcriptional level by identifying the core and specific transcription factors that regulate the coronary artery disease-associated pathways. Using the principles of systems biology, we integrated the genomics and proteomics data with computational tools. We selected biomarkers from 7 different pathways based on their association with the disease, assayed 24 biomarkers along with gene expression studies, and built network modules that are highly regulated by 5 core regulators: PPARG, EGR1, ETV1, KLF7 and ESRRA. These network modules in turn comprise biomarkers from different pathways, showing that the core regulatory transcription factors may work together in differential regulation of several pathways, potentially leading to the disease. This kind of analysis can enhance the elucidation of mechanisms in the disease and give better strategies for developing multimarker module-based risk predictions.
Affiliation(s)
- Rajani Kanth Vangala
- Tata Proteomics and Coagulation Department, Thrombosis Research Institute, Bangalore, Karnataka, India.
|
45
|
Wen J, Chen Z, Cai X. A biophysical model for identifying splicing regulatory elements and their interactions. PLoS One 2013; 8:e54885. [PMID: 23382993 PMCID: PMC3559881 DOI: 10.1371/journal.pone.0054885] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2012] [Accepted: 12/17/2012] [Indexed: 11/18/2022] Open
Abstract
Alternative splicing (AS) of precursor mRNA (pre-mRNA) is a crucial step in the expression of most eukaryotic genes. Splicing factors (SFs) play an important role in AS regulation by binding to the cis-regulatory elements on the pre-mRNA. Although many SFs and their binding sites have been identified, their combinatorial regulatory effects remain to be elucidated. In this paper, we derive a biophysical model for AS regulation that integrates combinatorial signals of cis-acting splicing regulatory elements (SREs) and their interactions. We also develop a systematic framework for model inference. Applying the biophysical model to a human RNA-Seq data set, we demonstrate that our model can explain 49.1%–66.5% of the variance of the data, which is comparable to the best result achieved by biophysical models for transcription. In total, we identified 119 SRE pairs between different regions of cassette exons that may regulate exon or intron definition in splicing, and 77 SRE pairs from the same region that may arise from a long motif or two different SREs bound by different SFs. In particular, putative binding sites of polypyrimidine tract-binding protein (PTB) and heterogeneous nuclear ribonucleoprotein (hnRNP) F/H and E/K are identified as interacting SRE pairs, and have been shown to be consistent with the interaction models proposed in previous experimental results. These results show that our biophysical model and inference method provide a means of quantitative modeling of splicing regulation and are a useful tool for identifying SREs and their interactions. The software package for model inference is available under an open source license.
Affiliation(s)
- Ji Wen
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, United States of America
- Zhibin Chen
- Department of Microbiology and Immunology, University of Miami, Miami, Florida, United States of America
- Xiaodong Cai
- Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida, United States of America
|
46
|
Cheng C, Alexander R, Min R, Leng J, Yip KY, Rozowsky J, Yan KK, Dong X, Djebali S, Ruan Y, Davis CA, Carninci P, Lassman T, Gingeras TR, Guigó R, Birney E, Weng Z, Snyder M, Gerstein M. Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res 2013; 22:1658-67. [PMID: 22955978 PMCID: PMC3431483 DOI: 10.1101/gr.136838.111] [Citation(s) in RCA: 140] [Impact Index Per Article: 11.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those of the upstream ones. Different TF categories and specific TFs vary substantially in their contributions to predicting expression. Between two cell lines, the differential expression of TSS can be precisely reflected by the difference of TF-binding signals in a quantitative manner, arguing against the conventional on-and-off model of TF binding. Finally, we explore the relationships between TF-binding signals and other chromatin features such as histone modifications and DNase hypersensitivity for determining expression. The models imply that these features regulate transcription in a highly coordinated manner.
Affiliation(s)
- Chao Cheng
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
|
47
|
Bayarsaihan D, Makeyev AV, Enkhmandakh B. Epigenetic modulation by TFII-I during embryonic stem cell differentiation. J Cell Biochem 2013; 113:3056-60. [PMID: 22628223 DOI: 10.1002/jcb.24202] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
Abstract
TFII-I transcription factors play an essential role during early vertebrate embryogenesis. Genome-wide mapping studies by ChIP-seq and ChIP-chip revealed that TFII-I primes multiple genomic loci in mouse embryonic stem cells and embryonic tissues. Moreover, many TFII-I-bound regions co-localize with H3K4me3/K27me3 bivalent chromatin within the promoters of lineage-specific genes. This minireview provides a summary of current knowledge regarding the function of TFII-I in epigenetic control of stem cell differentiation.
Affiliation(s)
- Dashzeveg Bayarsaihan
- Center for Regenerative Medicine and Skeletal Development, Department of Reconstructive Sciences, School of Dentistry, University of Connecticut, Farmington, CT 06030, USA.
|
48
|
Sahu TK, Rao AR, Vasisht S, Singh N, Singh UP. Computational approaches, databases and tools for in silico motif discovery. Interdiscip Sci 2012; 4:239-255. [PMID: 23354813 DOI: 10.1007/s12539-012-0141-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2011] [Revised: 04/12/2012] [Accepted: 06/13/2012] [Indexed: 06/01/2023]
Abstract
Motifs are biologically significant fragments of nucleotide or peptide sequences that follow a specific pattern. Motifs are categorized as structural motifs and sequence motifs. These are discovered by phylogenetic studies of similar genes across species. Structural motifs are formed by three-dimensional arrangements of amino acids consisting of two or more α helices or β strands, whereas sequence motifs are formed by the nucleotide fragments appearing in the exons of a gene. The arrangement of residues in structural motifs may not be continuous, while it is continuous in sequence motifs. Sequence motifs may encode structural motifs. The algorithms used for motif discovery are an important part of bio-computational studies. The purpose of motif discovery is to identify patterns in biopolymer (nucleotide or protein) sequences to understand the structure and function of the molecules and their evolutionary aspects. The main aim of this paper is to provide a systematic review of the different approaches, databases and tools used in motif discovery.
Affiliation(s)
- Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, India
|
49
|
Lee KP, Piskurewicz U, Turečková V, Carat S, Chappuis R, Strnad M, Fankhauser C, Lopez-Molina L. Spatially and genetically distinct control of seed germination by phytochromes A and B. Genes Dev 2012; 26:1984-96. [PMID: 22948663 DOI: 10.1101/gad.194266.112] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Phytochromes phyB and phyA mediate a remarkable developmental switch whereby, early upon seed imbibition, canopy light prevents phyB-dependent germination, whereas later on, it stimulates phyA-dependent germination. Using a seed coat bedding assay where the growth of dissected embryos is monitored under the influence of dissected endosperm, allowing combinatorial use of mutant embryos and endosperm, we show that canopy light specifically inactivates phyB activity in the endosperm to override phyA-dependent signaling in the embryo. This interference involves abscisic acid (ABA) release from the endosperm and distinct spatial activities of phytochrome signaling components. Under the canopy, endospermic ABA opposes phyA signaling through the transcription factor (TF) ABI5, which shares with the TF PIF1 several target genes that negatively regulate germination in the embryo. ABI5 enhances the expression of phytochrome signaling genes PIF1, SOMNUS, GAI, and RGA, but also of ABA and gibberellic acid (GA) metabolic genes. Over time, weaker ABA-dependent responses eventually enable phyA-dependent germination, a distinct type of germination driven solely by embryonic growth.
Affiliation(s)
- Keun Pyo Lee
- Département de Biologie Végétale, 30, quai Ernest-Ansermet-Sciences III, Université de Genève, 1211 Genève 4, Switzerland
|
50
|
Zhong W, Zhang T, Zhu Y, Liu JS. CORRELATION PURSUIT: FORWARD STEPWISE VARIABLE SELECTION FOR INDEX MODELS. J R Stat Soc Series B Stat Methodol 2012; 74:849-870. [PMID: 23243388 PMCID: PMC3519449 DOI: 10.1111/j.1467-9868.2011.01026.x] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
In this article, a stepwise procedure, correlation pursuit (COP), is developed for variable selection under the sufficient dimension reduction framework, in which the response variable Y is influenced by the predictors X(1), X(2), …, X(p) through an unknown function of a few linear combinations of them. Unlike linear stepwise regression, COP does not impose a special form of relationship (such as linear) between the response variable and the predictor variables. The COP procedure selects variables that attain the maximum correlation between the transformed response and the linear combination of the variables. Various asymptotic properties of the COP procedure are established, and in particular, its variable selection performance under a diverging number of predictors and sample size has been investigated. The excellent empirical performance of the COP procedure in comparison with existing methods is demonstrated by both extensive simulation studies and a real example in functional genomics.
Affiliation(s)
- Wenxuan Zhong
- Department of Statistics, University of Illinois at Urbana Champaign, Champaign, IL 61820
- Tingting Zhang
- Department of Statistics, University of Virginia, Charlottesville, VA 22904
- Yu Zhu
- Department of Statistics, Purdue University, West Lafayette, IN 47907
- Jun S. Liu
- Department of Statistics, Harvard University, Cambridge, MA 02138
|