1
Lebatteux D, Soudeyns H, Boucoiran I, Gantt S, Diallo AB. Machine learning-based approach KEVOLVE efficiently identifies SARS-CoV-2 variant-specific genomic signatures. PLoS One 2024; 19:e0296627. [PMID: 38241279 PMCID: PMC10798494 DOI: 10.1371/journal.pone.0296627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 12/07/2023] [Indexed: 01/21/2024] Open
Abstract
Machine learning was shown to be effective at identifying distinctive genomic signatures among viral sequences. These signatures are defined as pervasive motifs in the viral genome that allow discrimination between species or variants. In the context of SARS-CoV-2, the identification of these signatures can assist in taxonomic and phylogenetic studies, improve the recognition and definition of emerging variants, and aid in the characterization of functional properties of polymorphic gene products. In this paper, we assess KEVOLVE, an approach based on a genetic algorithm with a machine-learning kernel, to identify multiple genomic signatures based on minimal sets of k-mers. In a comparative study, in which we analyzed a large SARS-CoV-2 genome dataset, KEVOLVE was more effective at identifying variant-discriminative signatures than several gold-standard statistical tools. Subsequently, these signatures were characterized using a new extension of KEVOLVE (KANALYZER) to highlight variations of the discriminative signatures among different classes of variants, their genomic location, and the mutations involved. The majority of identified signatures were associated with known mutations among the different variants, in terms of functional and pathological impact based on available literature. Here we showed that KEVOLVE is a robust machine learning approach to identify discriminative signatures among SARS-CoV-2 variants, which are frequently also biologically relevant, while bypassing multiple sequence alignments. The source code of the method and additional resources are available at: https://github.com/bioinfoUQAM/KEVOLVE.
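As a rough illustration of the k-mer representation that alignment-free signature methods build on, the sketch below counts overlapping k-mers and lists those present in every sequence of one group and absent from another. It is not KEVOLVE's genetic algorithm; the function names, the toy sequences, and the choice of k = 3 are assumptions made only for this example.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any k-mer with a non-ACGT character."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def discriminative_kmers(group_a, group_b, k):
    """Return k-mers found in every sequence of group_a and in none of group_b."""
    shared = set.intersection(*(set(kmer_counts(s, k)) for s in group_a))
    seen_b = set().union(*(set(kmer_counts(s, k)) for s in group_b))
    return sorted(shared - seen_b)

# Toy sequences (hypothetical, not real viral genomes).
variant_a = ["ACGTACGGA", "TTACGGACG"]
variant_b = ["ACGTACGTA", "TTACGTACG"]
print(discriminative_kmers(variant_a, variant_b, 3))
```

A real pipeline would work on full genomes with larger k and would search for a minimal subset of k-mers that maximizes classifier performance, which this presence/absence filter does not attempt.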
Affiliation(s)
- Dylan Lebatteux
  - Department of Computer Science, Université du Québec à Montréal, Montréal, Québec, Canada
- Hugo Soudeyns
  - CHU Sainte-Justine Research Centre, Montréal, Québec, Canada
  - Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
  - Department of Pediatrics, Faculty of Medicine, Université du Québec à Montréal, Montréal, Québec, Canada
- Isabelle Boucoiran
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
- Soren Gantt
  - CHU Sainte-Justine Research Centre, Montréal, Québec, Canada
  - Department of Microbiology, Infectious Diseases and Immunology, Faculty of Medicine, Université de Montréal, Montréal, Québec, Canada
2
Vahed M, Vahed M, Garmire LX. BML: a versatile web server for bipartite motif discovery. Brief Bioinform 2021; 23:6490318. [PMID: 34974623 PMCID: PMC8769915 DOI: 10.1093/bib/bbab536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Revised: 11/18/2021] [Accepted: 11/19/2021] [Indexed: 11/28/2022] Open
Abstract
Motif discovery and characterization are important for gene regulation analysis. The lack of intuitive and integrative web servers impedes the effective use of motifs. Most motif discovery web tools are either not designed for non-expert users or lack optimization steps when using default settings. Here we describe bipartite motifs learning (BML), a parameter-free web server that provides a user-friendly portal for online discovery and analysis of sequence motifs, using high-throughput sequencing data as the input. BML utilizes both position weight matrix and dinucleotide weight matrix, the latter of which enables the expression of the interdependencies of neighboring bases. When input parameters concerning the motifs are given, BML achieves significantly higher accuracy than other available tools for motif finding. When no parameters are given by non-expert users, unlike other tools, BML employs a learning method to identify motifs automatically and achieve accuracy comparable to the scenario where the parameters are set. The BML web server is freely available at http://motif.t-ridership.com/ (https://github.com/Mohammad-Vahed/BML).
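To make the distinction between the two matrix types concrete, here is a minimal sketch of how a position weight matrix (PWM) and a dinucleotide weight matrix (DWM) can be estimated from aligned sites and used to score a candidate sequence. It does not reproduce BML's implementation; the uniform background, the pseudocount of 1, the function names, and the toy sites are all assumptions for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def pwm_log_odds(sites, pseudo=1.0):
    """PWM: per-position log2 odds of each base against a uniform background."""
    total = len(sites) + 4 * pseudo
    pwm = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        pwm.append({b: math.log2(((counts[b] + pseudo) / total) / 0.25)
                    for b in BASES})
    return pwm

def dwm_log_odds(sites, pseudo=1.0):
    """DWM: per-position log2 odds of each adjacent base pair (uniform background)."""
    total = len(sites) + 16 * pseudo
    dwm = []
    for pos in range(len(sites[0]) - 1):
        counts = Counter(site[pos:pos + 2] for site in sites)
        dwm.append({a + b: math.log2(((counts[a + b] + pseudo) / total) / (1 / 16))
                    for a in BASES for b in BASES})
    return dwm

def score(matrix, seq, order):
    """Sum log-odds over positions; order=1 for a PWM, order=2 for a DWM."""
    return sum(col[seq[i:i + order]] for i, col in enumerate(matrix))

# Toy aligned sites (hypothetical): two distinct binding modes.
sites = ["ACGT", "ACGT", "TGCA", "TGCA"]
pwm, dwm = pwm_log_odds(sites), dwm_log_odds(sites)
print(score(pwm, "ACGT", 1), score(pwm, "AGGA", 1))
print(score(dwm, "ACGT", 2), score(dwm, "AGGA", 2))
```

In this toy case the PWM cannot tell the real site ACGT from the chimera AGGA, because each base of the chimera is individually common at its position. The DWM scores the chimera lower because none of its adjacent pairs was observed, which is the neighboring-base dependency the abstract refers to.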
Affiliation(s)
- Mohammad Vahed
  - Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles (UCLA), California, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48105, USA
- Majid Vahed
  - Pharmaceutical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Lana X Garmire
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48105, USA
3
Ma X, Zheng B, Wang J, Li G, Cao S, Wen Y, Huang X, Zuo Z, Zhong Z, Gu Y. Quinolone Resistance of Actinobacillus pleuropneumoniae Revealed through Genome and Transcriptome Analyses. Int J Mol Sci 2021; 22:ijms221810036. [PMID: 34576206 PMCID: PMC8472844 DOI: 10.3390/ijms221810036] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/19/2021] [Revised: 09/14/2021] [Accepted: 09/15/2021] [Indexed: 12/16/2022] Open
Abstract
Actinobacillus pleuropneumoniae is a pathogen that infects pigs and poses a serious threat to the pig industry. The emergence of quinolone-resistant strains of A. pleuropneumoniae further limits the choice of treatment. However, the mechanisms behind quinolone resistance in A. pleuropneumoniae remain unclear. The genomes of a ciprofloxacin-resistant strain, A. pleuropneumoniae SC1810, and its isogenic drug-sensitive counterpart were sequenced and analyzed using various bioinformatics tools, revealing 559 differentially expressed genes. The biological membrane, plasmid-mediated quinolone resistance genes and quinolone resistance-determining region were detected. Upregulated expression of efflux pump genes led to ciprofloxacin resistance. The expression of two porins, OmpP2B and LamB, was significantly downregulated in the mutant. Three nonsynonymous mutations in the mutant strain disrupted the water–metal ion bridge, subsequently reducing the affinity of the quinolone–enzyme complex for metal ions and leading to cross-resistance to multiple quinolones. The mechanism of quinolone resistance in A. pleuropneumoniae may involve inhibition of expression of the outer membrane protein genes ompP2B and lamB to decrease drug influx, overexpression of AcrB in the efflux pump to enhance its drug-pumping ability, and mutation in the quinolone resistance-determining region to weaken the binding of the remaining drugs. These findings will provide new potential targets for treatment.
Affiliation(s)
- Xiaoping Ma
  - Key Laboratory of Animal Disease and Human Health of Sichuan Province, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (X.M.); (B.Z.); (J.W.); (G.L.); (Z.Z.); (Z.Z.)
  - Research Center of Swine Disease, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (Y.W.); (X.H.)
- Bowen Zheng
  - Key Laboratory of Animal Disease and Human Health of Sichuan Province, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (X.M.); (B.Z.); (J.W.); (G.L.); (Z.Z.); (Z.Z.)
- Jiafan Wang
  - Key Laboratory of Animal Disease and Human Health of Sichuan Province, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (X.M.); (B.Z.); (J.W.); (G.L.); (Z.Z.); (Z.Z.)
- Gen Li
  - Key Laboratory of Animal Disease and Human Health of Sichuan Province, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (X.M.); (B.Z.); (J.W.); (G.L.); (Z.Z.); (Z.Z.)
  - Bioengineering Department, Sichuan Water Conservancy Vocational College, Chengdu 611231, China
- Sanjie Cao
  - Research Center of Swine Disease, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (Y.W.); (X.H.)
  - Correspondence: (S.C.); (Y.G.)
- Yiping Wen
  - Research Center of Swine Disease, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (Y.W.); (X.H.)
- Xiaobo Huang
  - Research Center of Swine Disease, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (Y.W.); (X.H.)
- Zhicai Zuo
  - Key Laboratory of Animal Disease and Human Health of Sichuan Province, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (X.M.); (B.Z.); (J.W.); (G.L.); (Z.Z.); (Z.Z.)
- Zhijun Zhong
  - Key Laboratory of Animal Disease and Human Health of Sichuan Province, College of Veterinary Medicine, Sichuan Agricultural University, Chengdu 611130, China; (X.M.); (B.Z.); (J.W.); (G.L.); (Z.Z.); (Z.Z.)
- Yu Gu
  - College of Life Sciences, Sichuan Agricultural University, Chengdu 611130, China
  - Correspondence: (S.C.); (Y.G.)
4
PbCSE1 promotes lignification during stone cell development in pear (Pyrus bretschneideri) fruit. Sci Rep 2021; 11:9450. [PMID: 33941813 PMCID: PMC8093294 DOI: 10.1038/s41598-021-88825-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/13/2020] [Accepted: 04/12/2021] [Indexed: 11/17/2022] Open
Abstract
Pear [Pyrus bretschneideri cv. Dangshan Su] fruit quality is not always satisfactory owing to the presence of stone cells, and lignin is the main component of stone cells in pear fruits. Caffeoyl shikimate esterase (CSE) is a key enzyme in lignin biosynthesis. Although CSE-like genes have been isolated from a variety of plant species, their orthologs are not characterized in pear. In this study, the CSE gene family (PbCSE) from P. bretschneideri was identified. According to the physiological data and quantitative RT-PCR (qRT-PCR), PbCSE1 was associated with lignin deposition and stone cell formation. The overexpression of PbCSE1 increased the lignin content in pear fruits. Relative to wild-type (WT) Arabidopsis, the overexpression of PbCSE1 delayed growth and increased lignin deposition and lignin content in stems. Simultaneously, the expression of lignin biosynthetic genes was also increased in pear fruits and Arabidopsis. These results demonstrated that PbCSE1 plays an important role in cell lignification and will provide a potential molecular strategy to improve the quality of pear fruits.
5
Yalcin D, Otu HH. An Unbiased Predictive Model to Detect DNA Methylation Propensity of CpG Islands in the Human Genome. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200724145835] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/22/2022]
Abstract
Background:
Epigenetic repression mechanisms play an important role in gene regulation, specifically in cancer development. In many cases, a CpG island’s (CGI) susceptibility or resistance to methylation is shown to be contributed by local DNA sequence features.
Objective:
To develop unbiased machine learning models–individually and combined for different biological features–that predict the methylation propensity of a CGI.
Methods:
We developed our model consisting of CGI sequence features on a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested our model on two independent datasets that are chromosome (132 sequences) and disease (70 sequences) specific.
Results:
We provided improvements in prediction accuracy over previous models. Our results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets reaching an AUC of ~0.82 for the complete model and an AUC of ~0.88 for the model using select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better in separating prone and resistant CGIs.
Conclusion:
Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes based on their tendency to contain methylation prone CGIs, which may lead to preventative treatment strategies. MATLAB® and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.
Affiliation(s)
- Dicle Yalcin
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, United States
- Hasan H. Otu
  - Department of Electrical and Computer Engineering, University of Nebraska-Lincoln, Lincoln, NE, 68588, United States
6
Cagirici HB, Budak H, Sen TZ. Genome-wide discovery of G-quadruplexes in barley. Sci Rep 2021; 11:7876. [PMID: 33846409 PMCID: PMC8041835 DOI: 10.1038/s41598-021-86838-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 10/19/2020] [Accepted: 03/19/2021] [Indexed: 12/04/2022] Open
Abstract
G-quadruplexes (G4s) are four-stranded nucleic acid structures with closely spaced guanine bases forming square planar G-quartets. Aberrant formation of G4 structures has been associated with genomic instability. However, most plant species are lacking comprehensive studies of G4 motifs. In this study, genome-wide identification of G4 motifs in barley was performed, followed by a comparison of genomic distribution and molecular functions to other monocot species, such as wheat, maize, and rice. Similar to the reports on human and some plants like wheat, G4 motifs peaked around the 5′ untranslated region (5′ UTR), the first coding domain sequence, and the first intron start sites on antisense strands. Our comparative analyses in human, Arabidopsis, maize, rice, and sorghum demonstrated that the peak points could be erroneously merged into a single peak when large window sizes are used. We also showed that the G4 distributions around genic regions are relatively similar in the species studied, except in the case of Arabidopsis. G4 containing genes in monocots showed conserved molecular functions for transcription initiation and hydrolase activity. Additionally, we provided examples of imperfect G4 motifs.
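For readers unfamiliar with sequence-based G4 prediction, the sketch below scans a sequence for putative G4 motifs on both strands with a regular expression. The abstract does not state which motif definition the authors used; the pattern here is the commonly used canonical form (four runs of at least three guanines separated by loops of 1 to 7 bases), and the function name and test sequence are assumptions for illustration.

```python
import re

# Canonical putative G4 pattern: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (assumed definition).
G4_PLUS = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
# A G4 on the opposite strand appears as the C-rich equivalent on the given strand.
G4_MINUS = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")

def find_g4(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping motif hits."""
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)

# Hypothetical test sequence: one G-rich and one C-rich telomere-like repeat.
seq = "AT" + "GGGTTAGGGTTAGGGTTAGGG" + "ACGT" + "CCCTAACCCTAACCCTAACCC" + "TA"
print(find_g4(seq))
```

Coordinates are 0-based and end-exclusive. Imperfect motifs of the kind mentioned in the abstract, such as runs interrupted by a non-G base, would not be matched by this strict pattern.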
Affiliation(s)
- H Busra Cagirici
  - Crop Improvement and Genetics Research Unit, Western Regional Research Center, U.S. Department of Agriculture - Agricultural Research Service, 800 Buchanan St, Albany, CA, 94710, USA
- Hikmet Budak
  - Montana BioAg Inc., Missoula, MT, USA
  - Agrogen, LLC., Omaha, NE, USA
- Taner Z Sen
  - Crop Improvement and Genetics Research Unit, Western Regional Research Center, U.S. Department of Agriculture - Agricultural Research Service, 800 Buchanan St, Albany, CA, 94710, USA
7
Hennigs JK, Cao A, Li CG, Shi M, Mienert J, Miyagawa K, Körbelin J, Marciano DP, Chen PI, Roughley M, Elliott MV, Harper RL, Bill M, Chappell J, Moonen JR, Diebold I, Wang L, Snyder MP, Rabinovitch M. PPARγ-p53-Mediated Vasculoregenerative Program to Reverse Pulmonary Hypertension. Circ Res 2021; 128:401-418. [PMID: 33322916 PMCID: PMC7908816 DOI: 10.1161/circresaha.119.316339] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 12/31/2022]
Abstract
RATIONALE In pulmonary arterial hypertension (PAH), endothelial dysfunction and obliterative vascular disease are associated with DNA damage and impaired signaling of BMPR2 (bone morphogenetic protein type 2 receptor) via two downstream transcription factors, PPARγ (peroxisome proliferator-activated receptor gamma) and p53.
OBJECTIVE We investigated the vasculoprotective and regenerative potential of a newly identified PPARγ-p53 transcription factor complex in the pulmonary endothelium.
METHODS AND RESULTS In this study, we identified a pharmacologically inducible vasculoprotective mechanism in pulmonary arterial and lung MV (microvascular) endothelial cells in response to DNA damage and oxidant stress regulated in part by a BMPR2 dependent transcription factor complex between PPARγ and p53. Chromatin immunoprecipitation sequencing and RNA-sequencing established an inducible PPARγ-p53 mediated regenerative program regulating 19 genes involved in lung endothelial cell survival, angiogenesis and DNA repair, including EPHA2 (ephrin type-A receptor 2), FHL2 (four and a half LIM domains protein 2), JAG1 (jagged 1), SULF2 (extracellular sulfatase Sulf-2), and TIGAR (TP53-inducible glycolysis and apoptosis regulator). Expression of these genes was partially impaired when the PPARγ-p53 complex was pharmacologically disrupted or when BMPR2 was reduced in pulmonary artery endothelial cells (PAECs) subjected to oxidative stress. In endothelial cell-specific Bmpr2-knockout mice unable to stabilize p53 in endothelial cells under oxidative stress, Nutlin-3 rescued endothelial p53 and PPARγ-p53 complex formation and induced target genes, such as APLN (apelin) and JAG1, to regenerate pulmonary microvessels and reverse pulmonary hypertension. In PAECs from BMPR2 mutant PAH patients, pharmacological induction of p53 and PPARγ-p53 genes repaired damaged DNA utilizing genes from the nucleotide excision repair pathway without provoking PAEC apoptosis.
CONCLUSIONS We identified a novel therapeutic strategy that activates a vasculoprotective gene regulation program in PAECs downstream of dysfunctional BMPR2 to rehabilitate PAH PAECs, regenerate pulmonary microvessels, and reverse disease. Our studies pave the way for p53-based vasculoregenerative therapies for PAH by extending the therapeutic focus to PAEC dysfunction and to DNA damage associated with PAH progression.
Affiliation(s)
- Jan K. Hennigs
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pneumology & Center for Pulmonary Arterial Hypertension Hamburg
  - II. Department of Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Aiqin Cao
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Caiyun G. Li
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Radiation Oncology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Minyi Shi
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Julia Mienert
  - Department of Pneumology & Center for Pulmonary Arterial Hypertension Hamburg
  - II. Department of Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- Kazuya Miyagawa
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Jakob Körbelin
  - Department of Pneumology & Center for Pulmonary Arterial Hypertension Hamburg
  - II. Department of Medicine, University Medical Center Hamburg-Eppendorf, 20246 Hamburg, Germany
- David P. Marciano
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Pin-I Chen
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Matthew Roughley
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Matthew V. Elliott
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Rebecca L. Harper
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Matthew Bill
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- James Chappell
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Jan-Renier Moonen
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Isabel Diebold
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Lingli Wang
  - Vera Moulton Wall Center for Pulmonary Vascular Disease, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Michael P Snyder
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Marlene Rabinovitch
  - Cardiovascular Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
8
Sultan I, Fromion V, Schbath S, Nicolas P. Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes. J R Soc Interface 2020; 17:20200600. [PMID: 33023397 PMCID: PMC7653377 DOI: 10.1098/rsif.2020.0600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2020] [Accepted: 09/10/2020] [Indexed: 11/12/2022] Open
Abstract
Automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. To address this task, we propose a statistical model that can use information on exact positions of the transcription start sites and condition-dependent expression profiles. The central idea of this model is to improve the probabilistic representation of the promoter DNA sequences by incorporating covariates summarizing expression profiles (e.g. coordinates in projection spaces or hierarchical clustering trees). A dedicated trans-dimensional Markov chain Monte Carlo algorithm adjusts the width and palindromic properties of the corresponding position-weight matrices, the number of parameters to describe exact position relative to the transcription start site, and chooses the expression covariates relevant for each motif. All parameters are estimated simultaneously, for many motifs and many expression covariates. The method is applied to a dataset of transcription start sites and expression profiles available for Listeria monocytogenes. The results validate the approach and provide a new global view of the transcription regulatory network of this important pathogen. Remarkably, a previously unreported motif is found in promoter regions of ribosomal protein genes, suggesting a role in the regulation of growth.
Affiliation(s)
- Ibrahim Sultan
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Pierre Nicolas
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
9
Zhao G, Guo L, Zhang Y, Gao L, Ma LJ. Identifying TF Binding Motifs from a Partial Set of Target Genes and its Application to Regulatory Network Inference. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1211-1221. [PMID: 30475725 DOI: 10.1109/tcbb.2018.2882377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Motif identification has been one of the most widely studied problems in bioinformatics. Many methods have been developed to discover binding motifs from a large set of genes. But when the given genes are only a partial set of target genes, the statistical significance usually contains a bias towards the input. If we can identify the TF binding motif from a partial set of target genes, we can save the labor costs and resources for doing many experiments. In this paper, we propose a method MISA (Motif Identification through Segments Assembly) to identify binding motifs from a subset of target genes. By ranking and assembling the segments, MISA discovers a set of binding motifs with the best length to fit our proposed objective function. We also predict the additional target genes as an application of regulatory network inference. We compare our approach with two widely used methods MEME and AlignACE by analyzing both the quality of the binding motif and network inference. Using two model organisms S. cerevisiae and E. coli, we show that with 20 percent of the target genes (minimum sample size of 20), we can achieve a motif similarity of 82 percent with the known motifs. Our results also show that 73 percent of target genes on average can be correctly predicted without introducing many false target genes.
10
Zhou H, Mehta S, Srivastava SP, Grabinska K, Zhang X, Wong C, Hedayat A, Perrotta P, Fernández-Hernando C, Sessa WC, Goodwin JE. Endothelial cell-glucocorticoid receptor interactions and regulation of Wnt signaling. JCI Insight 2020; 5:131384. [PMID: 32051336 DOI: 10.1172/jci.insight.131384] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 06/27/2019] [Accepted: 01/02/2020] [Indexed: 12/12/2022] Open
Abstract
Vascular inflammation is present in many cardiovascular diseases, and exogenous glucocorticoids have traditionally been used as a therapy to suppress inflammation. However, recent data have shown that endogenous glucocorticoids, acting through the endothelial glucocorticoid receptor, act as negative regulators of inflammation. Here, we performed ChIP for the glucocorticoid receptor, followed by next-generation sequencing in mouse endothelial cells to investigate how the endothelial glucocorticoid receptor regulates vascular inflammation. We identified a role of the Wnt signaling pathway in this setting and show that loss of the endothelial glucocorticoid receptor results in upregulation of Wnt signaling both in vitro and in vivo using our validated mouse model. Furthermore, we demonstrate glucocorticoid receptor regulation of a key gene in the Wnt pathway, Frzb, via a glucocorticoid response element gleaned from our genomic data. These results suggest a role for endothelial Wnt signaling modulation in states of vascular inflammation.
Affiliation(s)
- Han Zhou
  - Department of Pediatrics
  - Vascular Biology and Therapeutics Program
- Kariona Grabinska
  - Vascular Biology and Therapeutics Program
  - Department of Pharmacology
- Xinbo Zhang
  - Vascular Biology and Therapeutics Program
  - Integrative Cell Signaling and Neurobiology of Metabolism Program
  - Department of Comparative Medicine
- Ahmad Hedayat
  - Department of Pediatrics
  - Vascular Biology and Therapeutics Program
- Paola Perrotta
  - Vascular Biology and Therapeutics Program
  - Department of Pharmacology
- Carlos Fernández-Hernando
  - Vascular Biology and Therapeutics Program
  - Integrative Cell Signaling and Neurobiology of Metabolism Program
  - Department of Comparative Medicine
  - Department of Pathology, Yale University School of Medicine, New Haven, Connecticut, USA
- William C Sessa
  - Vascular Biology and Therapeutics Program
  - Department of Pharmacology
- Julie E Goodwin
  - Department of Pediatrics
  - Vascular Biology and Therapeutics Program
11
Bottini S, Pratella D, Grandjean V, Repetto E, Trabucchi M. Recent computational developments on CLIP-seq data analysis and microRNA targeting implications. Brief Bioinform 2019; 19:1290-1301. [PMID: 28605404 PMCID: PMC6291801 DOI: 10.1093/bib/bbx063] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 02/28/2017] [Indexed: 01/18/2023] Open
Abstract
Cross-Linking Immunoprecipitation associated with high-throughput sequencing (CLIP-seq) is a technique used to identify RNA directly bound to RNA-binding proteins across the entire transcriptome in cell or tissue samples. Recent technological and computational advances permit the analysis of many CLIP-seq samples simultaneously, allowing us to reveal the comprehensive network of RNA–protein interaction and to integrate it with other genome-wide analyses. Therefore, the design and quality management of the CLIP-seq analyses are of critical importance to extract clean and biologically meaningful information from CLIP-seq experiments. The application of the CLIP-seq technique to Argonaute 2 (Ago2) protein, the main component of the microRNA (miRNA)-induced silencing complex, reveals the direct binding sites of miRNAs, thus providing insightful information about the role played by miRNA(s). In this review, we summarize and discuss the most recent computational methods for CLIP-seq analysis, and discuss their impact on Ago2/miRNA-binding site identification and prediction with regard to human pathologies.
Affiliation(s)
- Silvia Bottini
- Université Côte d'Azur, Inserm, C3M, 151 route de St-Antoine-de-Ginestière, B.P. 2 3194, 06204 Nice, France
- David Pratella
- Université Côte d'Azur, Inserm, C3M, 151 route de St-Antoine-de-Ginestière, B.P. 2 3194, 06204 Nice, France
- Valerie Grandjean
- Université Côte d'Azur, Inserm, C3M, 151 route de St-Antoine-de-Ginestière, B.P. 2 3194, 06204 Nice, France
- Emanuela Repetto
- Université Côte d'Azur, Inserm, C3M, 151 route de St-Antoine-de-Ginestière, B.P. 2 3194, 06204 Nice, France
- Michele Trabucchi
- Université Côte d'Azur, Inserm, C3M, 151 route de St-Antoine-de-Ginestière, B.P. 2 3194, 06204 Nice, France
12
Lebatteux D, Remita AM, Diallo AB. Toward an Alignment-Free Method for Feature Extraction and Accurate Classification of Viral Sequences. J Comput Biol 2019; 26:519-535. [PMID: 31050550 DOI: 10.1089/cmb.2018.0239] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022] Open
Abstract
The classification of pathogens among emerging and re-emerging viruses is of major interest in taxonomic studies, functional genomics, host-pathogen interplay, prevention, and disease treatment. It consists of assigning a given sequence to its related group of known sequences sharing similar characteristics and traits. The challenges of such classification are associated with several virus properties, including recombination, mutation rate, multiplicity of motifs, and diversity. In domains such as pathogen monitoring and surveillance, it is important to detect and quantify known and novel taxa without relying on full and accurate alignments or virus family profiles. In this study, we propose an alignment-free method, CASTOR-KRFE, to detect discriminating subsequences within known pathogen sequences in order to accurately classify unknown pathogen sequences. This method includes three major steps: (1) vectorization of known viral genomic sequences based on k-mers to constitute the potential features, (2) efficient pattern extraction and evaluation that maximizes classification performance, and (3) prediction of the minimal set of features fitting a given criterion (threshold of performance metric and maximum number of features). We assessed this method through jackknife data partitioning on a dozen virus data sets, covering the seven major virus groups and including influenza virus, Ebola virus, human immunodeficiency virus 1, hepatitis C virus, hepatitis B virus, and human papillomavirus. CASTOR-KRFE provides a weighted average F-measure >0.96 over a wide range of viruses. Our method also shows better performance on complex virus data sets than multiple subsequences extractor for classification (MISSEL), a subsequence extraction method, and the Discriminative mode of the MEME pattern extraction tool.
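Step (1) of the abstract above, k-mer vectorization, can be illustrated with a minimal Python sketch. This is not the CASTOR-KRFE code: the function names `kmer_counts` and `vectorize`, the choice to skip k-mers containing ambiguous bases, and the toy sequences are all assumptions made for illustration. Steps (2) and (3), feature extraction and selection of the minimal feature set, are not shown.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count overlapping k-mers in one sequence (hypothetical helper).

    Assumption: k-mers containing anything other than A, C, G, T are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def vectorize(sequences, k):
    """Return (feature names, count matrix) over all 4**k possible k-mers."""
    features = ["".join(p) for p in product("ACGT", repeat=k)]
    matrix = []
    for seq in sequences:
        counts = kmer_counts(seq, k)
        matrix.append([counts[f] for f in features])
    return features, matrix


# Toy input: two short made-up sequences, one with an ambiguous base.
features, matrix = vectorize(["ACGTAC", "TTTTNA"], k=2)
```

Each row of `matrix` is one sequence and each column one k-mer, which is the kind of feature table a downstream classifier or feature-selection step could consume.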
Affiliation(s)
- Dylan Lebatteux
- Department of Computer Science, Université du Québec à Montréal, Montreal, Canada
- Amine M Remita
- Department of Computer Science, Université du Québec à Montréal, Montreal, Canada
13
Gao J, Byrd AK, Zybailov BL, Marecki JC, Guderyon MJ, Edwards AD, Chib S, West KL, Waldrip ZJ, Mackintosh SG, Gao Z, Putnam AA, Jankowsky E, Raney KD. DEAD-box RNA helicases Dbp2, Ded1 and Mss116 bind to G-quadruplex nucleic acids and destabilize G-quadruplex RNA. Chem Commun (Camb) 2019; 55:4467-4470. [PMID: 30855040 PMCID: PMC6459694 DOI: 10.1039/c8cc10091h] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 01/02/2023]
Abstract
We identified 29 G-quadruplex binding proteins by affinity purification and quantitative LC-MS/MS. We demonstrated that the DEAD-box RNA helicases Dbp2, Ded1 and Mss116 preferentially bind to G-quadruplex nucleic acids in vitro and destabilize RNA quadruplexes, suggesting new potential roles for these helicases in disruption of quadruplex structures in RNA.
Affiliation(s)
- Jun Gao
- Department of Biochemistry and Molecular Biology, College of Medicine, University of Arkansas for Medical Sciences, 4301 West Markham Street (Slot 516), Little Rock, Arkansas 72205, USA
14
Hashim FA, Mabrouk MS, Al-Atabany W. Review of Different Sequence Motif Finding Algorithms. Avicenna J Med Biotechnol 2019; 11:130-148. [PMID: 31057715 PMCID: PMC6490410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2018] [Accepted: 05/26/2018] [Indexed: 11/05/2022] Open
Abstract
DNA motif discovery is a primary step in many systems for studying gene function. Motif discovery plays a vital role in the identification of Transcription Factor Binding Sites (TFBSs), which help in learning the mechanisms that regulate gene expression. Over the past decades, different algorithms have been used to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus or probabilistic approaches, many of which are time-consuming and easily trapped in a local optimum. Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome these problems. This paper presents a general classification of motif discovery algorithms with new sub-categories that facilitate building a successful motif discovery algorithm. It also presents a summary comparison between them.
Affiliation(s)
- Fatma A. Hashim
- Department of Biomedical Engineering, Helwan University, Egypt
- Mai S. Mabrouk
- Department of Biomedical Engineering, Misr University for Science and Technology (MUST), Egypt
15
Hashim FA, Mabrouk MS, Atabany WA. Comparative Analysis of DNA Motif Discovery Algorithms: A Systemic Review. Current Cancer Therapy Reviews 2019. [DOI: 10.2174/1573394714666180417161728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Background:
Bioinformatics is an interdisciplinary field that combines biology and information technology to study how to deal with biological data. The DNA motif discovery problem is the main challenge of genome biology, and its importance grows with the spread of sequencing technologies that produce large amounts of data. A DNA motif is a repeated portion of DNA sequences of major biological interest with important structural and functional features. Motif discovery plays a vital role in antibody-biomarker identification, which is useful for the diagnosis of disease, and in identifying Transcription Factor Binding Sites (TFBSs), which help in learning the mechanisms that regulate gene expression. Recently, scientists discovered that the TFs have a mutation rate five times higher than the flanking sequences, so motif discovery also has a crucial role in cancer discovery.
Methods:
Over the past decades, many attempts have used different algorithms to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus or probabilistic approaches.
Results:
Many DNA motif discovery algorithms are time-consuming and easily trapped in a local optimum.
Conclusion:
Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome the problems of consensus and probabilistic approaches. This paper presents a general classification of motif discovery algorithms with new sub-categories. It also presents a summary comparison between them.
Affiliation(s)
- Fatma A. Hashim
- Department of Biomedical Engineering, Helwan University, Helwan, Egypt
- Mai S. Mabrouk
- Department of Biomedical Engineering, Misr University for Science and Technology (MUST), Cairo, Egypt
16
Abstract
Designing expression cassettes with desired properties remains the most important consideration in gene engineering technology. One of the challenges for predictive gene expression is the modeling of synthetic gene switches that regulate one or more target genes and respond directly to specific chemical, environmental, and physiological stimuli. Assessment of natural promoters, high-throughput sequencing, and the modern biotech inventory have aided in deciphering the structure of cis elements and molding the native cis elements into the desired synthetic promoter. Synthetic promoters that are molded by rearrangement of cis motifs can greatly benefit plant biotechnology applications. This review gives a glimpse of the manual in vivo gene regulation through synthetic promoters. It summarizes the integrative design strategy of synthetic promoters and enumerates five approaches for constructing synthetic promoters. Insights into the pattern of cis regulatory elements in the pursuit of desirable "gene switches" to date have also been reevaluated. Joint strategies of bioinformatics modeling and randomized biochemical synthesis are addressed in an effort to construct synthetic promoters for intricate gene regulation.
17
Liu C, Liu B, Zhang Y, Jiang F, Ren Y, Li S, Wang H, Fan W. Ancient horizontally transferred genes in the genome of California two-spot octopus, Octopus bimaculoides. Gene 2018; 667:34-44. [PMID: 29738840 DOI: 10.1016/j.gene.2018.05.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2017] [Revised: 04/10/2018] [Accepted: 05/02/2018] [Indexed: 11/28/2022]
Abstract
Horizontal gene transfer (HGT), a mechanism that shares genetic material between the host and donor from separated offspring branches, has been described as a means of producing novel and beneficial phenotypes for the host organisms. However, in molluscs, the second most diverse group, the existence of HGT is still controversial. In the present study, 12 HGT genes were identified from California two-spot octopus Octopus bimaculoides based on a similarity search, phylogenetic construction, gene composition analysis and PCR (Polymerase Chain Reaction) validation. Based on the phylogenetic topologies, ten HGT genes were identified to have been transferred into the possible molluscan ancestor, possibly before its radiation. Furthermore, most of the donor organisms were predicted to be familiar bacteria in marine environments. These horizontally transferred genes were under strong negative selection and could be transcribed in octopus functionally. The predicted biochemical functions of these genes include metabolism, neurotransmission, immune defense and tissue integrity. Seven Zn-metalloproteinases were validated as the main type of HGT genes in octopus, with motif composition, intron presence and phylogenetic relationship divergent from the endogenous ones. Furthermore, the Zn-metalloproteinases were predicted to be responsible for immune defense and tissue remodeling. Three HGT genes were distributed mainly in the nervous system and were predicted to regulate neurotransmission through glia-neuronal interactions. The results collectively indicated the existence of HGT in molluscs and its potential contribution to the evolution of octopus with regard to functional innovation and adaptability.
Affiliation(s)
- Conghui Liu
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Bo Liu
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yan Zhang
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Fan Jiang
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Yuwei Ren
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Shuqu Li
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Hengchao Wang
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Wei Fan
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
18
Conghui L, Bo L, Yan Z, Fan J, Yuwei R, Shuqu L, Hengchao W, Wei F. Data on horizontally transferred genes in California two-spot octopus, Octopus bimaculoides. Data Brief 2018; 19:1274-1286. [PMID: 29942828 PMCID: PMC6011040 DOI: 10.1016/j.dib.2018.05.132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2018] [Accepted: 05/23/2018] [Indexed: 11/17/2022] Open
Abstract
Horizontal gene transfer (HGT), a mechanism that shares genetic material between the host and donor from separated offspring branches, has been described as a means of producing novel and beneficial phenotypes for the host organisms. In the present study, 12 HGT genes were identified from California two-spot octopus Octopus bimaculoides based on a similarity search, phylogenetic construction, gene composition analysis and PCR (Polymerase Chain Reaction) validation. The data collected from the HGT genes of the octopus comprise the phylogenetic incongruences, CodonW analysis, PCR products, detailed motifs and organisms used in screening. In phylogenetic screening, those genes were nested within bacterial homologs and identified as HGT genes transferred from bacteria to the octopus. The motifs were similar among proteins of the horizontally acquired Zn-metalloproteinases, but differed from those of endogenous proteins. CodonW was employed to investigate the codon usage bias between HGT genes and other genes in the octopus genome. In PCR validation, all the HGT genes could be produced as amplified fragments. The results collectively indicated the existence of HGT in molluscs and its potential contribution to the evolution of octopus with regard to functional innovation and adaptability.
19
Lee NK, Azizan FL, Wong YS, Omar N. DeepFinder: An integration of feature-based and deep learning approach for DNA motif discovery. Biotechnol Biotec Eq 2018. [DOI: 10.1080/13102818.2018.1438209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022] Open
Affiliation(s)
- Nung Kion Lee
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Farah Liyana Azizan
- Centre For Pre-University Studies, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Yu Shiong Wong
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Norshafarina Omar
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
20
Dezhsetan S. Genome scanning for identification and mapping of receptor-like kinase (RLK) gene superfamily in Solanum tuberosum. Physiology and Molecular Biology of Plants: An International Journal of Functional Plant Biology 2017; 23:755-765. [PMID: 29158626 PMCID: PMC5671453 DOI: 10.1007/s12298-017-0471-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 05/31/2017] [Revised: 09/13/2017] [Accepted: 09/18/2017] [Indexed: 05/19/2023]
Abstract
Receptor-like kinases (RLKs) are a key class of genes that contribute to diverse phenomena from plant development to defense responses. The availability of completed potato genome sequences provides an excellent opportunity to identify and characterize the RLK gene superfamily in this lineage. We identified 747 non-redundant RLK genes in the potato genome that were classified into 52 subfamilies, of which 58% of members were organized into tandem repeats. Nine of the potato RLK subfamilies were organized into tandem repeats. Also, six subfamilies exhibited lineage-specific expansion compared to Arabidopsis. The majority of RLK genes were physically organized within heterogeneous and homogeneous clusters on chromosomes and were unevenly distributed across the genome. Chromosomes 2, 3 and 7 contained the highest numbers of RLK genes, and the most underrepresented chromosomes were chromosomes 8, 10 and 11. Taken together, our results provide a framework for future comparative, evolutionary and functional studies of the members of the RLK superfamily.
21
Alcántara-Silva R, Alvarado-Hermida M, Díaz-Contreras G, Sánchez-Barrios M, Carrera S, Galván SC. PISMA: A Visual Representation of Motif Distribution in DNA Sequences. Bioinform Biol Insights 2017; 11:1177932217700907. [PMID: 28469418 PMCID: PMC5390925 DOI: 10.1177/1177932217700907] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/01/2016] [Accepted: 02/19/2017] [Indexed: 11/17/2022] Open
Abstract
Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf .
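The core operation the abstract describes, locating and counting a short motif along a DNA sequence and marking its positions, can be sketched in a few lines of Python. This is not PISMA (which the abstract says is written in Java); the function names `find_motif_positions` and `barcode`, the text-based rendering, and the toy sequence are assumptions made for illustration.

```python
def find_motif_positions(sequence, motif):
    """Return 0-based start positions of every occurrence of motif.

    Overlapping occurrences are included (hypothetical helper, not PISMA's API).
    """
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping matches are not missed.
        start = sequence.find(motif, start + 1)
    return positions


def barcode(sequence, motif):
    """Render a crude text 'bar code': '|' at each motif start, space elsewhere."""
    marks = [" "] * len(sequence)
    for pos in find_motif_positions(sequence, motif):
        marks[pos] = "|"
    return "".join(marks)


# Toy example: CpG sites in a short made-up sequence.
cpg_sites = find_motif_positions("ACGCGTTCG", "CG")  # [1, 3, 7]
print(len(cpg_sites), "CpG sites")
print(barcode("ACGCGTTCG", "CG"))
```

A graphical tool would draw these positions to scale along the sequence; the text rendering here only shows how the positions map onto a bar-code-like layout.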
Affiliation(s)
- Rogelio Alcántara-Silva
- División de Ingeniería Eléctrica, Facultad de Ingeniería, Universidad Nacional Autónoma de México (UNAM), México City, México
- Moisés Alvarado-Hermida
- División de Ingeniería Eléctrica, Facultad de Ingeniería, Universidad Nacional Autónoma de México (UNAM), México City, México
- Gibrán Díaz-Contreras
- División de Ingeniería Eléctrica, Facultad de Ingeniería, Universidad Nacional Autónoma de México (UNAM), México City, México
- Martha Sánchez-Barrios
- Unidad de Posgrado, Facultad de Química, Universidad Nacional Autónoma de México (UNAM), México City, México
- Samantha Carrera
- Faculty of Biology, Medicine and Health, The University of Manchester, UK
- Silvia Carolina Galván
- Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México (UNAM), México City, México
22
Kumar V, Yadav AN, Verma P, Sangwan P, Saxena A, Kumar K, Singh B. β-Propeller phytases: Diversity, catalytic attributes, current developments and potential biotechnological applications. Int J Biol Macromol 2017; 98:595-609. [PMID: 28174082 DOI: 10.1016/j.ijbiomac.2017.01.134] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 11/20/2016] [Revised: 01/26/2017] [Accepted: 01/31/2017] [Indexed: 02/02/2023]
Abstract
Phytases are phosphatases that remove phosphates stepwise from phytic acid or its salts. β-Propeller phytase (BPPhy) belongs to a special class of microbial phytases that is regarded as the most diverse and has been isolated and characterized from different microbes, mainly Bacillus spp. The BPPhy class is unique for its Ca2+-dependent catalytic activity, strict substrate specificity, activity at neutral to alkaline pH, and high thermostability. Numerous sequence- and structure-based studies have revealed unique attributes and catalytic properties of this class compared with other classes of phytases. Recent studies, including cloning, expression, and genetic engineering approaches, have led to improvements in BPPhy that provide an opportunity for extended utilization of this class of phytases in improving animal nutrition, human health, plant growth promotion, and environmental protection. This review describes the sources and diversity of BPPhy genes, biochemical properties, Ca2+ dependence, current developments in structural elucidation, heterogeneous expression and catalytic improvements, and multifarious applications of BPPhy.
Affiliation(s)
- Vinod Kumar
- Department of Biotechnology, Akal College of Agriculture, Eternal University, Baru Sahib, Sirmour 173101, India
- Ajar Nath Yadav
- Department of Biotechnology, Akal College of Agriculture, Eternal University, Baru Sahib, Sirmour 173101, India
- Priyanka Verma
- Department of Microbiology, Akal College of Basic Sciences, Eternal University, Baru Sahib, Sirmour 173101, India
- Punesh Sangwan
- Department of Biochemistry, Akal College of Basic Sciences, Eternal University, Baru Sahib, Sirmour 173101, India
- Abhishake Saxena
- Department of Biotechnology, Akal College of Agriculture, Eternal University, Baru Sahib, Sirmour 173101, India
- Krishan Kumar
- Department of Food Technology, Akal College of Agriculture, Eternal University, Baru Sahib, Sirmour 173101, India
- Bijender Singh
- Department of Microbiology, Maharshi Dayanand University, Rohtak 124001, India
23
Divergent DNA Methylation Provides Insights into the Evolution of Duplicate Genes in Zebrafish. G3: Genes, Genomes, Genetics 2016; 6:3581-3591. [PMID: 27646705 PMCID: PMC5100857 DOI: 10.1534/g3.116.032243] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
Abstract
The evolutionary mechanism, fate and function of duplicate genes in various taxa have been widely studied; however, the mechanism underlying the maintenance and divergence of duplicate genes in Danio rerio remains largely unexplored. Whether and how the divergence of DNA methylation between duplicate pairs is associated with gene expression and evolutionary time are poorly understood. In this study, by analyzing bisulfite sequencing (BS-seq) and RNA-seq datasets from public data, we demonstrated that DNA methylation played a critical role in duplicate gene evolution in zebrafish. Initially, we found promoter methylation of duplicate genes generally decreased with evolutionary time as measured by synonymous substitution rate between paralogous duplicates (Ks). Importantly, promoter methylation of duplicate genes was negatively correlated with gene expression. Interestingly, for 665 duplicate gene pairs, one gene was consistently promoter methylated, while the other was unmethylated across nine different datasets we studied. Moreover, one motif enriched in promoter methylated duplicate genes tended to be bound by the transcription repression factor FOXD3, whereas a motif enriched in the promoter unmethylated sequences interacted with the transcription activator Sp1, indicating a complex interaction between the genomic environment and epigenome. Besides, body-methylated genes showed longer length than body-unmethylated genes. Overall, our results suggest that DNA methylation is highly important in the differential expression and evolution of duplicate genes in zebrafish.
24
Zhang Y, Wang P, Yan M. An Entropy-Based Position Projection Algorithm for Motif Discovery. BioMed Research International 2016; 2016:9127474. [PMID: 27882329 PMCID: PMC5110948 DOI: 10.1155/2016/9127474] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/17/2016] [Revised: 09/20/2016] [Accepted: 10/05/2016] [Indexed: 12/31/2022]
Abstract
The motif discovery problem is crucial for understanding the structure and function of gene expression. Over the past decades, many attempts at motif finding using consensus and probability training models have been successful. However, most existing motif discovery algorithms are still time-consuming or easily trapped in a local optimum. To overcome these shortcomings, in this paper we propose an entropy-based position projection algorithm, called EPP, which designs a projection process to divide the dataset and explores the best local optimal solution. The experimental results on real DNA sequences, Tompa data, and ChIP-seq data show that EPP is advantageous in dealing with the motif discovery problem and outperforms current widely used algorithms.
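The abstract does not describe how EPP uses entropy, so the sketch below is not the EPP algorithm. It only illustrates the standard quantity that entropy-based motif methods typically build on: the Shannon entropy of each position (column) across a set of equal-length candidate sites, where low entropy marks a conserved position. The function names and the toy sites are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed at one position."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def position_entropies(sites):
    """Per-position entropy across equal-length candidate sites.

    Assumption: sites are already aligned and of the same length.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [column_entropy([site[i] for site in sites]) for i in range(length)]


# Toy example: the first position is fully conserved, the others are not.
entropies = position_entropies(["ACGT", "ACGA", "ACCT", "AGGT"])
```

A position where all four bases are equally frequent has an entropy of 2 bits, and a fully conserved position has an entropy of 0, so ranking positions by entropy is one simple way to pick the most informative ones.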
Affiliation(s)
- Yipu Zhang
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
- Ping Wang
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
- Maode Yan
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
25
Lin J, Hu Y, Nunez S, Foulkes AS, Cieply B, Xue C, Gerelus M, Li W, Zhang H, Rader DJ, Musunuru K, Li M, Reilly MP. Transcriptome-Wide Analysis Reveals Modulation of Human Macrophage Inflammatory Phenotype Through Alternative Splicing. Arterioscler Thromb Vasc Biol 2016; 36:1434-47. [PMID: 27230130 PMCID: PMC4919157 DOI: 10.1161/atvbaha.116.307573] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/17/2016] [Accepted: 05/17/2016] [Indexed: 12/20/2022]
Abstract
OBJECTIVE Human macrophages can shift phenotype across the inflammatory M1 and reparative M2 spectrum in response to environmental challenges, but the mechanisms promoting inflammatory and cardiometabolic disease-associated M1 phenotypes remain incompletely understood. Alternative splicing (AS) is emerging as an important regulator of cellular function, yet its role in macrophage activation is largely unknown. We investigated the extent to which AS occurs in M1 activation within the cardiometabolic disease context and validated a functional genomic cell model for studying human macrophage-related AS events. APPROACH AND RESULTS From deep RNA-sequencing of resting, M1, and M2 primary human monocyte-derived macrophages, we found 3860 differentially expressed genes in M1 activation and detected 233 M1-induced AS events; the majority of AS events were cell- and M1-specific with enrichment for pathways relevant to macrophage inflammation. Using genetic variant data for 10 cardiometabolic traits, we identified 28 trait-associated variants within the genomic loci of 21 alternatively spliced genes and 15 variants within 7 differentially expressed regulatory splicing factors in M1 activation. Knockdown of 1 such splicing factor, CELF1, in primary human macrophages led to increased inflammatory response to M1 stimulation, demonstrating CELF1's potential modulation of the M1 phenotype. Finally, we demonstrated that an induced pluripotent stem cell-derived macrophage system recapitulates M1-associated AS events and provides a high-fidelity macrophage AS model. CONCLUSIONS AS plays a role in defining macrophage phenotype in a cell- and stimulus-specific fashion. Alternatively spliced genes and splicing factors with trait-associated variants may reveal novel pathways and targets in cardiometabolic diseases.
Affiliation(s)
- Jennie Lin
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.).
- Yu Hu
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
- Sara Nunez
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
- Andrea S Foulkes
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Benjamin Cieply
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Chenyi Xue
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Mark Gerelus
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Wenjun Li
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Hanrui Zhang
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Daniel J Rader
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Kiran Musunuru
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Mingyao Li
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.)
| | - Muredach P Reilly
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.).
| |
|
26
|
Regulation of normal B-cell differentiation and malignant B-cell survival by OCT2. Proc Natl Acad Sci U S A 2016; 113:E2039-46. [PMID: 26993806 DOI: 10.1073/pnas.1600557113] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Indexed: 12/18/2022]
Abstract
The requirement for the B-cell transcription factor OCT2 (octamer-binding protein 2, encoded by Pou2f2) in germinal center B cells has proved controversial. Here, we report that germinal center B cells are formed normally after depletion of OCT2 in a conditional knockout mouse, but their proliferation is reduced and in vivo differentiation to antibody-secreting plasma cells is blocked. This finding led us to examine the role of OCT2 in germinal center-derived lymphomas. shRNA knockdown showed that almost all diffuse large B-cell lymphoma (DLBCL) cell lines are addicted to the expression of OCT2 and its coactivator OCA-B. Genome-wide chromatin immunoprecipitation (ChIP) analysis and gene-expression profiling revealed the broad transcriptional program regulated by OCT2 that includes the expression of STAT3, IL-10, ELL2, XBP1, MYC, TERT, and ADA. Importantly, genetic alteration of OCT2 is not a requirement for cellular addiction in DLBCL. However, we detected amplifications of the POU2F2 locus in DLBCL tumor biopsies and a recurrent mutation of threonine 223 in the DNA-binding domain of OCT2. This neomorphic mutation subtly alters the DNA-binding preference of OCT2, leading to the transactivation of noncanonical target genes including HIF1a and FCRL3. Finally, by introducing mutations designed to disrupt the OCT2-OCA-B interface, we reveal a requirement for this protein-protein interface that ultimately might be exploited therapeutically. Our findings, combined with the predominantly B-cell-restricted expression of OCT2 and the absence of a systemic phenotype in our knockout mice, suggest that an OCT2-targeted therapeutic strategy would be efficacious in both major subtypes of DLBCL while avoiding systemic toxicity.
|
27
|
Intracellular Dynamics of Synucleins: "Here, There and Everywhere". International Review of Cell and Molecular Biology 2015; 320:103-69. [PMID: 26614873 DOI: 10.1016/bs.ircmb.2015.07.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Indexed: 12/23/2022]
Abstract
Synucleins are small, soluble proteins expressed primarily in neural tissue and in certain tumors. The synuclein family consists of three members, α-, β-, and γ-synuclein, which are present only in vertebrates. Members of the synuclein family have high sequence identity, especially in the N-terminal regions. The synuclein gene family came into the spotlight when one of its members, α-synuclein, was found to be associated with Parkinson's disease and other neurodegenerative disorders, whereas γ-synuclein was linked to several forms of cancer. There is considerable controversy and lively debate concerning members of the synuclein family, including their normal functions, toxicity, role in pathology, transmission between cells, and intracellular localization. Important findings that have remained undisputed for many years are the localization of synucleins in synapses and their role in the regulation of synaptic vesicle trafficking, whereas their presence and function in mitochondria and the nucleus are still debated. In this review, we present the data on the localization of synucleins in two intracellular organelles: the nucleus and mitochondria.
|
28
|
Zhang Y, Wang P. A Fast Cluster Motif Finding Algorithm for ChIP-Seq Data Sets. BioMed Research International 2015; 2015:218068. [PMID: 26236718 PMCID: PMC4509496 DOI: 10.1155/2015/218068] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/08/2015] [Accepted: 06/04/2015] [Indexed: 11/17/2022]
Abstract
ChIP-seq, a high-throughput technique that couples chromatin immunoprecipitation with high-throughput sequencing, has extended the identification of transcription factor binding locations to genome-wide regions. However, most existing motif discovery algorithms are time-consuming and are limited in their ability to identify binding motifs in ChIP-seq data, which are typically large in scale. To improve efficiency, we propose a fast cluster motif finding algorithm, named FCmotif, to identify (l, d) motifs in large-scale ChIP-seq data sets. It is inspired by the emerging substrings mining strategy: it first finds the enriched substrings and then searches their neighborhood instances to construct PWMs and cluster motifs of different lengths. FCmotif does not follow the OOPS model constraint and can find long motifs. The effectiveness of the proposed algorithm has been demonstrated by experiments on ChIP-seq data sets from mouse ES cells. The detection of the real binding motifs and the processing of full-size data of several megabytes finished in a few minutes. The experimental results show that FCmotif is well suited to (l, d) motif finding in ChIP-seq data and that it performs better than other widely used algorithms such as MEME, Weeder, ChIPMunk, and DREME.
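As a rough illustration of two terms used in this abstract, the Python sketch below collects the "neighborhood instances" of a seed substring (all windows of the same length l that differ from it in at most d positions) and turns them into a simple position frequency matrix. It is not FCmotif itself: the function names, the plain Hamming-distance scan, and the unsmoothed frequency matrix are assumptions chosen only to make the (l, d) idea concrete.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def neighborhood_instances(seed, sequences, d):
    """Collect every window of len(seed) that is within d mismatches of seed.

    Hypothetical helper: a brute-force scan, not the search used by FCmotif.
    """
    l = len(seed)
    hits = []
    for s in sequences:
        for i in range(len(s) - l + 1):
            window = s[i:i + l]
            if hamming(seed, window) <= d:
                hits.append(window)
    return hits


def position_frequency_matrix(instances, alphabet="ACGT"):
    """Per-position symbol frequencies over the instances (no pseudocounts)."""
    if not instances:
        raise ValueError("need at least one instance")
    n = len(instances)
    l = len(instances[0])
    return [
        {b: sum(inst[j] == b for inst in instances) / n for b in alphabet}
        for j in range(l)
    ]


if __name__ == "__main__":
    seqs = ["TTACGTAA", "GGACCTCC", "AAAAAAAA"]
    inst = neighborhood_instances("ACGT", seqs, d=1)
    print(inst)
    print(position_frequency_matrix(inst))
```

A real motif finder would normally add pseudocounts and convert the frequencies to log-odds scores against a background model; both steps are left out here to keep the sketch short.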
Affiliation(s)
- Yipu Zhang
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
| | - Ping Wang
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
| |
|
29
|
Savojardo C, Martelli PL, Fariselli P, Casadio R. TPpred3 detects and discriminates mitochondrial and chloroplastic targeting peptides in eukaryotic proteins. Bioinformatics 2015; 31:3269-75. [PMID: 26079349 DOI: 10.1093/bioinformatics/btv367] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Received: 03/18/2015] [Accepted: 06/08/2015] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Molecular recognition of N-terminal targeting peptides is the most common mechanism controlling the import of nuclear-encoded proteins into mitochondria and chloroplasts. When experimental information is lacking, computational methods can annotate targeting peptides and determine their cleavage sites in order to characterize protein localization, function, and mature protein sequences. The problem of discriminating mitochondrial from chloroplastic propeptides is particularly relevant when annotating the proteomes of photosynthetic Eukaryotes, which are endowed with both types of sequences. RESULTS: Here, we introduce TPpred3, a computational method that, given any Eukaryotic protein sequence, performs three different tasks: (i) the detection of targeting peptides; (ii) their classification as mitochondrial or chloroplastic; and (iii) the precise localization of the cleavage sites in an organelle-specific framework. Our implementation builds on our previously introduced TPpred. Here, we integrate a new N-to-1 Extreme Learning Machine specifically designed for the classification task (ii). For the last task, we introduce an organelle-specific Support Vector Machine that exploits sequence motifs retrieved with an extensive motif-discovery analysis of a large set of mitochondrial and chloroplastic proteins. We show that TPpred3 outperforms the state-of-the-art methods in all three tasks. AVAILABILITY AND IMPLEMENTATION: The method server and datasets are available at http://tppred3.biocomp.unibo.it. CONTACT: gigi@biocomp.unibo.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, University of Bologna, Department of Biology, 40126 Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, University of Bologna, Department of Biology, 40126 Bologna, Italy
| | - Piero Fariselli
- Biocomputing Group, University of Bologna, Department of Biology, 40126 Bologna, Italy and Department of Computer Science and Engineering, University of Bologna, 40127 Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, University of Bologna, Department of Biology, 40126 Bologna, Italy
| |
|
30
|
Lihu A, Holban T. A review of ensemble methods for de novo motif discovery in ChIP-Seq data. Brief Bioinform 2015; 16:964-73. [DOI: 10.1093/bib/bbv022] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 01/15/2015] [Indexed: 01/17/2023]
|
31
|
Ikebata H, Yoshida R. Repulsive parallel MCMC algorithm for discovering diverse motifs from large sequence sets. Bioinformatics 2015; 31:1561-8. [PMID: 25583120 PMCID: PMC4426842 DOI: 10.1093/bioinformatics/btv017] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 09/16/2014] [Accepted: 01/06/2015] [Indexed: 11/14/2022]
Abstract
Motivation: The motif discovery problem consists of finding recurring patterns of short strings in a set of nucleotide sequences. This classical problem is receiving renewed attention because most early motif discovery methods cannot handle the large data sets of recent genome-wide ChIP studies. New ChIP-tailored methods focus on reducing computation time and pay little regard to the accuracy of motif detection. Unlike such methods, our method focuses on increasing the detection accuracy while keeping the computational efficiency at an acceptable level. The major advantage of our method is that it can mine diverse multiple motifs undetectable by current methods. Results: The repulsive parallel Markov chain Monte Carlo (RPMCMC) algorithm that we propose is a parallel version of the widely used Gibbs motif sampler. RPMCMC is run on parallel interacting motif samplers. A repulsive force is generated when motifs produced by different samplers come near each other. Thus, different samplers explore different motifs. In this way, we can detect much more diverse motifs than conventional methods can. Through application to 228 transcription factor ChIP-seq datasets of the ENCODE project, we show that the RPMCMC algorithm can find many reliable cofactor interacting motifs that existing methods are unable to discover. Availability and implementation: A C++ implementation of RPMCMC and discovered cofactor motifs for the 228 ENCODE ChIP-seq datasets are available from http://daweb.ism.ac.jp/yoshidalab/motif. Contact: ikebata.hisaki@ism.ac.jp, yoshidar@ism.ac.jp Supplementary information: Supplementary data are available from Bioinformatics online.
Affiliation(s)
- Hisaki Ikebata
- Department of Statistical Science, The Graduate University for Advanced Studies (Sokendai), 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, Department of Statistical Modeling, The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-CREST, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-ERATO Sato Live Bio-Forecasting Project, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan and The Thomas N. Sato BioMEC-X Laboratories, Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan
| | - Ryo Yoshida
- Department of Statistical Science, The Graduate University for Advanced Studies (Sokendai), 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, Department of Statistical Modeling, The Institute of Statistical Mathematics, Research Organization of Information and Systems, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-CREST, 10-3 Midori-cho, Tachikawa, Tokyo 190-8562, Japan, JST-ERATO Sato Live Bio-Forecasting Project, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan and The Thomas N. Sato BioMEC-X Laboratories, Advanced Telecommunications Research Institute International, 2-2-2 Hikaridai Seika-cho, Soraku-gun, Khoto-fu 619-0288, Japan
| |
|
32
|
Milse J, Petri K, Rückert C, Kalinowski J. Transcriptional response of Corynebacterium glutamicum ATCC 13032 to hydrogen peroxide stress and characterization of the OxyR regulon. J Biotechnol 2014; 190:40-54. [DOI: 10.1016/j.jbiotec.2014.07.452] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 04/25/2014] [Revised: 07/22/2014] [Accepted: 07/29/2014] [Indexed: 11/26/2022]
|
33
|
Abstract
Motivation: The Expectation–Maximization (EM) algorithm has been successfully applied to the problem of transcription factor binding site (TFBS) motif discovery and underlies the most widely used motif discovery algorithms. In the wider field of probabilistic modelling, the stochastic EM (sEM) algorithm has been used to overcome some of the limitations of the EM algorithm; however, the application of sEM to motif discovery has not been fully explored. Results: We present MITSU (Motif discovery by ITerative Sampling and Updating), a novel algorithm for motif discovery, which combines sEM with an improved approximation to the likelihood function, which is unconstrained with regard to the distribution of motif occurrences within the input dataset. The algorithm is evaluated quantitatively on realistic synthetic data and several collections of characterized prokaryotic TFBS motifs and shown to outperform EM and an alternative sEM-based algorithm, particularly in terms of site-level positive predictive value. Availability and implementation: Java executable available for download at http://www.sourceforge.net/p/mitsu-motif/, supported on Linux/OS X. Contact: a.m.kilpatrick@sms.ed.ac.uk
Affiliation(s)
- Alastair M Kilpatrick
- School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR and MRC Human Genetics Unit, IGMM, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Bruce Ward
- School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR and MRC Human Genetics Unit, IGMM, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Stuart Aitken
- School of Informatics, University of Edinburgh, Informatics Forum, Edinburgh EH8 9AB, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3JR and MRC Human Genetics Unit, IGMM, University of Edinburgh, Western General Hospital, Edinburgh EH4 2XU, UK
| |
|
34
|
Reyes-Herrera PH, Ficarra E. Computational Methods for CLIP-seq Data Processing. Bioinform Biol Insights 2014; 8:199-207. [PMID: 25336930 PMCID: PMC4196881 DOI: 10.4137/bbi.s16803] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 05/11/2014] [Revised: 07/29/2014] [Accepted: 08/01/2014] [Indexed: 12/25/2022]
Abstract
RNA-binding proteins (RBPs) are at the core of post-transcriptional regulation and thus of gene expression control at the RNA level. One of the principal challenges in the field of gene expression regulation is to understand the mechanisms of action of RBPs. As a result of the recent evolution of experimental techniques, it is now possible to obtain the RNA regions recognized by RBPs on a transcriptome-wide scale. CLIP-seq protocols combine crosslinking immunoprecipitation (CLIP) with high-throughput sequencing to recover the transcriptome-wide set of interaction regions for a particular protein. Nevertheless, computational methods are necessary to process CLIP-seq experimental data and are key to advancing the understanding of gene regulatory mechanisms. Considering the importance of computational methods in this area, we present a review of the current status of computational approaches used and proposed for CLIP-seq data.
Affiliation(s)
- Paula H Reyes-Herrera
- Facultad de Ingeniería Electrónica y Biomédica, Universidad Antonio Nariño, Bogotá, Colombia
| | - Elisa Ficarra
- Department of Control and Computer Engineering, Politecnico di Torino, TO, Italy
| |
|
35
|
Genome-wide profiling of untranslated regions by paired-end ditag sequencing reveals unexpected transcriptome complexity in yeast. Mol Genet Genomics 2014; 290:217-24. [PMID: 25213602 DOI: 10.1007/s00438-014-0913-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/28/2014] [Accepted: 09/01/2014] [Indexed: 01/01/2023]
Abstract
The identification of structural and functional elements encoded in a genome is a challenging task. Although the transcriptome of budding yeast has been extensively analyzed, the boundaries and untranslated regions of yeast genes remain elusive. To address this least-explored field of yeast genomics, we performed a transcript profiling analysis using a paired-end ditag (PET) approach coupled with deep sequencing. With 562,133 PET sequences, we accurately defined the boundaries and untranslated regions of 3,409 ORFs; the results suggest that many yeast genes have multiple transcription start sites (TSSs). We also identified 85 previously uncharacterized transcripts either in intergenic regions or from the opposite strand of reported genomic features. Furthermore, our data revealed the extensive 3' end heterogeneity of yeast genes and identified a novel putative motif for polyadenylation. Our results indicate that the yeast transcriptome is more complex than expected. This study should serve as a valuable resource for elucidating the regulation and evolution of yeast genes.
|
36
|
Stoeckius M, Grün D, Kirchner M, Ayoub S, Torti F, Piano F, Herzog M, Selbach M, Rajewsky N. Global characterization of the oocyte-to-embryo transition in Caenorhabditis elegans uncovers a novel mRNA clearance mechanism. EMBO J 2014; 33:1751-66. [PMID: 24957527 DOI: 10.15252/embj.201488769] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 12/25/2022]
Abstract
The oocyte-to-embryo transition (OET) is thought to be mainly driven by post-transcriptional gene regulation. However, expression of both RNAs and proteins during the OET has not been comprehensively assayed. Furthermore, specific molecular mechanisms that regulate gene expression during the OET are largely unknown. Here, we quantify and analyze, transcriptome-wide, the expression of mRNAs and thousands of proteins in Caenorhabditis elegans oocytes, 1-cell, and 2-cell embryos. This represents a first comprehensive gene expression atlas during the OET in animals. We discovered a first wave of degradation in which thousands of mRNAs are cleared shortly after fertilization. Sequence analysis revealed a statistically highly significant presence of a polyC motif in the 3' untranslated regions of most of these degraded mRNAs. Transgenic reporter assays demonstrated that this polyC motif is required and sufficient for mRNA degradation after fertilization. We show that orthologs of human polyC-binding protein specifically bind this motif. Our data suggest a mechanism in which the polyC motif and binding partners direct degradation of maternal mRNAs. Our data also indicate that endogenous siRNAs but not miRNAs promote mRNA clearance during the OET.
Affiliation(s)
- Marlon Stoeckius
- Systems Biology of Gene Regulatory Elements, Max Delbrück Center Berlin, Berlin, Germany
| | - Dominic Grün
- Systems Biology of Gene Regulatory Elements, Max Delbrück Center Berlin, Berlin, Germany
| | - Marieluise Kirchner
- Cell Signalling and Mass Spectrometry, Max Delbrück Center Berlin, Berlin, Germany
| | - Salah Ayoub
- Systems Biology of Gene Regulatory Elements, Max Delbrück Center Berlin, Berlin, Germany
| | - Francesca Torti
- Systems Biology of Gene Regulatory Elements, Max Delbrück Center Berlin, Berlin, Germany
| | - Fabio Piano
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA; Division of Science, New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
| | - Margareta Herzog
- Systems Biology of Gene Regulatory Elements, Max Delbrück Center Berlin, Berlin, Germany
| | - Matthias Selbach
- Cell Signalling and Mass Spectrometry, Max Delbrück Center Berlin, Berlin, Germany
| | - Nikolaus Rajewsky
- Systems Biology of Gene Regulatory Elements, Max Delbrück Center Berlin, Berlin, Germany
| |
|
37
|
Azmi AM, Al-Ssulami A. Encoded expansion: an efficient algorithm to discover identical string motifs. PLoS One 2014; 9:e95148. [PMID: 24871320 PMCID: PMC4037181 DOI: 10.1371/journal.pone.0095148] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/14/2013] [Accepted: 03/24/2014] [Indexed: 11/19/2022]
Abstract
A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently, Karci (2009; Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol, throwing out those that occur fewer than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text]. Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm indeed has a linear time complexity and that it is more scalable in terms of sequence length than the existing algorithms.
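The expand-and-prune loop described in this abstract (start from motifs of size 2, extend each surviving candidate by one symbol, and discard candidates that fall below the occurrence threshold) can be sketched in Python as follows. This is only an illustration of the general idea: it uses plain dictionaries instead of the array and substring encoding mentioned by the authors, so it makes no claim to their complexity bounds, and the names `frequent_motifs`, `min_count`, and `max_len` are hypothetical.

```python
from collections import defaultdict


def frequent_motifs(seq, min_count, max_len=None):
    """Map each exact substring of length >= 2 that occurs at least
    min_count times in seq to the sorted list of its start positions.

    Overlapping occurrences are counted separately.
    """
    if max_len is None:
        max_len = len(seq)

    # Seed: positions of every substring of length 2.
    current = defaultdict(list)
    for i in range(len(seq) - 1):
        current[seq[i:i + 2]].append(i)
    current = {m: p for m, p in current.items() if len(p) >= min_count}

    result = {}
    length = 2
    while current and length <= max_len:
        result.update(current)
        # Extend every surviving occurrence by one symbol to the right.
        extended = defaultdict(list)
        for motif, positions in current.items():
            for i in positions:
                end = i + length
                if end < len(seq):
                    extended[motif + seq[end]].append(i)
        # Prune candidates that no longer meet the threshold.
        current = {m: p for m, p in extended.items() if len(p) >= min_count}
        length += 1
    return result


if __name__ == "__main__":
    motifs = frequent_motifs("ACGTACGTAC", min_count=2)
    longest = max(motifs, key=len)
    print(longest, motifs[longest])
```

Pruning is safe because any substring that occurs at least `min_count` times has a prefix that occurs at least as often, so no frequent motif is lost when an infrequent candidate is dropped.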
Affiliation(s)
- Aqil M. Azmi
- Department of Computer Science, College of Computer & Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Abdulrakeeb Al-Ssulami
- Department of Computer Science, College of Computer & Information Sciences, King Saud University, Riyadh, Saudi Arabia
| |
|
38
|
Abstract
The evolutionary mechanisms underlying duplicate gene maintenance and divergence remain highly debated. Epigenetic modifications, such as DNA methylation, may contribute to duplicate gene evolution by facilitating tissue-specific regulation. However, the role of epigenetic divergence on duplicate gene evolution remains little understood. Here we show, using comprehensive data across 10 diverse human tissues, that DNA methylation plays critical roles in several aspects of duplicate gene evolution. We first demonstrate that duplicate genes are initially heavily methylated, before gradually losing DNA methylation as they age. Within each pair, DNA methylation divergence between duplicate partners increases with evolutionary age. Importantly, tissue-specific DNA methylation of duplicates correlates with tissue-specific expression, implicating DNA methylation as a causative factor for functional divergence of duplicate genes. These patterns are apparent in promoters but not in gene bodies, in accord with the complex relationship between gene-body DNA methylation and transcription. Remarkably, many duplicate gene pairs exhibit consistent division of DNA methylation across multiple, divergent tissues: For the majority (73%) of duplicate gene pairs, one partner is always hypermethylated compared with the other. This is indicative of a common underlying determinant of DNA methylation. The division of DNA methylation is also consistent with their chromatin accessibility profiles. Moreover, at least two sequence motifs known to interact with the Sp1 transcription factor mark promoters of more hypomethylated duplicate partners. These results demonstrate critical roles of DNA methylation, as well as complex interaction between genome and epigenome, on duplicate gene evolution.
39
Reid JE, Wernisch L. STEME: a robust, accurate motif finder for large data sets. PLoS One 2014; 9:e90735. [PMID: 24625410 PMCID: PMC3953122 DOI: 10.1371/journal.pone.0090735] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/14/2014] [Accepted: 02/04/2014] [Indexed: 11/19/2022]
Abstract
Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface.
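For readers unfamiliar with the EM procedure that MEME-style motif finders build on, the following is a minimal sketch of plain batch EM under a one-occurrence-per-sequence model with a uniform background. It is not STEME's approximation and not MEME's implementation; the function names, the seeding scheme, and the toy sequences are assumptions for illustration only.

```python
import math

ALPHABET = "ACGT"


def seed_pwm(word, p=0.7):
    """Build a starting PWM that favours `word` (hypothetical seeding scheme)."""
    rest = (1.0 - p) / (len(ALPHABET) - 1)
    return [{b: (p if b == base else rest) for b in ALPHABET} for base in word]


def em_oops(seqs, pwm, n_iter=20, pseudocount=0.1, background=0.25):
    """Run batch EM assuming exactly one motif occurrence per sequence."""
    width = len(pwm)
    for _ in range(n_iter):
        counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
        for seq in seqs:
            starts = range(len(seq) - width + 1)
            # E-step: motif-vs-background likelihood ratio at each start.
            scores = [
                math.prod(pwm[j][seq[s + j]] / background for j in range(width))
                for s in starts
            ]
            total = sum(scores)
            # Accumulate expected base counts, weighted by the posterior.
            for s, score in zip(starts, scores):
                z = score / total
                for j in range(width):
                    counts[j][seq[s + j]] += z
        # M-step: renormalise the expected counts into a new PWM.
        pwm = [
            {b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts
        ]
    return pwm


# Toy sequences with a planted ACGT site (illustrative only).
toy_seqs = ["TTTTACGTTT", "ACGTTTTTTT", "TTTACGTTTT"]
fitted = em_oops(toy_seqs, seed_pwm("ACGT"))
print("".join(max(col, key=col.get) for col in fitted))
```

Every iteration of this batch version visits every start position of every sequence, which is the cost that makes unmodified EM slow on large data sets.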
Affiliation(s)
- John E. Reid
- MRC Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom
- Lorenz Wernisch
- MRC Biostatistics Unit, Institute of Public Health, Cambridge, United Kingdom
40
Abstract
MOTIVATION: Identifying regulatory elements is a fundamental problem in the field of gene transcription. Motif discovery (the task of identifying the sequence preference of transcription factor proteins, which bind to these elements) is an important step in this challenge. MEME is a popular motif discovery algorithm. Unfortunately, MEME's running time scales poorly with the size of the dataset. Experiments such as ChIP-Seq and DNase-Seq are providing a rich amount of information on the binding preference of transcription factors. MEME cannot discover motifs in data from these experiments in a practical amount of time without a compromising strategy such as discarding a majority of the sequences. RESULTS: We present EXTREME, a motif discovery algorithm designed to find DNA-binding motifs in ChIP-Seq and DNase-Seq data. Unlike MEME, which uses the expectation-maximization algorithm for motif discovery, EXTREME uses the online expectation-maximization algorithm to discover motifs. EXTREME can discover motifs in large datasets in a practical amount of time without discarding any sequences. Using EXTREME on ChIP-Seq and DNase-Seq data, we discover many motifs, including some novel and infrequent motifs that can only be discovered by using the entire dataset. Conservation analysis of one of these novel infrequent motifs confirms that it is evolutionarily conserved and possibly functional. AVAILABILITY AND IMPLEMENTATION: All source code is available at the GitHub repository http://github.com/uci-cbcl/EXTREME.
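The difference between batch EM and online EM can be sketched as follows: instead of re-scanning the whole data set before each parameter update, the running sufficient statistics are blended with the expected counts from one sequence at a time. This is a generic stochastic-approximation sketch under a one-occurrence-per-sequence model, not EXTREME's actual code; the function names, the step-size schedule, and the toy data are assumptions.

```python
import math

ALPHABET = "ACGT"


def seed_pwm(word, p=0.7):
    """Build a starting PWM that favours `word` (hypothetical seeding scheme)."""
    rest = (1.0 - p) / (len(ALPHABET) - 1)
    return [{b: (p if b == base else rest) for b in ALPHABET} for base in word]


def expected_counts(seq, pwm, background=0.25):
    """E-step for a single sequence: posterior-weighted base counts per column."""
    width = len(pwm)
    starts = range(len(seq) - width + 1)
    scores = [
        math.prod(pwm[j][seq[s + j]] / background for j in range(width))
        for s in starts
    ]
    total = sum(scores)
    counts = [{b: 0.0 for b in ALPHABET} for _ in range(width)]
    for s, score in zip(starts, scores):
        for j in range(width):
            counts[j][seq[s + j]] += score / total
    return counts


def online_em(seq_stream, pwm, decay=0.6, pseudocount=0.01):
    """Update the PWM after every sequence rather than after a full pass."""
    stats = [dict(col) for col in pwm]  # running sufficient statistics
    for t, seq in enumerate(seq_stream, start=1):
        gamma = (t + 1) ** -decay  # decreasing step size (assumed schedule)
        new = expected_counts(seq, pwm)
        for j, col in enumerate(stats):
            for b in ALPHABET:
                col[b] = (1 - gamma) * col[b] + gamma * new[j][b]
        # M-step: normalise the smoothed statistics into the current PWM.
        pwm = [
            {
                b: (col[b] + pseudocount)
                / (sum(col.values()) + len(ALPHABET) * pseudocount)
                for b in ALPHABET
            }
            for col in stats
        ]
    return pwm


# Toy stream with a planted ACGT site (illustrative only).
toy_stream = ["TTTTACGTTT", "ACGTTTTTTT", "TTTACGTTTT"] * 3
online_fit = online_em(toy_stream, seed_pwm("ACGT"))
print("".join(max(col, key=col.get) for col in online_fit))
```

Because each sequence is touched once per pass and the model is updated immediately, no sequences need to be discarded to keep the running time practical.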
Affiliation(s)
- Daniel Quang
- Department of Computer Science, University of California, Irvine, CA 92697, USA and Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA
- Xiaohui Xie
- Department of Computer Science, University of California, Irvine, CA 92697, USA and Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA
41
Cilingir G, Lau AO, Broschat SL. ApicoAMP: The first computational model for identifying apicoplast-targeted transmembrane proteins in Apicomplexa. J Microbiol Methods 2013; 95:313-9. [DOI: 10.1016/j.mimet.2013.09.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/07/2013] [Revised: 09/22/2013] [Accepted: 09/23/2013] [Indexed: 10/26/2022]
42
Sompallae R, Hofmann O, Maher CA, Gedye C, Behren A, Vitezic M, Daub CO, Devalle S, Caballero OL, Carninci P, Hayashizaki Y, Lawlor ER, Cebon J, Hide W. A comprehensive promoter landscape identifies a novel promoter for CD133 in restricted tissues, cancers, and stem cells. Front Genet 2013; 4:209. [PMID: 24194746 PMCID: PMC3810939 DOI: 10.3389/fgene.2013.00209] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/08/2013] [Accepted: 09/30/2013] [Indexed: 12/02/2022]
Abstract
PROM1 is the gene encoding prominin-1 or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSS) associated with distinct tissue and cancer expression profiles. Using high resolution Cap Analysis of Gene Expression (CAGE) sequencing we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) within CD133+ melanoma cell lines and stem cells. Additional exon array sampling finds P6 to be active in populations enriched for mesenchyme, neural stem cells and within CD133+ enriched Ewing sarcomas. The P6 promoter is enriched with respect to previously characterized PROM1 promoters for a HMGI/Y (HMGA1) family transcription factor binding site motif and exhibits different epigenetic modifications relative to the canonical promoter region of PROM1.
43
Murigneux V, Saulière J, Roest Crollius H, Le Hir H. Transcriptome-wide identification of RNA binding sites by CLIP-seq. Methods 2013; 63:32-40. [DOI: 10.1016/j.ymeth.2013.03.022] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 01/09/2013] [Revised: 03/19/2013] [Accepted: 03/21/2013] [Indexed: 11/25/2022]
44
Li G, Zhou L. Genome-wide identification of chromatin transitional regions reveals diverse mechanisms defining the boundary of facultative heterochromatin. PLoS One 2013; 8:e67156. [PMID: 23840609 PMCID: PMC3696093 DOI: 10.1371/journal.pone.0067156] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 04/08/2013] [Accepted: 05/14/2013] [Indexed: 11/18/2022]
Abstract
Due to the self-propagating nature of the heterochromatic modification H3K27me3, chromatin barrier activities are required to demarcate the boundary and prevent it from encroaching into euchromatic regions. Studies in Drosophila and vertebrate systems have revealed several important chromatin barrier elements and their respective binding factors. However, epigenomic data indicate that the binding of these factors is not exclusive to chromatin boundaries. To gain a comprehensive understanding of facultative heterochromatin boundaries, we developed a two-tiered method to identify the Chromatin Transitional Region (CTR), i.e. the nucleosomal region that shows the greatest transition rate of the H3K27me3 modification as revealed by ChIP-Seq. This approach was applied to identify CTRs in Drosophila S2 cells and human HeLa cells. Although many insulator proteins have been characterized in Drosophila, less than half of the CTRs in S2 cells are associated with known insulator proteins, indicating that unknown mechanisms remain to be characterized. Our analysis also revealed that the peak binding of insulator proteins is usually 1–2 nucleosomes away from the CTR. Comparison of CTR-associated insulator protein binding sites vs. those in heterochromatic regions revealed that boundary-associated binding sites are distinctively flanked by nucleosome destabilizing sequences, which correlates with significantly decreased nucleosome density and increased binding intensities of co-factors. Interestingly, several subgroups of boundaries have enhanced H3.3 incorporation but reduced nucleosome turnover rate. Our genome-wide study reveals that diverse mechanisms are employed to define the boundaries of facultative heterochromatin. In both Drosophila and mammalian systems, only a small fraction of insulator protein binding sites co-localize with H3K27me3 boundaries.
However, boundary-associated insulator binding sites are distinctively flanked by nucleosome destabilizing sequences, which correlates with significantly decreased nucleosome density and increased binding of co-factors.
Affiliation(s)
- Guangyao Li
- Graduate Program in Genetics and Genomics, University of Florida Genetics Institute; Department of Molecular Genetics and Microbiology & University of Florida Shands Cancer Center, College of Medicine, University of Florida. Gainesville, Florida, United States of America
- Lei Zhou
- Graduate Program in Genetics and Genomics, University of Florida Genetics Institute; Department of Molecular Genetics and Microbiology & University of Florida Shands Cancer Center, College of Medicine, University of Florida. Gainesville, Florida, United States of America
45
Kilpatrick AM, Ward B, Aitken S. MCOIN: a novel heuristic for determining transcription factor binding site motif width. Algorithms Mol Biol 2013; 8:16. [PMID: 23806098 PMCID: PMC3716798 DOI: 10.1186/1748-7188-8-16] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/18/2012] [Accepted: 06/24/2013] [Indexed: 12/02/2022]
Abstract
BACKGROUND In transcription factor binding site discovery, the true width of the motif to be discovered is generally not known a priori. The ability to compute the most likely width of a motif is therefore a highly desirable property for motif discovery algorithms. However, this is a challenging computational problem as a result of changing model dimensionality at changing motif widths. The complexity of the problem is increased as the discovered model at the true motif width need not be the most statistically significant in a set of candidate motif models. Further, the core motif discovery algorithm used cannot guarantee to return the best possible result at each candidate width. RESULTS We present MCOIN, a novel heuristic for automatically determining transcription factor binding site motif width, based on motif containment and information content. Using realistic synthetic data and previously characterised prokaryotic data, we show that MCOIN outperforms the current most popular method (E-value of the resulting multiple alignment) as a predictor of motif width, based on mean absolute error. MCOIN is also shown to choose models which better match known sites at higher levels of motif conservation, based on ROC analysis. CONCLUSIONS We demonstrate the performance of MCOIN as part of a deterministic motif discovery algorithm and conclude that MCOIN outperforms current methods for determining motif width.
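Information content, one of the two ingredients this abstract names for MCOIN, has a standard definition for position weight matrices: the relative entropy of each column against the background, in bits. The sketch below shows only that standard quantity; it does not reproduce the MCOIN heuristic itself, and the function names and example matrix are assumptions.

```python
import math


def column_information(column, background=0.25):
    """Relative entropy (bits) of one PWM column against a uniform background."""
    return sum(p * math.log2(p / background) for p in column.values() if p > 0)


def motif_information(pwm):
    """Total information content of a motif: the sum over its columns."""
    return sum(column_information(col) for col in pwm)


# Hypothetical 3-column motif: one fully conserved column, one partly
# conserved column, and one uninformative column.
example_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
print([round(column_information(col), 3) for col in example_pwm])
print(round(motif_information(example_pwm), 3))
```

Extending a motif with uninformative flanking columns leaves the total unchanged, which is one reason information content is a natural signal when comparing candidate widths.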
Affiliation(s)
- Alastair M Kilpatrick
- School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, EH8 9AB Edinburgh, Scotland
- Bruce Ward
- School of Biological Sciences, University of Edinburgh, Darwin Building, King’s Buildings, Mayfield Road, EH9 3JR Edinburgh, Scotland
- Stuart Aitken
- School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, EH8 9AB Edinburgh, Scotland
46
Han J, Back SH, Hur J, Lin YH, Gildersleeve R, Shan J, Yuan CL, Krokowski D, Wang S, Hatzoglou M, Kilberg MS, Sartor MA, Kaufman RJ. ER-stress-induced transcriptional regulation increases protein synthesis leading to cell death. Nat Cell Biol 2013; 15:481-90. [PMID: 23624402 DOI: 10.1038/ncb2738] [Citation(s) in RCA: 1214] [Impact Index Per Article: 110.4] [Received: 12/28/2012] [Accepted: 03/18/2013] [Indexed: 02/07/2023]
Abstract
Protein misfolding in the endoplasmic reticulum (ER) leads to cell death through PERK-mediated phosphorylation of eIF2α, although the mechanism is not understood. ChIP-seq and mRNA-seq of activating transcription factor 4 (ATF4) and C/EBP homologous protein (CHOP), key transcription factors downstream of p-eIF2α, demonstrated that they interact to directly induce genes encoding protein synthesis and the unfolded protein response, but not apoptosis. Forced expression of ATF4 and CHOP increased protein synthesis and caused ATP depletion, oxidative stress and cell death. The increased protein synthesis and oxidative stress were necessary signals for cell death. We show that eIF2α-phosphorylation-attenuated protein synthesis, and not Atf4 mRNA translation, promotes cell survival. These results show that transcriptional induction through ATF4 and CHOP increases protein synthesis leading to oxidative stress and cell death. The findings suggest that limiting protein synthesis will be therapeutic for diseases caused by protein misfolding in the ER.
Affiliation(s)
- Jaeseok Han
- Center for Neuroscience, Aging, and Stem Cell Research, Sanford Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, California 92037, USA
47
Natarajan A, Yardimci GG, Sheffield NC, Crawford GE, Ohler U. Predicting cell-type-specific gene expression from regions of open chromatin. Genome Res 2013; 22:1711-22. [PMID: 22955983 PMCID: PMC3431488 DOI: 10.1101/gr.135129.111] [Citation(s) in RCA: 172] [Impact Index Per Article: 15.6] [Indexed: 01/17/2023]
Abstract
Complex patterns of cell-type-specific gene expression are thought to be achieved by combinatorial binding of transcription factors (TFs) to sequence elements in regulatory regions. Predicting cell-type-specific expression in mammals has been hindered by the oftentimes unknown location of distal regulatory regions. To alleviate this bottleneck, we used DNase-seq data from 19 diverse human cell types to identify proximal and distal regulatory elements at genome-wide scale. Matched expression data allowed us to separate genes into classes of cell-type-specific up-regulated, down-regulated, and constitutively expressed genes. CG dinucleotide content and DNA accessibility in the promoters of these three classes of genes displayed substantial differences, highlighting the importance of including these aspects in modeling gene expression. We associated DNase I hypersensitive sites (DHSs) with genes, and trained classifiers for different expression patterns. TF sequence motif matches in DHSs provided a strong performance improvement in predicting gene expression over the typical baseline approach of using proximal promoter sequences. In particular, we achieved competitive performance when discriminating up-regulated genes from different cell types or genes up- and down-regulated under the same conditions. We identified previously known and new candidate cell-type-specific regulators. The models generated testable predictions of activating or repressive functions of regulators. DNase I footprints for these regulators were indicative of their direct binding to DNA. In summary, we successfully used information of open chromatin obtained by a single assay, DNase-seq, to address the problem of predicting cell-type-specific gene expression in mammalian organisms directly from regulatory sequence.
Affiliation(s)
- Anirudh Natarajan
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina 27708, USA
48
Klepper K, Drabløs F. MotifLab: a tools and data integration workbench for motif discovery and regulatory sequence analysis. BMC Bioinformatics 2013; 14:9. [PMID: 23323883 PMCID: PMC3556059 DOI: 10.1186/1471-2105-14-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 11/04/2012] [Accepted: 01/10/2013] [Indexed: 12/19/2022]
Abstract
Background Traditional methods for computational motif discovery often suffer from poor performance. In particular, methods that search for sequence matches to known binding motifs tend to predict many non-functional binding sites because they fail to take into consideration the biological state of the cell. In recent years, genome-wide studies have generated a lot of data that has the potential to improve our ability to identify functional motifs and binding sites, such as information about chromatin accessibility and epigenetic states in different cell types. However, it is not always trivial to make use of this data in combination with existing motif discovery tools, especially for researchers who are not skilled in bioinformatics programming. Results Here we present MotifLab, a general workbench for analysing regulatory sequence regions and discovering transcription factor binding sites and cis-regulatory modules. MotifLab supports comprehensive motif discovery and analysis by allowing users to integrate several popular motif discovery tools as well as different kinds of additional information, including phylogenetic conservation, epigenetic marks, DNase hypersensitive sites, ChIP-Seq data, positional binding preferences of transcription factors, transcription factor interactions and gene expression. MotifLab offers several data-processing operations that can be used to create, manipulate and analyse data objects, and complete analysis workflows can be constructed and automatically executed within MotifLab, including graphical presentation of the results. Conclusions We have developed MotifLab as a flexible workbench for motif analysis in a genomic context. The flexibility and effectiveness of this workbench has been demonstrated on selected test cases, in particular two previously published benchmark data sets for single motifs and modules, and a realistic example of genes responding to treatment with forskolin. 
MotifLab is freely available at http://www.motiflab.org.
Affiliation(s)
- Kjetil Klepper
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
49
Müller-Molina AJ, Schöler HR, Araúzo-Bravo MJ. Comprehensive human transcription factor binding site map for combinatory binding motifs discovery. PLoS One 2012; 7:e49086. [PMID: 23209563 PMCID: PMC3509107 DOI: 10.1371/journal.pone.0049086] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/03/2012] [Accepted: 10/08/2012] [Indexed: 11/18/2022]
Abstract
Knowing the map between transcription factors (TFs) and their binding sites is essential to reverse engineer the regulation process. Only about 10%-20% of the transcription factor binding motifs (TFBMs) have been reported. This lack of data hinders understanding gene regulation. To address this drawback, we propose a computational method that exploits previously unused TF properties to discover the missing TFBMs and their sites in all human gene promoters. The method starts by predicting a dictionary of regulatory "DNA words." From this dictionary, it distills 4098 novel predictions. To disclose the crosstalk between motifs, an additional algorithm extracts TF combinatorial binding patterns, creating a collection of TF regulatory syntactic rules. Using these rules, we narrowed down a list of 504 novel motifs that appear frequently in syntax patterns. We tested the predictions against 509 known motifs, confirming that our system can reliably predict ab initio motifs with an accuracy of 81%, far higher than previous approaches. We found that on average, 90% of the discovered combinatorial binding patterns target at least 10 genes, suggesting that supplementary regulatory mechanisms are required to control smaller gene sets independently. Additionally, we discovered that the new TFBMs and their combinatorial patterns convey biological meaning, targeting TFs and genes related to developmental functions. Thus, among all the possible available targets in the genome, the TFs tend to regulate other TFs and genes involved in developmental functions. We provide a comprehensive resource for regulation analysis that includes a dictionary of "DNA words," newly predicted motifs and their corresponding combinatorial patterns. Combinatorial patterns are a useful filter to discover TFBMs that play a major role in orchestrating other factors and thus are likely to lock/unlock cellular functional clusters.
Affiliation(s)
- Arnoldo J. Müller-Molina
- Computational Biology and Bioinformatics Group, Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Hans R. Schöler
- Department of Cell and Developmental Biology, Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Medical Faculty, University of Münster, Münster, Germany
- Marcos J. Araúzo-Bravo
- Computational Biology and Bioinformatics Group, Max Planck Institute for Molecular Biomedicine, Münster, Germany
50
CLIP-seq of eIF4AIII reveals transcriptome-wide mapping of the human exon junction complex. Nat Struct Mol Biol 2012; 19:1124-31. [PMID: 23085716 DOI: 10.1038/nsmb.2420] [Citation(s) in RCA: 150] [Impact Index Per Article: 12.5] [Received: 06/15/2012] [Accepted: 09/21/2012] [Indexed: 12/14/2022]
Abstract
The exon junction complex (EJC) is a central effector of the fate of mRNAs, linking nuclear processing to mRNA transport, translation and surveillance. However, little is known about its transcriptome-wide targets. We used cross-linking and immunoprecipitation methods coupled to high-throughput sequencing (CLIP-seq) in human cells to identify the binding sites of the DEAD-box helicase eIF4AIII, an EJC core component. CLIP reads form peaks that are located mainly in spliced mRNAs. Most expressed exons harbor peaks either in the canonical EJC region, located ~24 nucleotides upstream of exonic junctions, or in other noncanonical regions. Notably, both of these types of peaks are preferentially associated with unstructured and purine-rich sequences containing the motif GAAGA, which is a potential binding site for EJC-associated factors. Therefore, EJC positions vary spatially and quantitatively between exons. This transcriptome-wide mapping of human eIF4AIII reveals unanticipated aspects of the EJC and broadens its potential impact on post-transcriptional regulation.
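A simple way to relate occurrences of the GAAGA motif to the exon junction is to record how far upstream of the exon's 3' end each occurrence starts. The sketch below assumes the exon is given as a 5'-to-3' string ending at the junction and measures distance from the first base of the motif; both conventions, and the function name, are assumptions for illustration rather than the procedure used in the study.

```python
def motif_offsets(exon_seq, motif="GAAGA"):
    """Return, for each (possibly overlapping) motif occurrence, the number
    of nucleotides between its first base and the 3' end of the exon."""
    offsets = []
    start = exon_seq.find(motif)
    while start != -1:
        offsets.append(len(exon_seq) - start)
        # Advance by one base so overlapping occurrences are also reported.
        start = exon_seq.find(motif, start + 1)
    return offsets


# Hypothetical exon: the motif starts 24 nt upstream of the junction.
toy_exon = "CC" + "GAAGA" + "T" * 19
print(motif_offsets(toy_exon))  # [24]
```

Collecting these offsets over many exons would give a positional histogram that could be compared with the canonical region about 24 nucleotides upstream of the junction.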