1
Du Y, Cao L, Wang S, Guo L, Tan L, Liu H, Feng Y, Wu W. Differences in alternative splicing and their potential underlying factors between animals and plants. J Adv Res 2024; 64:83-98. [PMID: 37981087 PMCID: PMC11464654 DOI: 10.1016/j.jare.2023.11.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/07/2023] [Revised: 08/16/2023] [Accepted: 11/14/2023] [Indexed: 11/21/2023]
Abstract
BACKGROUND: Alternative splicing (AS), a posttranscriptional process, contributes to the complexity of transcripts from a limited number of genes in a genome, and AS is considered a great source of genetic and phenotypic diversity in eukaryotes. In animals, AS is tightly regulated during the processes of cell growth and differentiation, and its dysregulation is involved in many diseases, including cancers. Likewise, in plants, AS occurs in all stages of plant growth and development, and it seems to play important roles in the rapid reprogramming of genes in response to environmental stressors. To date, the prevalence and functional roles of AS have been extensively reviewed in animals and plants. However, AS differences between animals and plants, especially their underlying molecular mechanisms and impact factors, are anecdotal and rarely reviewed.
AIM OF REVIEW: This review aims to broaden our understanding of AS roles in a variety of biological processes and provide insights into the underlying mechanisms and impact factors likely leading to AS differences between animals and plants.
KEY SCIENTIFIC CONCEPTS OF REVIEW: We briefly summarize the roles of AS regulation in physiological and biochemical activities in animals and plants. Then, we underline the differences in the process of AS between plants and animals and especially analyze the potential impact factors, such as gene exon/intron architecture, 5'/3' untranslated regions (UTRs), spliceosome components, chromatin dynamics and transcription speeds, splicing factors [serine/arginine-rich (SR) proteins and heterogeneous nuclear ribonucleoproteins (hnRNPs)], noncoding RNAs, and environmental stimuli, which might lead to the differences. Moreover, we compare the nonsense-mediated mRNA decay (NMD)-mediated turnover of the transcripts with a premature termination codon (PTC) in animals and plants. Finally, we summarize the current AS knowledge published in animals versus plants and discuss the potential development of disease therapies and superior crops in the future.
Affiliation(s)
- Yunfei Du
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin'an, 311300, Hangzhou, China
- Lu Cao
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin'an, 311300, Hangzhou, China
- Shuo Wang
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin'an, 311300, Hangzhou, China
- Liangyu Guo
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin'an, 311300, Hangzhou, China
- Lingling Tan
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin'an, 311300, Hangzhou, China
- Hua Liu
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin'an, 311300, Hangzhou, China
- Ying Feng
  - Key Laboratory of Nutrition, Metabolism and Food Safety, Shanghai Institute of Nutrition and Health (SINH), Chinese Academy of Sciences (CAS), Shanghai 200032, China
- Wenwu Wu
  - State Key Laboratory of Subtropical Silviculture, Zhejiang A&F University, Lin'an, 311300, Hangzhou, China
2
Pascal C, Zonszain J, Hameiri O, Gargi-Levi C, Lev-Maor G, Tammer L, Levy T, Tarabeih A, Roy VR, Ben-Salmon S, Elbaz L, Eid M, Hakim T, Abu Rabe'a S, Shalev N, Jordan A, Meshorer E, Ast G. Human histone H1 variants impact splicing outcome by controlling RNA polymerase II elongation. Mol Cell 2023; 83:3801-3817.e8. [PMID: 37922872 DOI: 10.1016/j.molcel.2023.10.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/21/2023] [Revised: 08/17/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023]
Abstract
Histones shape chromatin structure and the epigenetic landscape. H1, the most diverse histone in the human genome, has 11 variants. Due to the high structural similarity between the H1s, their unique functions in transferring information from the chromatin to mRNA-processing machineries have remained elusive. Here, we generated human cell lines lacking up to five H1 subtypes, allowing us to characterize the genomic binding profiles of six H1 variants. Most H1s bind to specific sites, and binding depends on multiple factors, including GC content. The highly expressed H1.2 has a high affinity for exons, whereas H1.3 binds intronic sequences. H1s are major splicing regulators, especially of exon skipping and intron retention events, through their effects on the elongation of RNA polymerase II (RNAPII). Thus, H1 variants determine splicing fate by modulating RNAPII elongation.
Affiliation(s)
- Corina Pascal
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Jonathan Zonszain
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ofir Hameiri
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Chen Gargi-Levi
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Galit Lev-Maor
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Luna Tammer
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Tamar Levy
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Anan Tarabeih
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Vanessa Rachel Roy
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Stav Ben-Salmon
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Liraz Elbaz
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Mireille Eid
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Tamar Hakim
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Salima Abu Rabe'a
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Nana Shalev
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Albert Jordan
  - Instituto de Biologia Molecular de Barcelona (IBMB-CSIC), Carrer de Baldiri Reixac, 15, 08028 Barcelona, Spain
- Eran Meshorer
  - Department of Genetics, The Alexander Silberman Institute of Life Sciences, Jerusalem 91904, Israel; Edmond and Lily Center for Brain Sciences (ELSC), The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Gil Ast
  - Department of Human Molecular Genetics and Biochemistry, Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
3
Stochastic Variation in DNA Methylation Modulates Nucleosome Occupancy and Alternative Splicing in Arabidopsis thaliana. PLANTS 2022; 11:plants11091105. [PMID: 35567106 PMCID: PMC9101026 DOI: 10.3390/plants11091105] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/16/2022] [Revised: 04/05/2022] [Accepted: 04/07/2022] [Indexed: 11/17/2022]
Abstract
Plants use complex gene regulatory mechanisms to overcome diverse environmental challenges. For instance, cold stress induces rapid and massive transcriptome changes via alternative splicing (AS) to confer cold tolerance in plants. In mammals, mounting evidence suggests chromatin structure can regulate co-transcriptional AS. Recent evidence also supports co-transcriptional regulation of AS in plants, but how dynamic changes in DNA methylation and the chromatin structure influence the AS process upon cold stress remains poorly understood. In this study, we used the DNA methylation inhibitor 5-Aza-2′-Deoxycytidine (5-aza-dC) to investigate the role of stochastic variations in DNA methylation and nucleosome occupancy in modulating cold-induced AS, in Arabidopsis thaliana (Arabidopsis). Our results demonstrate that 5-aza-dC derived stochastic hypomethylation modulates nucleosome occupancy and AS profiles of genes implicated in RNA metabolism, plant hormone signal transduction, and of cold-related genes in response to cold stress. We also demonstrate that cold-induced remodelling of DNA methylation regulates genes involved in amino acid metabolism. Collectively, we demonstrate that sudden changes in DNA methylation via drug treatment can influence nucleosome occupancy levels and modulate AS in a temperature-dependent manner to regulate plant metabolism and physiological stress adaptation.
4
Giniūnaitė R, Petkevičiūtė-Gerlach D. Predicting the configuration and energy of DNA in a nucleosome by coarse-grain modelling. Phys Chem Chem Phys 2022; 24:26124-26133. [DOI: 10.1039/d2cp03553g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
We present a novel algorithm which uses a coarse-grained model and an energy minimisation procedure to predict the sequence-dependent DNA configuration in a nucleosome together with its energetic cost.
Affiliation(s)
- Rasa Giniūnaitė
  - Department of Applied Mathematics, Kaunas University of Technology, Studentų 50-318, 51368, Kaunas, Lithuania
  - Institute of Applied Mathematics, Vilnius University, Naugarduko 24, 03225, Vilnius, Lithuania
- Daiva Petkevičiūtė-Gerlach
  - Department of Applied Mathematics, Kaunas University of Technology, Studentų 50-318, 51368, Kaunas, Lithuania
5
Chaudhary S, Jabre I, Syed NH. Epigenetic differences in an identical genetic background modulate alternative splicing in A. thaliana. Genomics 2021; 113:3476-3486. [PMID: 34391867 DOI: 10.1016/j.ygeno.2021.08.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/21/2021] [Revised: 08/02/2021] [Accepted: 08/10/2021] [Indexed: 11/19/2022]
Abstract
How stable and temperature-dependent variations in DNA methylation and nucleosome occupancy influence alternative splicing (AS) remains poorly understood in plants. To answer this, we generated transcriptome, whole-genome bisulfite, and MNase sequencing data for an epigenetic Recombinant Inbred Line (epiRIL) of A. thaliana at normal and cold temperature. For comparative analysis, the same data sets for the parental ecotype Columbia (Col-0) were also generated, whereas for DNA methylation, previously published high confidence methylation profiles of Col-0 were used. Significant epigenetic differences in an identical genetic background were observed between Col-0 and epiRIL lines under normal and cold temperatures. Our transcriptome data revealed that differential DNA methylation and nucleosome occupancy modulate expression levels of many genes and AS in response to cold. Collectively, DNA methylation and nucleosome levels exhibit characteristic patterns around intron-exon boundaries at normal and cold conditions, and any perturbation in them, in an identical genetic background is sufficient to modulate AS in Arabidopsis.
Affiliation(s)
- Saurabh Chaudhary
  - School of Psychology and Life Sciences, Canterbury Christ Church University, Canterbury CT1 1QU, UK; Cardiff School of Biosciences, Cardiff University, Cardiff CF10 3AX, UK
- Ibtissam Jabre
  - School of Psychology and Life Sciences, Canterbury Christ Church University, Canterbury CT1 1QU, UK; Department of Microbial Sciences, School of Biosciences and Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford GU2 7XH, UK
- Naeem H Syed
  - School of Psychology and Life Sciences, Canterbury Christ Church University, Canterbury CT1 1QU, UK
6
Abstract
Aims:
The discontinuous pattern of genome size variation in angiosperms is an unsolved problem related to genome evolution. In this study, we introduced a genome evolution operator and solved the related eigenvalue equation to deduce the discontinuous pattern.
Background:
The genome is a well-defined system for studying the evolution of species. One of the basic problems is genome size evolution. The DNA amounts for angiosperm species are highly variable, differing over 1000-fold. One big surprise is the discovery of the discontinuous distribution of nuclear DNA amounts in many angiosperm genera.
Objective:
The discontinuous distribution of nuclear DNA amounts has a certain regularity, much like a group of quantum states in atomic physics. The quantum pattern has not been explained by any evolutionary theory so far, and we shall interpret it through the quantum simulation of genome evolution.
Methods:
We introduced a genome evolution operator H to deduce the distribution of DNA amount. The nuclear DNA amount in angiosperms is studied from the eigenvalue equation of the genome evolution operator H. The operator H is introduced by physical simulation and it is defined as a function of the genome size N and the derivative with respect to the size.
Results:
The discontinuity of DNA size distribution and its synergetic occurrence in related angiosperm species are successfully deduced from the solution of the equation. The results agree well with the existing experimental data of Aloe, Clarkia, Nicotiana, Lathyrus, Allium and other genera.
Conclusion:
The success of our approach may imply the existence of a set of genomic evolutionary equations satisfying classical-quantum duality. The classical phase of evolution means it obeys the classical deterministic law, while the quantum phase means it obeys the quantum stochastic law. The discontinuity of DNA size distribution provides novel evidence for the quantum evolution of angiosperms. It has been realized that the discontinuous pattern is due to the existence of some unknown evolutionary constraints. However, our study indicates that these constraints on the angiosperm genome are essentially quantum in origin.
Affiliation(s)
- Liaofu Luo
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Lirong Zhang
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
7
Liu B. BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches. Brief Bioinform 2020; 20:1280-1294. [PMID: 29272359 DOI: 10.1093/bib/bbx165] [Citation(s) in RCA: 194] [Impact Index Per Article: 38.8] [Received: 09/08/2017] [Revised: 11/08/2017] [Indexed: 01/07/2023]
Abstract
With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems is how to computationally analyze their structures and functions. Machine learning techniques are playing key roles in this field. Typically, predictors based on machine learning techniques contain three main steps: feature extraction, predictor construction and performance evaluation. Although several Web servers and stand-alone tools have been developed to facilitate biological sequence analysis, each focuses on only an individual step. In this regard, a powerful Web server called BioSeq-Analysis (http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/) is proposed in this study to automatically complete the three main steps for constructing a predictor. The user only needs to upload the benchmark data set. BioSeq-Analysis can generate the optimized predictor based on the benchmark data set, and the performance measures can be reported as well. Furthermore, to maximize users' convenience, its stand-alone program was also released, which can be downloaded from http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/download/, and can be directly run on Windows, Linux and UNIX. Applied to three sequence analysis tasks, experimental results showed that the predictors generated by BioSeq-Analysis even outperformed some state-of-the-art methods. It is anticipated that BioSeq-Analysis will become a useful tool for biological sequence analysis.
8
Jabre I, Reddy ASN, Kalyna M, Chaudhary S, Khokhar W, Byrne LJ, Wilson CM, Syed NH. Does co-transcriptional regulation of alternative splicing mediate plant stress responses? Nucleic Acids Res 2019; 47:2716-2726. [PMID: 30793202 PMCID: PMC6451118 DOI: 10.1093/nar/gkz121] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 11/09/2018] [Revised: 02/11/2019] [Accepted: 02/13/2019] [Indexed: 12/15/2022]
Abstract
Plants display exquisite control over gene expression to elicit appropriate responses under normal and stress conditions. Alternative splicing (AS) of pre-mRNAs, a process that generates two or more transcripts from multi-exon genes, adds another layer of regulation to fine-tune condition-specific gene expression in animals and plants. However, exactly how plants control splice isoform ratios and the timing of this regulation in response to environmental signals remains elusive. In mammals, recent evidence indicates that epigenetic and epitranscriptome changes, such as DNA methylation, chromatin modifications and RNA methylation, regulate RNA polymerase II processivity, co-transcriptional splicing, and stability and translation efficiency of splice isoforms. In plants, the role of epigenetic modifications in regulating transcription rate and mRNA abundance under stress is beginning to emerge. However, the mechanisms by which epigenetic and epitranscriptomic modifications regulate AS and translation efficiency require further research. Dynamic changes in the chromatin landscape in response to stress may provide a scaffold around which gene expression, AS and translation are orchestrated. Finally, we discuss CRISPR/Cas-based strategies for engineering chromatin architecture to manipulate AS patterns (or splice isoform levels) to obtain insight into the epigenetic regulation of AS.
Affiliation(s)
- Ibtissam Jabre
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK
- Anireddy S N Reddy
  - Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO 80523-1878, USA
- Maria Kalyna
  - Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences - BOKU, Muthgasse 18, 1190 Vienna, Austria
- Saurabh Chaudhary
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK
- Waqas Khokhar
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK
- Lee J Byrne
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK
- Cornelia M Wilson
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK
- Naeem H Syed
  - School of Human and Life Sciences, Canterbury Christ Church University, Canterbury, CT1 1QU, UK
9
Glaich O, Leader Y, Lev Maor G, Ast G. Histone H1.5 binds over splice sites in chromatin and regulates alternative splicing. Nucleic Acids Res 2019; 47:6145-6159. [PMID: 31076740 PMCID: PMC6614845 DOI: 10.1093/nar/gkz338] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 05/01/2018] [Revised: 04/17/2019] [Accepted: 04/27/2019] [Indexed: 12/11/2022]
Abstract
Chromatin organization and epigenetic markers influence splicing, though the magnitudes of these effects and the mechanisms are largely unknown. Here, we demonstrate that linker histone H1.5 influences mRNA splicing. We observed that linker histone H1.5 binds DNA over splice sites of short exons in human lung fibroblasts (IMR90 cells). We found that association of H1.5 with these splice sites correlated with the level of inclusion of alternatively spliced exons. Exons marked by H1.5 had more RNA polymerase II (RNAP II) stalling near the 3' splice site than did exons not associated with H1.5. In cells depleted of H1.5, we showed that the inclusion of five exons evaluated decreased and that RNAP II levels over these exons were also reduced. Our findings indicate that H1.5 is involved in regulation of splice site selection and alternative splicing, a function not previously demonstrated for linker histones.
Affiliation(s)
- Ohad Glaich
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
- Yodfat Leader
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
- Galit Lev Maor
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
- Gil Ast
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel
10
Feng P, Wang Z, Yu X. Predicting Antimicrobial Peptides by Using Increment of Diversity with Quadratic Discriminant Analysis Method. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1309-1312. [PMID: 28212093 DOI: 10.1109/tcbb.2017.2669302] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/06/2023]
Abstract
Antimicrobial peptides are crucial components of the innate host defense system of most living organisms and promising candidates for antimicrobial agents. Accurate classification of antimicrobial peptides will be helpful to the discovery of new therapeutic targets. In this work, the Increment of Diversity with Quadratic Discriminant analysis (IDQD) was presented to classify antifungal and antibacterial peptides based on primary sequence information. In the jackknife test, the proposed IDQD model yields an accuracy of 86.02 percent with a sensitivity of 74.31 percent and a specificity of 92.79 percent for identifying antimicrobial peptides, which is superior to other state-of-the-art methods. This result suggests that the proposed IDQD model can be used efficiently for antimicrobial peptide classification.
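The accuracy, sensitivity and specificity quoted in this abstract are standard confusion-matrix measures. The sketch below shows how they are usually computed; it is an illustration only, not the authors' code, and the counts passed in are hypothetical values chosen to make the arithmetic easy to follow. In a jackknife (leave-one-out) test, each sequence would be predicted by a model trained on all the others before these counts are tallied.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return accuracy, sensitivity and specificity as percentages.

    tp/fn: positives predicted correctly / incorrectly
    tn/fp: negatives predicted correctly / incorrectly
    """
    positives = tp + fn
    negatives = tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / (positives + negatives),
        "sensitivity": 100.0 * tp / positives,
        "specificity": 100.0 * tn / negatives,
    }


# Hypothetical counts (not taken from the paper), used only to show the arithmetic.
example = classification_metrics(tp=80, fn=20, tn=180, fp=20)
print(example)
```

Because accuracy is a weighted average of sensitivity and specificity, it always lies between the two, with the weights set by the class balance of the benchmark set.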
11
Yang L, Gao H, Liu Z, Tang L. Identification of Phage Virion Proteins by Using the g-gap Tripeptide Composition. LETT ORG CHEM 2019. [DOI: 10.2174/1570178615666180910112813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Phages are widely distributed in locations populated by bacterial hosts. Phage proteins can be divided into two main categories, that is, virion and non-virion proteins with different functions. In practice, people mainly use phage virion proteins to clarify the lysis mechanism of bacterial cells and develop new antibacterial drugs. Accurate identification of phage virion proteins is therefore essential to understanding the phage lysis mechanism. Although some computational methods have focused on identifying virion proteins, the results are unsatisfactory, leaving room for improvement. In this study, a new sequence-based method was proposed to identify phage virion proteins using g-gap tripeptide composition. In this approach, the protein features were first extracted from the g-gap tripeptide composition. Subsequently, we obtained an optimal feature subset by performing incremental feature selection (IFS) with information gain. Finally, the support vector machine (SVM) was used as the classifier to discriminate virion proteins from non-virion proteins. In a 10-fold cross-validation test, our proposed method achieved an accuracy of 97.40% with an AUC of 0.9958, which outperforms state-of-the-art methods. The result reveals that our proposed method could be a promising method for the identification of phage virion proteins.
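The abstract does not define the g-gap tripeptide composition, so the following is a minimal sketch under a stated assumption: a g-gap tripeptide is taken to be three residues in which consecutive members are separated by g intervening positions, and the feature vector holds the frequency of each of the 20^3 = 8000 possible tripeptides. The function name and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_tripeptide_composition(sequence: str, g: int = 0) -> dict:
    """Frequency of every g-gap tripeptide (assumed definition, see lead-in).

    With g = 0 this reduces to the ordinary tripeptide composition.
    Windows containing non-standard residues are skipped.
    """
    step = g + 1
    counts = Counter()
    for i in range(len(sequence) - 2 * step):
        tri = sequence[i] + sequence[i + step] + sequence[i + 2 * step]
        if all(res in AMINO_ACIDS for res in tri):
            counts[tri] += 1
    total = sum(counts.values())
    features = {}
    for combo in product(AMINO_ACIDS, repeat=3):
        key = "".join(combo)
        features[key] = counts[key] / total if total else 0.0
    return features


# Toy peptide (hypothetical): with g = 1 the two windows are A_D_F and C_E_G.
features = g_gap_tripeptide_composition("ACDEFG", g=1)
print(features["ADF"], features["CEG"], len(features))
```

An 8000-dimensional vector of this kind is sparse for typical protein lengths, which is one reason a feature selection step such as the IFS procedure mentioned above is commonly applied before training a classifier.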
Affiliation(s)
- Liangwei Yang
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Gao
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Zhen Liu
  - School of Computer Science and Engineering, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Lixia Tang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
12
Jani MR, Khan Mozlish MT, Ahmed S, Tahniat NS, Farid DM, Shatabda S. iRecSpot-EF: Effective sequence based features for recombination hotspot prediction. Comput Biol Med 2018; 103:17-23. [DOI: 10.1016/j.compbiomed.2018.10.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 06/08/2018] [Revised: 10/07/2018] [Accepted: 10/07/2018] [Indexed: 01/19/2023]
13
Jia C, Yang Q, Zou Q. NucPosPred: Predicting species-specific genomic nucleosome positioning via four different modes of general PseKNC. J Theor Biol 2018; 450:15-21. [PMID: 29678692 DOI: 10.1016/j.jtbi.2018.04.025] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/26/2018] [Revised: 04/13/2018] [Accepted: 04/16/2018] [Indexed: 11/20/2022]
Abstract
The nucleosome is the basic structure of chromatin in eukaryotic cells, with essential roles in the regulation of many biological processes, such as DNA transcription, replication and repair, and RNA splicing. Because of the importance of nucleosomes, the factors that determine their positioning within genomes should be investigated. High-resolution nucleosome-positioning maps are now available for organisms including Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans, enabling the identification of nucleosome positioning by application of computational tools. Here, we describe a novel predictor called NucPosPred, which was specifically designed for large-scale identification of nucleosome positioning in C. elegans and D. melanogaster genomes. NucPosPred was separately optimized for each species for four types of DNA sequence feature extraction, with consideration of two classification algorithms (gradient-boosting decision tree and support vector machine). The overall accuracy obtained with NucPosPred was 92.29% for C. elegans and 88.26% for D. melanogaster, outperforming previous methods and demonstrating the potential for species-specific prediction of nucleosome positioning. For the convenience of most experimental scientists, a web-server for the predictor NucPosPred is available at http://121.42.167.206/NucPosPred/index.jsp.
Affiliation(s)
- Cangzhi Jia
  - Science of College, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
- Qing Yang
  - Science of College, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
14
Jia Y, Li H, Wang J, Meng H, Yang Z. Spectrum structures and biological functions of 8-mers in the human genome. Genomics 2018. [PMID: 29522801 DOI: 10.1016/j.ygeno.2018.03.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
The spectra of k-mer frequencies can reveal the structures and evolution of genome sequences. We confirmed that the trimodal spectrum of 8-mers in human genome sequences is distinguished only by the CG2, CG1 and CG0 8-mer sets, which contain 2, 1 or 0 CpG dinucleotides, respectively. This phenomenon is called the independent selection law. The three types of CG 8-mers were considered as different functional elements. We conjectured that (1) nucleosome binding motifs are mainly characterized by CG1 8-mers and (2) the core structural units of CpG island (CGI) sequences are predominantly characterized by CG2 8-mers. To validate our conjectures, nucleosome-occupied sequences and CGI sequences were extracted, and sequence parameters were then constructed from the information in each of the three CG 8-mer sets. ROC analysis showed that CG1 8-mers are preferentially found in nucleosome-occupied segments (AUC > 0.7) and CG2 8-mers are preferentially found in CGI sequences (AUC > 0.99). This validates our conjecture in principle.
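The grouping of 8-mers by CpG content described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' pipeline: the function name and the toy sequence are hypothetical, windows are taken with a stride of one base, and any window containing a character other than A, C, G or T is skipped.

```python
from collections import Counter


def group_kmers_by_cpg(sequence: str, k: int = 8) -> dict:
    """Count k-mers in overlapping windows, grouped by number of CpG dinucleotides.

    Keys are labels such as "CG0", "CG1", "CG2". An 8-mer can hold up to
    four CpGs, so labels beyond the three sets named in the abstract may appear.
    """
    sequence = sequence.upper()
    groups = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            # "CG" cannot overlap itself, so str.count gives the CpG count.
            label = f"CG{kmer.count('CG')}"
            groups.setdefault(label, Counter())[kmer] += 1
    return groups


# Toy sequence (hypothetical): three overlapping 8-mers.
groups = group_kmers_by_cpg("ACGTACGTAA")
for label in sorted(groups):
    print(label, dict(groups[label]))
```

Plotting a histogram of genome-wide 8-mer counts separately for each label is one way to inspect whether the three sets occupy distinct modes of the frequency spectrum.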
Affiliation(s)
- Yun Jia
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China; College of Science, Inner Mongolia University of Technology, Hohhot 010051, China
- Hong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Jingfeng Wang
  - College of Science, Inner Mongolia University of Technology, Hohhot 010051, China
- Hu Meng
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Zhenhua Yang
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
15
Tahir M, Hayat M. iNuc-STNC: a sequence-based predictor for identification of nucleosome positioning in genomes by extending the concept of SAAC and Chou's PseAAC. MOLECULAR BIOSYSTEMS 2017; 12:2587-93. [PMID: 27271822 DOI: 10.1039/c6mb00221h] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Indexed: 01/14/2023]
Abstract
The nucleosome is the fundamental unit of eukaryotic chromatin, which participates in regulating different cellular processes. Owing to the huge growth in new DNA primary sequences, it is indispensable to develop an automated model. However, identification of novel protein sequences using conventional methods is difficult or sometimes impossible because of vague motifs and the intricate structure of DNA. In this regard, an effective and high-throughput automated model, "iNuc-STNC", has been proposed to identify nucleosome positioning in genomes accurately and reliably. In the proposed model, DNA sequences are represented using three distinct feature extraction strategies: dinucleotide composition, trinucleotide composition and split trinucleotide composition (STNC). Various statistical models were utilized as learner hypotheses. The jackknife test was employed to evaluate the success rates of the proposed model. The experimental results showed that SVM, in combination with STNC, achieved outstanding performance on all benchmark datasets. The prediction performance of the proposed model "iNuc-STNC" is higher than that of current state-of-the-art methods in the literature. It is anticipated that the "iNuc-STNC" model will provide a rudimentary framework for the pharmaceutical industry in the development of drug design.
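The three feature extraction strategies named in this abstract can be sketched as follows. The dinucleotide and trinucleotide compositions are plain k-mer frequency vectors. For the split trinucleotide composition the abstract gives no details, so the sketch assumes, by analogy with split amino acid composition, that the sequence is cut into three roughly equal segments and the trinucleotide composition of each segment is concatenated. Function names, the number of segments, and the toy sequence are hypothetical, and input is assumed to be uppercase.

```python
from itertools import product


def kmer_composition(sequence: str, k: int) -> list:
    """Frequency of each of the 4**k k-mers, in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def split_trinucleotide_composition(sequence: str, parts: int = 3) -> list:
    """Concatenate trinucleotide compositions of `parts` consecutive segments.

    The three-way split is an assumption made for this sketch.
    """
    n = len(sequence)
    bounds = [n * j // parts for j in range(parts + 1)]
    vector = []
    for start, end in zip(bounds, bounds[1:]):
        vector.extend(kmer_composition(sequence[start:end], 3))
    return vector


dna = "ACGTACGTACGT"  # toy sequence, not from any benchmark dataset
dnc = kmer_composition(dna, 2)               # 16 features
tnc = kmer_composition(dna, 3)               # 64 features
stnc = split_trinucleotide_composition(dna)  # 3 * 64 = 192 features
print(len(dnc), len(tnc), len(stnc))
```

Splitting the sequence lets the feature vector retain coarse positional information that a single whole-sequence composition discards, at the cost of a vector three times as long.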
Affiliation(s)
- Muhammad Tahir
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
|
16
|
Li R, Zhong D, Liu R, Lv H, Zhang X, Liu J, Han J. A novel method for in silico identification of regulatory SNPs in human genome. J Theor Biol 2017; 415:84-89. [DOI: 10.1016/j.jtbi.2016.11.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/25/2016] [Revised: 11/17/2016] [Accepted: 11/25/2016] [Indexed: 11/29/2022]
|
17
|
Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. Natural Science 2017. [DOI: 10.4236/ns.2017.94007] [Citation(s) in RCA: 91] [Impact Index Per Article: 11.4] [Indexed: 11/20/2022]
|
18
|
Awazu A. Prediction of nucleosome positioning by the incorporation of frequencies and distributions of three different nucleotide segment lengths into a general pseudo k-tuple nucleotide composition. Bioinformatics 2016; 33:42-48. [PMID: 27563027 PMCID: PMC5860184 DOI: 10.1093/bioinformatics/btw562] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 06/16/2016] [Revised: 08/02/2016] [Accepted: 08/19/2016] [Indexed: 11/13/2022] [Open Access]
Abstract
Motivation: Nucleosome positioning plays important roles in many eukaryotic intranuclear processes, such as transcriptional regulation and chromatin structure formation. The investigations of nucleosome positioning rules provide a deeper understanding of these intracellular processes. Results: Nucleosome positioning prediction was performed using a model consisting of three types of variables characterizing a DNA sequence: the number of five-nucleotide sequences, the number of three-nucleotide combinations in one period of a helix, and mono- and di-nucleotide distributions in DNA fragments. Using recently proposed stringent benchmark datasets with low biases for Saccharomyces cerevisiae, Homo sapiens, Caenorhabditis elegans and Drosophila melanogaster, the present model was shown to have a better prediction performance than the recently proposed predictors. This model was able to display the common and organism-dependent factors that affect nucleosome forming and inhibiting sequences as well. Therefore, the predictors developed here can accurately predict nucleosome positioning and help determine the key factors influencing this process. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Akinori Awazu
- Department of Mathematical and Life Sciences; Research Center for Mathematics on Chromatin Live Dynamics, Hiroshima University, Kagami-yama 1-3-1, Higashi-Hiroshima, 739-8526, Japan
|
19
|
Towards understanding pre-mRNA splicing mechanisms and the role of SR proteins. Gene 2016; 587:107-19. [PMID: 27154819 DOI: 10.1016/j.gene.2016.04.057] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 04/25/2016] [Accepted: 04/30/2016] [Indexed: 01/04/2023]
Abstract
Alternative pre-mRNA splicing provides a source of vast protein diversity by removing non-coding sequences (introns) and accurately linking different exonic regions in the correct reading frame. The regulation of alternative splicing is essential for various cellular functions in both pathological and physiological conditions. In eukaryotic cells, this process is commonly used to increase proteomic diversity and to control gene expression either co- or post-transcriptionally. Alternative splicing occurs within a megadalton-sized, multi-component machine consisting of RNA and proteins; during the splicing process, this complex undergoes dynamic changes via RNA-RNA, protein-protein and RNA-protein interactions. Co-transcriptional splicing functionally integrates the transcriptional machinery, thereby enabling the two processes to influence one another, whereas post-transcriptional splicing facilitates the coupling of RNA splicing with post-splicing events. This review addresses the structural aspects of spliceosomes and the mechanistic implications of their stepwise assembly on the regulation of pre-mRNA splicing. Moreover, the role of phosphorylation-based, signal-induced changes in the regulation of the splicing process is demonstrated.
|
20
|
Liu G, Xing Y, Zhao H, Wang J, Shang Y, Cai L. A deformation energy-based model for predicting nucleosome dyads and occupancy. Sci Rep 2016; 6:24133. [PMID: 27053067 PMCID: PMC4823781 DOI: 10.1038/srep24133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/14/2016] [Accepted: 03/21/2016] [Indexed: 12/14/2022] [Open Access]
Abstract
Nucleosome plays an essential role in various cellular processes, such as DNA replication, recombination, and transcription. Hence, it is important to decode the mechanism of nucleosome positioning and identify nucleosome positions in the genome. In this paper, we present a model for predicting nucleosome positioning based on DNA deformation, in which both bending and shearing of the nucleosomal DNA are considered. The model successfully predicted the dyad positions of nucleosomes assembled in vitro and the in vitro map of nucleosomes in Saccharomyces cerevisiae. Applying the model to Caenorhabditis elegans and Drosophila melanogaster, we achieved satisfactory results. Our data also show that shearing energy of nucleosomal DNA outperforms bending energy in nucleosome occupancy prediction and the ability to predict nucleosome dyad positions is attributed to bending energy that is associated with rotational positioning of nucleosomes.
Affiliation(s)
- Guoqing Liu
- The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Yongqiang Xing
- The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Hongyu Zhao
- The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Jianying Wang
- The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; State Key Laboratory for Utilization of Bayan Obo Multi-Metallic Resources, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Yu Shang
- Computational Systems Biology Lab, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; College of Computer Science and Technology, Jilin University, Changchun, Jilin 130021, China
- Lu Cai
- The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
|
21
|
Chen W, Feng P, Ding H, Lin H, Chou KC. Using deformation energy to analyze nucleosome positioning in genomes. Genomics 2016; 107:69-75. [DOI: 10.1016/j.ygeno.2015.12.005] [Citation(s) in RCA: 87] [Impact Index Per Article: 9.7] [Received: 10/12/2015] [Revised: 12/06/2015] [Accepted: 12/22/2015] [Indexed: 12/28/2022]
|
22
|
Yamauchi K, Kondo S, Hamamoto M, Suzuki Y, Nishida H. Genome-wide maps of nucleosomes of the trichostatin A treated and untreated archiascomycetous yeast Saitoella complicata. AIMS Microbiol 2016. [DOI: 10.3934/microbiol.2016.1.69] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022] [Open Access]
Affiliation(s)
- Yamauchi K
- Biotechnology Research Center and Department of Biotechnology, Toyama Prefectural University, Imizu, Toyama 939-0398, Japan
|
23
|
Naftelberg S, Schor IE, Ast G, Kornblihtt AR. Regulation of alternative splicing through coupling with transcription and chromatin structure. Annu Rev Biochem 2015; 84:165-98. [PMID: 26034889 DOI: 10.1146/annurev-biochem-060614-034242] [Citation(s) in RCA: 323] [Impact Index Per Article: 32.3] [Indexed: 11/09/2022]
Abstract
Alternative precursor messenger RNA (pre-mRNA) splicing plays a pivotal role in the flow of genetic information from DNA to proteins by expanding the coding capacity of genomes. Regulation of alternative splicing is as important as regulation of transcription to determine cell- and tissue-specific features, normal cell functioning, and responses of eukaryotic cells to external cues. Its importance is confirmed by the evolutionary conservation and diversification of alternative splicing and the fact that its deregulation causes hereditary disease and cancer. This review discusses the multiple layers of cotranscriptional regulation of alternative splicing in which chromatin structure, DNA methylation, histone marks, and nucleosome positioning play a fundamental role in providing a dynamic scaffold for interactions between the splicing and transcription machineries. We focus on evidence for how the kinetics of RNA polymerase II (RNAPII) elongation and the recruitment of splicing factors and adaptor proteins to chromatin components act in coordination to regulate alternative splicing.
Affiliation(s)
- Shiran Naftelberg
- Sackler Medical School, Tel Aviv University, Tel Aviv 69978, Israel
|
24
|
Sohail M, Xie J. Diverse regulation of 3' splice site usage. Cell Mol Life Sci 2015; 72:4771-93. [PMID: 26370726 PMCID: PMC11113787 DOI: 10.1007/s00018-015-2037-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/03/2015] [Revised: 08/12/2015] [Accepted: 09/03/2015] [Indexed: 01/13/2023]
Abstract
The regulation of splice site (SS) usage is important for alternative pre-mRNA splicing and thus proper expression of protein isoforms in cells; its disruption causes diseases. In recent years, an increasing number of novel regulatory elements have been found within or nearby the 3'SS in mammalian genes. The diverse elements recruit a repertoire of trans-acting factors or form secondary structures to regulate 3'SS usage, mostly at the early steps of spliceosome assembly. Their mechanisms of action mainly include: (1) competition between the factors for RNA elements, (2) steric hindrance between the factors, (3) direct interaction between the factors, (4) competition between two splice sites, or (5) local RNA secondary structures or longer range loops, according to the mode of protein/RNA interactions. Beyond the 3'SS, chromatin remodeling/transcription, posttranslational modifications of trans-acting factors and upstream signaling provide further layers of regulation. Evolutionarily, some of the 3'SS elements seem to have emerged in mammalian ancestors. Moreover, other possibilities of regulation such as that by non-coding RNA remain to be explored. It is thus likely that there are more diverse elements/factors and mechanisms that influence the choice of an intron end. The diverse regulation likely contributes to a more complex but refined transcriptome and proteome in mammals.
Affiliation(s)
- Muhammad Sohail
- Department of Physiology and Pathophysiology, College of Medicine, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada
- Jiuyong Xie
- Department of Physiology and Pathophysiology, College of Medicine, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada; Department of Biochemistry and Medical Genetics, College of Medicine, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada
|
25
|
Kelly S, Georgomanolis T, Zirkel A, Diermeier S, O'Reilly D, Murphy S, Längst G, Cook PR, Papantonis A. Splicing of many human genes involves sites embedded within introns. Nucleic Acids Res 2015; 43:4721-32. [PMID: 25897131 PMCID: PMC4482092 DOI: 10.1093/nar/gkv386] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 06/28/2014] [Accepted: 04/12/2015] [Indexed: 02/03/2023] [Open Access]
Abstract
The conventional model for splicing involves excision of each intron in one piece; we demonstrate this inaccurately describes splicing in many human genes. First, after switching on transcription of SAMD4A, a gene with a 134 kb-long first intron, splicing joins the 3′ end of exon 1 to successive points within intron 1 well before the acceptor site at exon 2 is made. Second, genome-wide analysis shows that >60% of active genes yield products generated by such intermediate intron splicing. These products are present at ∼15% the levels of primary transcripts, are encoded by conserved sequences similar to those found at canonical acceptors, and marked by distinctive structural and epigenetic features. Finally, using targeted genome editing, we demonstrate that inhibiting the formation of these splicing intermediates affects efficient exon–exon splicing. These findings greatly expand the functional and regulatory complexity of the human transcriptome.
Affiliation(s)
- Steven Kelly
- Department of Plant Sciences, University of Oxford, Oxford OX1 3RB, United Kingdom
- Anne Zirkel
- Centre for Molecular Medicine, University of Cologne, Cologne D-50931, Germany
- Sarah Diermeier
- Institut für Biochemie III, University of Regensburg, Regensburg D-93053, Germany
- Dawn O'Reilly
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
- Shona Murphy
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
- Gernot Längst
- Institut für Biochemie III, University of Regensburg, Regensburg D-93053, Germany
- Peter R Cook
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
- Argyris Papantonis
- Centre for Molecular Medicine, University of Cologne, Cologne D-50931, Germany
|
26
|
Martinho RG, Guilgur LG, Prudêncio P. How gene expression in fast-proliferating cells keeps pace. Bioessays 2015; 37:514-24. [PMID: 25823409 DOI: 10.1002/bies.201400195] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
The development of living organisms requires a precise coordination of all basic cellular processes, in space and time. Early embryogenesis of most species with externally deposited eggs starts with a series of extremely fast cleavage cycles. These divisions have a strong influence on gene expression as mitosis represses transcription and pre-mRNA processing. In this review, we will describe the distinct adaptations for efficient gene expression and discuss the emerging role of the multifunctional NineTeen Complex (NTC) in gene expression and genomic stability during fast proliferation.
Affiliation(s)
- Rui G Martinho
- Departamento de Ciências Biomédicas e Medicina, Regenerative Medicine Program, Universidade do Algarve, Campus de Gambelas, Faro, Portugal; Center for Biomedical Research, Universidade do Algarve, Campus de Gambelas, Faro, Portugal; Instituto Gulbenkian de Ciência, Oeiras, Portugal
|
27
|
Liu B, Liu F, Fang L, Wang X, Chou KC. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics 2014; 31:1307-9. [PMID: 25504848 DOI: 10.1093/bioinformatics/btu820] [Citation(s) in RCA: 209] [Impact Index Per Article: 19.0] [Received: 10/01/2014] [Accepted: 12/05/2014] [Indexed: 12/29/2022]
Abstract
In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of kmers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated based on both the built-in and user-defined properties via using repDNA. Availability and implementation: The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. Contact: bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Fule Liu
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Longyun Fang
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Xiaolong Wang
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
- Kuo-Chen Chou
- School of Computer Science and Technology and Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China, Gordon Life Science Institute, Belmont, MA 02478, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia
|
28
|
Hassan MA, Saeij JP. Incorporating alternative splicing and mRNA editing into the genetic analysis of complex traits. Bioessays 2014; 36:1032-40. [PMID: 25171292 PMCID: PMC4280019 DOI: 10.1002/bies.201400079] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
The nomination of candidate genes underlying complex traits is often focused on genetic variations that alter mRNA abundance or result in non-conservative changes in amino acids. Although inconspicuous in complex trait analysis, genetic variants that affect splicing or RNA editing can also generate proteomic diversity and impact genetic traits. Indeed, it is known that splicing and RNA editing modulate several traits in humans and model organisms. Using high-throughput RNA sequencing (RNA-seq) analysis, it is now possible to integrate the genetics of transcript abundance, alternative splicing (AS) and editing with the analysis of complex traits. We recently demonstrated that both AS and mRNA editing are modulated by genetic and environmental factors, and potentially engender phenotypic diversity in a genetically segregating mouse population. Therefore, the analysis of splicing and RNA editing can expand not only the regulatory landscape of transcriptome and proteome complexity, but also the repertoire of candidate genes for complex traits.
Affiliation(s)
- Musa A. Hassan
- Massachusetts Institute of Technology, Department of Biology, Cambridge, MA, USA
- Jeroen P.J. Saeij
- Massachusetts Institute of Technology, Department of Biology, Cambridge, MA, USA
|
29
|
Exon skipping event prediction based on histone modifications. Interdiscip Sci 2014; 6:241-9. [DOI: 10.1007/s12539-013-0195-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/03/2013] [Revised: 12/30/2013] [Accepted: 02/07/2014] [Indexed: 12/11/2022]
|
30
|
Feng Y, Luo L. Using long-range contact number information for protein secondary structure prediction. Int J Biomath 2014. [DOI: 10.1142/s1793524514500521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
In this paper, we first combine tetra-peptide structural words with contact number for protein secondary structure prediction. We used the method of increment of diversity combined with quadratic discriminant analysis to predict the structure of the central residue of a sequence fragment. The method uses tetra-peptide structural words and long-range contact number as information resources. The Q3 accuracy is over 83% on 194 proteins. The accuracies of the predicted secondary structures for the 20 amino acid residues range from 81% to 88%. Moreover, we have introduced the residue long-range contact, which directly indicates the separation of contacting residues in terms of their position in the sequence, and examined the negative influence of long-range residue interactions on predicting secondary structure in a protein. The method is also compared with existing prediction methods. The results show that our method is more effective in protein secondary structure prediction.
Affiliation(s)
- Yonge Feng
- College of Science, Inner Mongolia Agriculture University, Hohhot 010018, P. R. China
- Liaofu Luo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, P. R. China
|
31
|
Xing YQ, Liu GQ, Zhao XJ, Zhao HY, Cai L. Genome-wide characterization and prediction of Arabidopsis thaliana replication origins. Biosystems 2014; 124:1-6. [PMID: 25050475 DOI: 10.1016/j.biosystems.2014.07.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/11/2013] [Revised: 03/25/2014] [Accepted: 07/15/2014] [Indexed: 01/25/2023]
Abstract
Identification of replication origins is crucial for the faithful duplication of genomic DNA. The frequencies of single nucleotides and dinucleotides, GC/AT bias and GC/AT profile in the vicinity of Arabidopsis thaliana replication origins were analyzed in the present work. The guanine content or cytosine content is higher in origin of replication (Ori) than in non-Ori. The SS (S=G or C) dinucleotides are favoured in Ori whereas WW (W=A or T) dinucleotides are favoured in non-Ori. GC/AT bias and GC/AT profile in Ori are significantly different from that in non-Ori. Furthermore, by inputting DNA sequence features into support vector machine, we distinguished between the Ori and non-Ori regions in A. thaliana. The total prediction accuracy is about 69.5% as evaluated by the 10-fold cross-validation. This result suggested that apart from DNA sequence, deciphering the selection of replication origin must integrate many other factors including nucleosome positioning, DNA methylation, histone modification, etc. In addition, by comparing predictive performance we found that the predictive accuracy of SVM using sequence features on the context of WS language is significantly better than that of RY language. Furthermore, the same conclusion was also obtained in S. cerevisiae and D. melanogaster.
Affiliation(s)
- Yong-Qiang Xing
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Guo-Qing Liu
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Xiu-Juan Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China
- Hong-Yu Zhao
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China
- Lu Cai
- School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou, 014010, China; The Institute of Bioengineering and Technology, Inner Mongolia University of Science and Technology, Baotou, 014010, China; Inner Mongolia Key Laboratory of Biomass-Energy Conversion, Baotou, 014010, China.
|
32
|
Zuo Y, Zhang P, Liu L, Li T, Peng Y, Li G, Li Q. Sequence-specific flexibility organization of splicing flanking sequence and prediction of splice sites in the human genome. Chromosome Res 2014; 22:321-34. [PMID: 24728765 DOI: 10.1007/s10577-014-9414-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/25/2014] [Revised: 03/24/2014] [Accepted: 03/26/2014] [Indexed: 12/15/2022]
Abstract
A growing body of results on nucleosome positioning and histone modifications shows that DNA structure plays a well-established role in splicing. In this study, a set of DNA geometric flexibility parameters derived from molecular dynamics (MD) simulations was introduced to discuss the structural organization around splice sites at the DNA level. The obtained profiles of specific flexibility/stiffness around splice sites indicated that the DNA physical-geometry deformation could be used as an alternative way to describe the splicing junction region. Using structural flexibility as a discriminatory parameter, we developed a hybrid computational model for predicting potential splice sites, and better prediction performance was achieved when it was evaluated on the benchmark dataset. Our results showed that the mechanical deformability of a splice junction is closely correlated with both the splice site strength and structural information in its flanking sequences.
Affiliation(s)
- Yongchun Zuo
- The Key Laboratory of National Education Ministry for Mammalian Reproductive Biology and Biotechnology, Inner Mongolia University, Hohhot, 010021, China
|
33
|
Guo SH, Deng EZ, Xu LQ, Ding H, Lin H, Chen W, Chou KC. iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics 2014; 30:1522-9. [PMID: 24504871 DOI: 10.1093/bioinformatics/btu083] [Citation(s) in RCA: 282] [Impact Index Per Article: 25.6] [Indexed: 12/30/2022]
Abstract
Motivation: Nucleosome positioning participates in many cellular activities and plays significant roles in regulating cellular processes. With the avalanche of genome sequences generated in the post-genomic age, it is highly desired to develop automated methods for rapidly and effectively identifying nucleosome positioning. Although some computational methods were proposed, most of them were species specific and neglected the intrinsic local structural properties that might play important roles in determining the nucleosome positioning on a DNA sequence. Results: Here a predictor called 'iNuc-PseKNC' was developed for predicting nucleosome positioning in Homo sapiens, Caenorhabditis elegans and Drosophila melanogaster genomes, respectively. In the new predictor, the samples of DNA sequences were formulated by a novel feature-vector called 'pseudo k-tuple nucleotide composition', into which six DNA local structural properties were incorporated. It was observed by the rigorous cross-validation tests on the three stringent benchmark datasets that the overall success rates achieved by iNuc-PseKNC in predicting the nucleosome positioning of the aforementioned three genomes were 86.27%, 86.90% and 79.97%, respectively. Meanwhile, the results obtained by iNuc-PseKNC on various benchmark datasets used by the previous investigators for different genomes also indicated that the current predictor remarkably outperformed its counterparts. Availability: A user-friendly web-server, iNuc-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iNuc-PseKNC.
Affiliation(s)
- Shou-Hui Guo, En-Ze Deng, Li-Qin Xu, Hui Ding, Hao Lin, Wei Chen, Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China; Gordon Life Science Institute, Belmont, Massachusetts, USA; Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China; and Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
|
34
|
Intragenic DNA methylation in transcriptional regulation, normal differentiation and cancer. Biochim Biophys Acta Gene Regul Mech 2013; 1829:1161-74. [PMID: 23938249 DOI: 10.1016/j.bbagrm.2013.08.001] [Citation(s) in RCA: 160] [Impact Index Per Article: 13.3] [Received: 05/23/2013] [Revised: 08/02/2013] [Accepted: 08/05/2013] [Indexed: 02/06/2023]
Abstract
Ever since the discovery of DNA methylation at cytosine residues, the role of this so-called fifth base has been extensively studied and debated. Until recently, the majority of DNA methylation studies focused on the analysis of CpG islands associated with promoter regions. However, with the emerging possibilities to study DNA methylation in a genome-wide context, this epigenetic mark can now be studied in an unbiased manner. As a result, recent studies have shown that not only promoters but also intragenic and intergenic regions are widely modulated during physiological processes and disease. In particular, it is becoming increasingly clear that DNA methylation in the gene body is not just a passive witness of gene transcription but seems to be actively involved in multiple gene regulation processes. In this review we discuss the potential role of intragenic DNA methylation in alternative promoter usage, regulation of short and long non-coding RNAs, alternative RNA processing, as well as enhancer activity. Furthermore, we summarize how the intragenic DNA methylome is modified both during normal cell differentiation and neoplastic transformation.
|
35
|
Deng N, Sanchez CG, Lasky JA, Zhu D. Detecting splicing variants in idiopathic pulmonary fibrosis from non-differentially expressed genes. PLoS One 2013; 8:e68352. [PMID: 23844188 PMCID: PMC3699530 DOI: 10.1371/journal.pone.0068352] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 02/01/2012] [Accepted: 06/01/2013] [Indexed: 12/14/2022] Open
Abstract
Idiopathic pulmonary fibrosis (IPF) is an interstitial lung disease of unknown cause that lacks a proven therapy for altering its high mortality rate. Microarrays have been employed to investigate the pathogenesis of IPF, but the results are presented mostly at the gene-expression level due to technological limitations. Inasmuch as alternative RNA splicing isoforms are increasingly identified as potential regulators of human diseases, including IPF, we propose a new approach with the capacity to detect splicing variants using RNA-seq data. We conducted a joint analysis of differential expression and differential splicing on annotated human genes and isoforms, and identified 122 non-differentially expressed genes with a high degree of "switch" between major and minor isoforms. Three cases with variant mechanisms for alternative splicing were validated using qRT-PCR, among the group of genes in which expression was not significantly changed at the gene level. We also identified 35 novel transcripts that were unique to the fibrotic lungs using exon-exon junction evidence, and selected a representative for qRT-PCR validation. The results of our study are likely to provide new insight into the pathogenesis of pulmonary fibrosis and may eventuate in new treatment targets.
Affiliation(s)
- Nan Deng
- Department of Computer Science, Wayne State University, Detroit, Michigan, United States of America
- Cecilia G. Sanchez
- Tulane Cancer Center, School of Medicine, Tulane University, New Orleans, Louisiana, United States of America
- Joseph A. Lasky
- Tulane Cancer Center, School of Medicine, Tulane University, New Orleans, Louisiana, United States of America
- * E-mail: (DZ); (JAL)
- Dongxiao Zhu
- Department of Computer Science, Wayne State University, Detroit, Michigan, United States of America
- * E-mail: (DZ); (JAL)
|
36
|
Nishida H, Katayama T, Suzuki Y, Kondo S, Horiuchi H. Base composition and nucleosome density in exonic and intronic regions in genes of the filamentous ascomycetes Aspergillus nidulans and Aspergillus oryzae. Gene 2013; 525:5-10. [PMID: 23664982 DOI: 10.1016/j.gene.2013.04.077] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/15/2013] [Revised: 04/18/2013] [Accepted: 04/22/2013] [Indexed: 12/18/2022]
Abstract
We sequenced nucleosomal DNA fragments of the filamentous ascomycetes Aspergillus nidulans and Aspergillus oryzae and then mapped those sequences on their genomes. We compared the GC content and nucleosome density in the exonic and intronic regions in the genes of A. nidulans and A. oryzae. Although the GC content and nucleosome density in the exonic regions tended to be higher than those in the intronic regions, the difference in the distribution of the GC content was more notable than that of the nucleosome density. Next, we compared the GC content and nucleosome density in the exonic regions of 9616 orthologous gene pairs. In both Aspergillus species, the GC content did not correlate with the nucleosome density. In addition, the Spearman's rank correlation coefficient (ρ=0.51) between the GC content of the exonic regions of the 9616 orthologous gene pairs was higher than that (ρ=0.31) of the nucleosome densities of A. nidulans and A. oryzae. These results strongly suggest that the GC content in the exons of the orthologous gene pairs has been conserved during evolution but the nucleosome density has varied throughout.
Affiliation(s)
- Hiromi Nishida
- Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo 113-8657, Japan.
|
37
|
DNA-methylation effect on cotranscriptional splicing is dependent on GC architecture of the exon-intron structure. Genome Res 2013; 23:789-99. [PMID: 23502848 PMCID: PMC3638135 DOI: 10.1101/gr.143503.112] [Citation(s) in RCA: 169] [Impact Index Per Article: 14.1] [Indexed: 12/20/2022]
Abstract
DNA methylation is known to regulate transcription and was recently found to be involved in exon recognition via cotranscriptional splicing. We recently observed that exon–intron architectures can be grouped into two classes: one with higher GC content in exons compared to the flanking introns, and the other with similar GC content in exons and introns. The first group has higher nucleosome occupancy on exons than introns, whereas the second group exhibits weak nucleosome marking of exons, suggesting another type of epigenetic marker distinguishes exons from introns when GC content is similar. We find different and specific patterns of DNA methylation in each of the GC architectures; yet in both groups, DNA methylation clearly marks the exons. Exons of the leveled GC architecture exhibit a significantly stronger DNA methylation signal in relation to their flanking introns compared to exons of the differential GC architecture. This is accentuated by a reduction of the DNA methylation level in the intronic sequences in proximity to the splice sites and shows that different epigenetic modifications mark the location of exons already at the DNA level. Also, lower levels of methylated CpGs on alternative exons can successfully distinguish alternative exons from constitutive ones. Three positions at the splice sites show high CpG abundance and accompany elevated nucleosome occupancy in a leveled GC architecture. Overall, these results suggest that DNA methylation affects exon recognition and is influenced by the GC architecture of the exon and flanking introns.
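The two exon-intron classes described in this abstract can be illustrated with a small sketch that compares the GC content of an exon with that of its flanking introns. The function names and the `min_diff` threshold are assumptions chosen for this example; the abstract does not state the cutoff the authors used to separate the differential and leveled groups.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_architecture(upstream_intron: str, exon: str, downstream_intron: str,
                    min_diff: float = 0.05) -> str:
    """Label an exon as 'differential' or 'leveled' GC architecture.

    min_diff is a hypothetical threshold, not a value from the paper.
    Exons that are not GC-richer than their flanks by at least min_diff
    are labeled 'leveled' in this simplified sketch.
    """
    flank_gc = gc_content(upstream_intron + downstream_intron)
    diff = gc_content(exon) - flank_gc
    return "differential" if diff >= min_diff else "leveled"


if __name__ == "__main__":
    print(gc_architecture("ATATAT", "GCGCGC", "ATATAT"))
    print(gc_architecture("ATGC", "ATGC", "ATGC"))
```

In a real analysis the flanking windows would have a fixed length and the threshold would be fitted to the genome-wide distribution; both choices are left open here.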
|
38
|
Kelemen O, Convertini P, Zhang Z, Wen Y, Shen M, Falaleeva M, Stamm S. Function of alternative splicing. Gene 2013; 514:1-30. [PMID: 22909801 PMCID: PMC5632952 DOI: 10.1016/j.gene.2012.07.083] [Citation(s) in RCA: 548] [Impact Index Per Article: 45.7] [Received: 04/30/2012] [Revised: 07/21/2012] [Accepted: 07/30/2012] [Indexed: 12/15/2022]
Abstract
Almost all polymerase II transcripts undergo alternative pre-mRNA splicing. Here, we review the functions of alternative splicing events that have been experimentally determined. The overall function of alternative splicing is to increase the diversity of mRNAs expressed from the genome. Alternative splicing changes proteins encoded by mRNAs, which has profound functional effects. Experimental analysis of these protein isoforms showed that alternative splicing regulates binding between proteins, between proteins and nucleic acids as well as between proteins and membranes. Alternative splicing regulates the localization of proteins, their enzymatic properties and their interaction with ligands. In most cases, changes caused by individual splicing isoforms are small. However, cells typically coordinate numerous changes in 'splicing programs', which can have strong effects on cell proliferation, cell survival and properties of the nervous system. Due to its widespread usage and molecular versatility, alternative splicing emerges as a central element in gene regulation that interferes with almost every biological function analyzed.
Affiliation(s)
- Olga Kelemen, Paolo Convertini, Zhaiyi Zhang, Yuan Wen, Manli Shen, Marina Falaleeva, Stefan Stamm
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
|
39
|
Teves SS, Henikoff S. The heat shock response: A case study of chromatin dynamics in gene regulation. Biochem Cell Biol 2013; 91:42-8. [DOI: 10.1139/bcb-2012-0075] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022] Open
Abstract
Recent studies in transcriptional regulation using the Drosophila heat shock response system have elucidated many of the dynamic regulatory processes that govern transcriptional activation and repression. The classic view that the control of gene expression occurs at the point of RNA polymerase II (Pol II) recruitment is now giving way to a more complex outlook of gene regulation. Promoter chromatin dynamics coordinate with transcription factor binding to maintain the promoters of active genes accessible. For a large number of genes, the rate-limiting step in Pol II progression occurs during its initial elongation, where Pol II transcribes 30–50 bp and pauses for further signals. These paused genes have unique genic chromatin architecture and dynamics compared with genes where Pol II recruitment is rate limiting for expression. Further elongation of Pol II along the gene causes nucleosome turnover, a continuous process of eviction and replacement, which suggests a potential mechanism for Pol II transit along a nucleosomal template. In this review, we highlight recent insights into transcription regulation of the heat shock response and discuss how the dynamic regulatory processes involved at each transcriptional stage help to generate faithful yet highly responsive gene expression.
Affiliation(s)
- Sheila S. Teves
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Steven Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Howard Hughes Medical Institute, Seattle, WA 98109, USA
|
40
|
Nishida H. Nucleosome Positioning. ISRN Mol Biol 2012; 2012:245706. [PMID: 27335664 PMCID: PMC4890889 DOI: 10.5402/2012/245706] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/19/2012] [Accepted: 09/17/2012] [Indexed: 02/05/2023]
Abstract
Nucleosome positioning is related not only to genomic DNA compaction but also to other biological functions. After the chromatin is digested by micrococcal nuclease, nucleosomal (nucleosome-bound) DNA fragments can be sequenced and mapped on the genomic DNA sequence. Due to the development of modern DNA sequencing technology, genome-wide nucleosome mapping has been performed in a wide range of eukaryotic species. Comparative analyses of the nucleosome positions have revealed that nucleosomes form more frequently in exonic than in intronic regions, and that most transcription start and translation (or transcription) end sites are located in nucleosome linker DNA regions, indicating that nucleosome positioning influences transcription initiation, transcription termination, and gene splicing. In addition, nucleosomal DNA contains guanine and cytosine (G + C)-rich sequences and a high level of cytosine methylation. Thus, the nucleosome positioning system has been conserved during eukaryotic evolution.
Affiliation(s)
- Hiromi Nishida
- Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo 113-8657, Japan
|
41
|
Huang H, Yu S, Liu H, Sun X. Nucleosome organization in sequences of alternative events in human genome. Biosystems 2012; 109:214-9. [DOI: 10.1016/j.biosystems.2012.05.011] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/09/2011] [Revised: 05/25/2012] [Accepted: 05/28/2012] [Indexed: 12/01/2022]
|
42
|
Abstract
The intron–exon architecture of many eukaryotic genes raises the intriguing question of whether this unique organization serves any function, or is it simply a result of the spread of functionless introns in eukaryotic genomes. In this review, we show that introns in contemporary species fulfill a broad spectrum of functions, and are involved in virtually every step of mRNA processing. We propose that this great diversity of intronic functions supports the notion that introns were indeed selfish elements in early eukaryotes, but then independently gained numerous functions in different eukaryotic lineages. We suggest a novel criterion of evolutionary conservation, dubbed intron positional conservation, which can identify functional introns.
Affiliation(s)
- Michal Chorev
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, Faculty of Science, The Hebrew University of Jerusalem Jerusalem, Israel
|
43
|
High resolution positioning of intron ends on the nucleosomes. Gene 2011; 489:6-10. [DOI: 10.1016/j.gene.2011.08.022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 05/03/2011] [Revised: 08/20/2011] [Accepted: 08/26/2011] [Indexed: 01/23/2023]
|
44
|
Xing Y, Zhao X, Cai L. Prediction of nucleosome occupancy in Saccharomyces cerevisiae using position-correlation scoring function. Genomics 2011; 98:359-66. [DOI: 10.1016/j.ygeno.2011.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 04/19/2011] [Revised: 07/16/2011] [Accepted: 07/26/2011] [Indexed: 10/17/2022]
|
45
|
Farlow A, Dolezal M, Hua L, Schlötterer C. The genomic signature of splicing-coupled selection differs between long and short introns. Mol Biol Evol 2011; 29:21-4. [PMID: 21878685 PMCID: PMC3245539 DOI: 10.1093/molbev/msr201] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 12/14/2022] Open
Abstract
Understanding the function of noncoding regions in the genome, such as introns, is of central importance to evolutionary biology. One approach is to assay for the targets of natural selection. On one hand, the sequence of introns, especially short introns, appears to evolve in an almost neutral manner. On the other hand, a large proportion of intronic sequence is under selective constraint. This discrepancy is largely dependent on intron length and differences in the methods used to infer selection. We have used a method based on DNA strand asymmetry that does not require comparison with any putatively neutrally evolving sequence, nor sequence conservation between species, to detect selection within introns. The strongest signal we identify is associated with short introns. This signal comes from a family of motifs that could act as cryptic 5′ splice sites during mRNA processing, suggesting a mechanistic justification underlying this signal of selection. Together with an analysis of intron length and splice site strength, we observe that the genomic signature of splicing-coupled selection differs between long and short introns.
Affiliation(s)
- Ashley Farlow
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Present address: Gregor Mendel Institute of Molecular Plant Biology, Vienna, Austria
- Marlies Dolezal
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Liushuai Hua
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Present address: College of Animal Science and Technology, Shaanxi Key Laboratory of Molecular Biology for Agriculture, Northwest A&F University, Yangling, Shaanxi, China
- Christian Schlötterer
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
- Corresponding author
|
46
|
Pandya-Jones A. Pre-mRNA splicing during transcription in the mammalian system. Wiley Interdiscip Rev RNA 2011; 2:700-17. [PMID: 21823230 DOI: 10.1002/wrna.86] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 12/13/2022]
Abstract
Splicing of RNA polymerase II transcripts is a crucial step in gene expression and a key generator of mRNA diversity. Splicing and transcription have generally been studied in isolation, although in vivo pre-mRNA splicing occurs in concert with transcription. The two processes appear to be functionally connected because a number of variables that regulate transcription have been identified as also influencing splicing. However, the mechanisms that couple the two processes are largely unknown. This review highlights the observations that implicate splicing as occurring during transcription and describes the evidence supporting functional interactions between the two processes. I discuss postulated models of how splicing couples to transcription and consider the potential impact that such coupling might have on exon recognition.
Affiliation(s)
- Amy Pandya-Jones
- Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles (UCLA), USA.
|
47
|
Detection and removal of biases in the analysis of next-generation sequencing reads. PLoS One 2011; 6:e16685. [PMID: 21304912 PMCID: PMC3031631 DOI: 10.1371/journal.pone.0016685] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Received: 10/17/2010] [Accepted: 01/11/2011] [Indexed: 01/03/2023] Open
Abstract
Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, misleadingly increasing the credibility of results due to them. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.
|
48
|
Ringrose L. How do RNA sequence, DNA sequence, and chromatin properties regulate splicing? F1000 Biol Rep 2010; 2:74. [PMID: 21173847 PMCID: PMC2989630 DOI: 10.3410/b2-74] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/21/2023]
Abstract
Recent genome-wide studies have revealed a remarkable correspondence between nucleosome positions and exon-intron boundaries, and several studies have implicated specific histone modifications in regulating alternative splicing. In addition, recent progress in cracking the ‘splicing code’ shows that sequence motifs carried on the nascent RNA molecule itself are sufficient to accurately predict tissue-specific alternative splicing patterns. Together, these studies shed light on the complex interplay between RNA sequence, DNA sequence, and chromatin properties in regulating splicing.
Affiliation(s)
- Leonie Ringrose
- IMBA - Institute of Molecular Biotechnology, Dr Bohr-Gasse 3, 1030 Vienna, Austria
|
49
|
Niu DK, Cao JL. Nucleosome deposition and DNA methylation may participate in the recognition of premature termination codon in nonsense-mediated mRNA decay. FEBS Lett 2010; 584:3509-12. [PMID: 20674569 DOI: 10.1016/j.febslet.2010.07.046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/22/2010] [Accepted: 07/26/2010] [Indexed: 11/25/2022]
Abstract
In non-mammalian eukaryotes, an abnormally long 3' untranslated region (UTR) is generally thought to be the definitive signal in the recognition of a premature termination codon (PTC) in nonsense-mediated mRNA decay (NMD). However, because the lengths of 3' UTRs in normal mRNAs are widely distributed, "abnormally long" is hard to define. Distinct peaks of nucleosome deposition and DNA methylation have recently been found at coding region boundaries. We propose that nucleosomes and DNA methylation just upstream of a normal stop codon are ideal indicators for the position of a normal stop codon and may thus serve as signals in PTC recognition.
Affiliation(s)
- Deng-Ke Niu
- Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing 100875, China.
|
50
|
Chromatin density and splicing destiny: on the cross-talk between chromatin structure and splicing. EMBO J 2010; 29:1629-36. [PMID: 20407423 DOI: 10.1038/emboj.2010.71] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.1] [Received: 02/04/2010] [Accepted: 03/26/2010] [Indexed: 12/11/2022] Open
Abstract
How are short exonic sequences recognized within the vast intronic oceans in which they reside? Despite decades of research, this remains one of the most fundamental, yet enigmatic, questions in the field of pre-mRNA splicing research. For many years, studies aiming to shed light on this process were focused at the RNA level, characterizing the manner by which splicing factors and auxiliary proteins interact with splicing signals, thereby enabling, facilitating and regulating splicing. However, we increasingly understand that splicing is not an isolated process; rather it occurs co-transcriptionally and is presumably also regulated by transcription-related processes. In fact, studies by our group and others over the past year suggest that DNA structure in terms of nucleosome positioning and specific histone modifications, which have a well established role in transcription, may also have a role in splicing. In this review we discuss evidence for the coupling between transcription and splicing, focusing on recent findings suggesting a link between chromatin structure and splicing, and highlighting challenges this emerging field is facing.
|