1
Santucci K, Cheng Y, Xu SM, Gao Y, Lindner G, Takenaka K, Janitz M. Discovery of Novel Protein-Coding and Long Non-coding Transcripts in Distinct Regions of the Human Brain. J Mol Neurosci 2025; 75:30. [PMID: 40048072 PMCID: PMC11885362 DOI: 10.1007/s12031-025-02316-9] [Received: 11/20/2024] [Accepted: 02/06/2025] [Indexed: 03/09/2025]
Abstract
Recent improvements in the accuracy of long-read sequencing (LRS) technologies have expanded the scope for novel transcriptional isoform discovery. These advances have also improved the precision of transcript quantification, enabling more accurate reconstruction of complex splicing patterns and transcriptomes. This project therefore takes advantage of these analytical developments for the discovery and analysis of RNA isoforms in the human brain. A set of novel transcript isoforms was compiled using three bioinformatic tools, and their expression was quantified across eight replicates of the cerebellar hemisphere, five replicates of the frontal cortex, and six replicates of the putamen. By taking the subset of novel isoforms identified consistently by all discovery methods, a set of 170 high-confidence novel RNA isoforms was curated for downstream analysis: 104 messenger RNA (mRNA) isoforms and 66 long non-coding RNA (lncRNA) isoforms. The structure, expression, and potential encoded proteins of the novel mRNA isoform BambuTx321 are described in detail as a representative example. In addition, the novel lncRNA BambuTx1299 showed tissue-specific expression in the cerebellar hemisphere [mean counts per million (CPM) of 5.979]. Overall, this project identified and annotated several novel RNA isoforms across distinct regions of the human brain, describing their expression patterns and investigating their potential functional roles, and thereby contributes to a more comprehensive understanding of the brain's transcriptomic landscape for basic research.
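The two quantitative steps in this abstract, keeping only isoforms called by every discovery tool and summarising expression as mean counts per million (CPM), can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the tool names, transcript IDs, and counts are hypothetical, and it assumes each tool's calls have already been matched to shared transcript identifiers (in practice that matching requires comparing transcript structures).

```python
"""Hypothetical sketch: consensus novel isoforms and mean CPM.

Tool names, transcript IDs and counts are illustrative only; calls are
assumed to be pre-matched to shared transcript IDs across tools.
"""


def cpm(counts: dict[str, float]) -> dict[str, float]:
    """Counts per million: count * 1e6 / library total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {tx: c * 1e6 / total for tx, c in counts.items()}


def consensus(calls: dict[str, set[str]]) -> set[str]:
    """Isoforms reported by every tool (intersection of all call sets)."""
    sets = list(calls.values())
    return set.intersection(*sets) if sets else set()


def mean_cpm(replicates: list[dict[str, float]], tx: str) -> float:
    """Mean CPM of one transcript across replicate libraries (0 if absent)."""
    values = [cpm(rep).get(tx, 0.0) for rep in replicates]
    return sum(values) / len(values)


if __name__ == "__main__":
    calls = {
        "tool_a": {"tx1", "tx2", "tx3"},  # hypothetical tool outputs
        "tool_b": {"tx2", "tx3", "tx4"},
        "tool_c": {"tx2", "tx3"},
    }
    print(sorted(consensus(calls)))  # ['tx2', 'tx3']

    replicates = [
        {"tx2": 10, "other": 999_990},    # 1,000,000 reads -> 10 CPM
        {"tx2": 30, "other": 1_999_970},  # 2,000,000 reads -> 15 CPM
    ]
    print(mean_cpm(replicates, "tx2"))  # 12.5
```

Dividing by each library's total before averaging keeps replicates of different sequencing depth comparable, which is the point of reporting CPM rather than raw counts.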
Affiliation(s)
- Kristina Santucci
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Yulan Gao
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Grace Lindner
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Konii Takenaka
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
- Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, Australia
2
Gaugel J, Jähnert M, Neumann A, Heyd F, Schürmann A, Vogel H. Alternative splicing landscape in mouse skeletal muscle and adipose tissue: Effects of intermittent fasting and exercise. J Nutr Biochem 2025; 137:109837. [PMID: 39725041 DOI: 10.1016/j.jnutbio.2024.109837] [Received: 06/11/2024] [Revised: 11/28/2024] [Accepted: 12/20/2024] [Indexed: 12/28/2024]
Abstract
Alternative splicing diversifies the cellular protein landscape, but aberrant splicing is implicated in many diseases. The extent to which mis-splicing contributes to insulin resistance, the causal defect of type 2 diabetes, and whether this can be reversed by lifestyle interventions is largely unknown. Therefore, RNA sequencing data from skeletal muscle and adipose tissue of diabetes-susceptible NZO mice with or without intermittent fasting, and of healthy C57BL/6J mice subjected to exercise, were analyzed for alternative splicing differences using Whippet and rMATS. Diet and exercise interventions triggered comparable levels of splicing changes, although the splicing profile of skeletal muscle appeared more flexible than that of adipose tissue, with 72-114 differential splicing events in muscle and fewer than 25 in adipose tissue. Splicing changes induced by time-restricted feeding, alternate-day fasting and exercise were generally mild, with a maximal difference in percent spliced in (PSI) of 67%, indicating that alternative splicing plays a rather minor role in lifestyle-induced adaptations of muscle and adipose tissue in mice. However, intron retention contributed to the regulation of gene expression, influencing genes whose expression was directly linked to phenotypic parameters (e.g. Eno2 and Pan2). Alternate-day fasting promoted skipping of exon 7 in Mlxipl (coding for ChREBP), thereby affecting the glucose-sensing module of this carbohydrate-responsive transcription factor. Both intermittent fasting and exercise training led to alternative splicing of known diabetes-related GWAS genes (e.g. Abcc8, Ifnar2, Smarcad1), highlighting the potential metabolic relevance of these changes.
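PSI is the fraction of a gene's transcripts that include a given exon (or retain a given intron), and the effect sizes quoted above are differences in PSI between conditions. The sketch below computes PSI for one cassette exon from junction read counts, and the resulting delta PSI. It assumes a simple length-normalised estimator with equal effective lengths by default; Whippet and rMATS each use their own models, which are not reproduced here, and the read counts are invented.

```python
"""Minimal sketch: PSI for one cassette exon and delta PSI between conditions.

Read counts are invented; the estimator is a simple length-normalised ratio,
not the exact model used by Whippet or rMATS.
"""


def psi(inclusion: int, skipping: int,
        inc_len: float = 1.0, skip_len: float = 1.0) -> float:
    """Fraction of transcripts that include the exon.

    inclusion/skipping are junction read counts supporting each form;
    inc_len/skip_len are effective lengths used to normalise them
    (equal by default).
    """
    inc = inclusion / inc_len
    skp = skipping / skip_len
    if inc + skp == 0:
        raise ValueError("no informative reads for this event")
    return inc / (inc + skp)


def delta_psi(treated: tuple[int, int], control: tuple[int, int]) -> float:
    """PSI(treated) - PSI(control); positive means more inclusion when treated."""
    return psi(*treated) - psi(*control)


if __name__ == "__main__":
    # Treated: 80 inclusion vs 20 skipping reads; control: 30 vs 70.
    print(round(delta_psi((80, 20), (30, 70)), 2))  # 0.5, a 50% PSI difference
```

On this scale, the maximal change reported in the abstract corresponds to a delta PSI of 0.67.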
Affiliation(s)
- Jasmin Gaugel
- Research Group Nutrigenomics of Obesity and Department of Experimental Diabetology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany; Research Group Molecular and Clinical Life Science of Metabolic Diseases, Faculty of Health Sciences Brandenburg, University of Potsdam, Brandenburg, Germany
- Markus Jähnert
- Research Group Nutrigenomics of Obesity and Department of Experimental Diabetology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany
- Alexander Neumann
- Laboratory of RNA Biochemistry, Institute of Chemistry and Biochemistry, Freie Universität Berlin, Berlin, Germany; Omiqa Bioinformatics, Berlin, Germany
- Florian Heyd
- Laboratory of RNA Biochemistry, Institute of Chemistry and Biochemistry, Freie Universität Berlin, Berlin, Germany
- Annette Schürmann
- Research Group Nutrigenomics of Obesity and Department of Experimental Diabetology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany; Institute of Nutritional Science, University of Potsdam, Nuthetal, Germany
- Heike Vogel
- Research Group Nutrigenomics of Obesity and Department of Experimental Diabetology, German Institute of Human Nutrition Potsdam-Rehbruecke, Nuthetal, Germany; German Center for Diabetes Research (DZD), München-Neuherberg, Germany; Research Group Molecular and Clinical Life Science of Metabolic Diseases, Faculty of Health Sciences Brandenburg, University of Potsdam, Brandenburg, Germany
3
Santucci K, Cheng Y, Xu SM, Janitz M. Enhancing novel isoform discovery: leveraging nanopore long-read sequencing and machine learning approaches. Brief Funct Genomics 2024; 23:683-694. [PMID: 39158328 DOI: 10.1093/bfgp/elae031] [Received: 05/22/2024] [Revised: 07/29/2024] [Accepted: 07/31/2024] [Indexed: 08/20/2024]
Abstract
Long-read sequencing technologies can capture entire RNA transcripts in a single sequencing read, reducing the ambiguity in constructing and quantifying transcript models compared with more common, earlier methods such as short-read sequencing. Recent improvements in the accuracy of long-read sequencing technologies have expanded the scope for novel splice isoform detection and have also enabled far more accurate reconstruction of complex splicing patterns and transcriptomes. Additionally, the incorporation and advancement of machine learning and deep learning algorithms in bioinformatic software have significantly improved the reliability of long-read transcriptomic studies. However, there is no consensus on which bioinformatic tools and pipelines produce the most precise and consistent results. This review therefore aims to discuss and compare the performance of available methods for novel isoform discovery with long-read sequencing technologies, presenting 25 tools. It also aims to demonstrate the need for developing standard analytical pipelines, tools, and transcript model conventions for novel isoform discovery and transcriptomic studies.
Affiliation(s)
- Kristina Santucci
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Yuning Cheng
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Si-Mei Xu
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
- Michael Janitz
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW 2052, Australia
4
Hosseini M, Palmer A, Manka W, Grady PGS, Patchigolla V, Bi J, O'Neill RJ, Chi Z, Aguiar D. Deep statistical modelling of nanopore sequencing translocation times reveals latent non-B DNA structures. Bioinformatics 2023; 39:i242-i251. [PMID: 37387144 DOI: 10.1093/bioinformatics/btad220] [Indexed: 07/01/2023]
Abstract
MOTIVATION Non-canonical (or non-B) DNA are genomic regions whose three-dimensional conformation deviates from the canonical double helix. Non-B DNA play an important role in basic cellular processes and are associated with genomic instability, gene regulation, and oncogenesis. Experimental methods are low-throughput and can detect only a limited set of non-B DNA structures, while computational methods rely on non-B DNA base motifs, which are necessary but not sufficient indicators of non-B structures. Oxford Nanopore sequencing is an efficient and low-cost platform, but it is currently unknown whether nanopore reads can be used for identifying non-B structures. RESULTS We build the first computational pipeline to predict non-B DNA structures from nanopore sequencing. We formalize non-B detection as a novelty detection problem and develop the GoFAE-DND, an autoencoder that uses goodness-of-fit (GoF) tests as a regularizer. A discriminative loss encourages non-B DNA to be poorly reconstructed and optimizing Gaussian GoF tests allows for the computation of P-values that indicate non-B structures. Based on whole genome nanopore sequencing of NA12878, we show that there exist significant differences between the timing of DNA translocation for non-B DNA bases compared with B-DNA. We demonstrate the efficacy of our approach through comparisons with novelty detection methods using experimental data and data synthesized from a new translocation time simulator. Experimental validations suggest that reliable detection of non-B DNA from nanopore sequencing is achievable. AVAILABILITY AND IMPLEMENTATION Source code is available at https://github.com/bayesomicslab/ONT-nonb-GoFAE-DND.
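The statistical idea of turning a per-window score into a P-value under a Gaussian null can be illustrated without the autoencoder. The sketch below is a deliberate simplification and not GoFAE-DND: it assumes per-window scores (such as reconstruction errors) are already available, fits a normal distribution to scores from reference B-DNA windows, and reports a one-sided upper-tail P-value for each test window. The scores, the alpha threshold, and the function names are hypothetical, and no multiple-testing correction is applied.

```python
"""Simplified illustration (not GoFAE-DND): Gaussian-null P-values for scores.

Assumes a per-window score (e.g. a reconstruction error) is already computed.
Scores from reference B-DNA windows define the null; all values are invented.
"""
import math
from statistics import mean, stdev


def upper_tail_p(x: float, mu: float, sigma: float) -> float:
    """One-sided P(X >= x) for X ~ Normal(mu, sigma**2)."""
    z = (x - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def flag_windows(reference: list[float], test: list[float],
                 alpha: float = 0.05) -> list[tuple[float, float, bool]]:
    """Return (score, p_value, flagged) for each test window.

    No multiple-testing correction is applied in this sketch.
    """
    mu, sigma = mean(reference), stdev(reference)
    results = []
    for score in test:
        p = upper_tail_p(score, mu, sigma)
        results.append((score, p, p < alpha))
    return results


if __name__ == "__main__":
    reference = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]  # hypothetical B-DNA scores
    for score, p, flagged in flag_windows(reference, [1.02, 2.0]):
        print(f"score={score:.2f} p={p:.3g} candidate_non_B={flagged}")
```

A poorly reconstructed window (high score) receives a small P-value, which matches the abstract's framing that non-B DNA should be poorly reconstructed; the actual method learns the scores with a goodness-of-fit-regularised autoencoder.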
Affiliation(s)
- Marjan Hosseini
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Aaron Palmer
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- William Manka
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Patrick G S Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3003, United States
- Venkata Patchigolla
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
- Rachel J O'Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3003, United States
- Zhiyi Chi
- Department of Statistics, University of Connecticut, Storrs, CT 06269-4120, United States
- Derek Aguiar
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269-4155, United States
5
Ringeling FR, Chakraborty S, Vissers C, Reiman D, Patel AM, Lee KH, Hong A, Park CW, Reska T, Gagneur J, Chang H, Spletter ML, Yoon KJ, Ming GL, Song H, Canzar S. Partitioning RNAs by length improves transcriptome reconstruction from short-read RNA-seq data. Nat Biotechnol 2022; 40:741-750. [PMID: 35013600 PMCID: PMC11332977 DOI: 10.1038/s41587-021-01136-7] [Received: 09/09/2020] [Accepted: 10/26/2021] [Indexed: 02/06/2023]
Abstract
The accuracy of methods for assembling transcripts from short-read RNA sequencing data is limited by the lack of long-range information. Here we introduce Ladder-seq, an approach that separates transcripts according to their lengths before sequencing and uses the additional information to improve the quantification and assembly of transcripts. Using simulated data, we show that a kallisto algorithm extended to process Ladder-seq data quantifies transcripts of complex genes with substantially higher accuracy than conventional kallisto. For reference-based assembly, a tailored scheme based on the StringTie2 algorithm reconstructs a single transcript with 30.8% higher precision than its conventional counterpart and is more than 30% more sensitive for complex genes. For de novo assembly, a similar scheme based on the Trinity algorithm correctly assembles 78% more transcripts than conventional Trinity while improving precision by 78%. In experimental data, Ladder-seq reveals 40% more genes harboring isoform switches compared to conventional RNA sequencing and unveils widespread changes in isoform usage upon m6A depletion by Mettl14 knockout.
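The benefit of knowing which length band a fragment came from can be illustrated with a toy compatibility filter: a read whose sequence is consistent with several isoforms (a kallisto-style equivalence class) is narrowed to the isoforms whose lengths fall within that band. This is a hypothetical illustration of the principle only, not the extended kallisto model described above; the band edges, isoform names, and lengths are invented, and real data would also need to account for imperfect separation between bands.

```python
"""Toy illustration: narrowing read-to-isoform ambiguity with a length band.

Band edges, isoform names and lengths are hypothetical; real data would also
need to model imperfect separation between neighbouring bands.
"""

# Half-open length bands [lo, hi) in nucleotides (hypothetical values).
BANDS = [(0, 1000), (1000, 2500), (2500, 10**9)]


def band_of(length: int) -> int:
    """Index of the band containing a transcript of this length."""
    for i, (lo, hi) in enumerate(BANDS):
        if lo <= length < hi:
            return i
    raise ValueError(f"length {length} is outside all bands")


def refine(compatible: set[str], lengths: dict[str, int], band: int) -> set[str]:
    """Keep only sequence-compatible isoforms whose length matches the read's band."""
    return {tx for tx in compatible if band_of(lengths[tx]) == band}


if __name__ == "__main__":
    lengths = {"iso_short": 800, "iso_mid": 1800, "iso_long": 4200}
    # Sequence alone is compatible with all three isoforms (one equivalence class).
    eq_class = {"iso_short", "iso_mid", "iso_long"}
    print(refine(eq_class, lengths, band=1))  # {'iso_mid'}
```

Shrinking equivalence classes in this way is one intuition for why the extra length information can improve quantification of complex genes with many similar isoforms.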
Affiliation(s)
- Caroline Vissers
- Department of Biochemistry & Biophysics, University of California, San Francisco, San Francisco, CA, USA
- Derek Reiman
- Department of Biomedical Engineering, University of Illinois at Chicago, Chicago, IL, USA
- Akshay M Patel
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Ki-Heon Lee
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Ari Hong
- Center for RNA Research, Institute for Basic Science (IBS), Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Chan-Woo Park
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Tim Reska
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
- Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany
- Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Hyeshik Chang
- Center for RNA Research, Institute for Basic Science (IBS), Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Maria L Spletter
- Biomedical Center, Department of Physiological Chemistry, Ludwig-Maximilians-Universität München, Martinsried-Planegg, Germany
- Ki-Jun Yoon
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Guo-Li Ming
- Department of Neuroscience and Mahoney Institute for Neurosciences, University of Pennsylvania, Philadelphia, PA, USA
- Hongjun Song
- Department of Neuroscience and Mahoney Institute for Neurosciences, University of Pennsylvania, Philadelphia, PA, USA
- Stefan Canzar
- Gene Center, Ludwig-Maximilians-Universität München, Munich, Germany
6
Shi X, Neuwald AF, Wang X, Wang TL, Hilakivi-Clarke L, Clarke R, Xuan J. IntAPT: integrated assembly of phenotype-specific transcripts from multiple RNA-seq profiles. Bioinformatics 2021; 37:650-658. [PMID: 33016988 PMCID: PMC8097681 DOI: 10.1093/bioinformatics/btaa852] [Received: 05/19/2019] [Revised: 08/27/2020] [Accepted: 09/21/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure. RESULTS We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods for the integrated assembly of phenotype-specific transcripts. Experimental results demonstrate that, despite sequencing errors, IntAPT performs robustly across multiple samples, notably improving the identification of expressed isoforms of low abundance. AVAILABILITY AND IMPLEMENTATION The IntAPT package is available at http://github.com/henryxushi/IntAPT. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
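A spike-and-slab prior places a point mass at zero (the spike) alongside a continuous distribution (the slab), so an isoform can be switched off entirely rather than merely shrunk. The sketch below draws from such a prior in a two-layer arrangement that loosely mirrors the abstract: presence is sampled once per isoform for the whole phenotype group, and abundances are sampled per sample only for present isoforms. The Bernoulli and Gamma choices and all parameter values are assumptions for illustration; IntAPT's likelihood, dependency structure, and Gibbs updates are not reproduced.

```python
"""Two-layer spike-and-slab draws for isoform abundances (illustrative only).

Presence is drawn once per isoform for the whole group (spike at zero vs.
slab); abundances are drawn per sample for present isoforms. The Bernoulli
and Gamma choices and all parameter values are assumptions, not IntAPT's model.
"""
import random
from typing import Optional


def sample_two_layer(n_isoforms: int, n_samples: int, pi: float = 0.3,
                     shape: float = 2.0, scale: float = 5.0,
                     rng: Optional[random.Random] = None):
    """Return (presence, abundance) with abundance[s][k] for sample s, isoform k."""
    rng = rng or random.Random()
    # Group layer: isoform k is present in this phenotype with probability pi.
    presence = [rng.random() < pi for _ in range(n_isoforms)]
    # Sample layer: absent isoforms sit exactly on the spike (0.0); present
    # ones get a positive abundance from the Gamma slab in every sample.
    abundance = [
        [rng.gammavariate(shape, scale) if on else 0.0 for on in presence]
        for _ in range(n_samples)
    ]
    return presence, abundance


if __name__ == "__main__":
    presence, abundance = sample_two_layer(8, 3, rng=random.Random(42))
    print(presence)
    for row in abundance:
        print([round(a, 1) for a in row])
```

Because absent isoforms are exactly zero in every sample, the prior enforces sparsity at the group level while still letting abundances of present isoforms vary between samples; inference would then update these indicators and abundances given the observed reads.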
Affiliation(s)
- Xu Shi
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Andrew F Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Xiao Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
- Tian-Li Wang
- Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA
- Robert Clarke
- Hormel Institute, University of Minnesota, 801 16th Ave NE, Austin, MN 55912, USA
- Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA
7
Functional and structural features of proteins associated with alternative splicing. Int J Biol Macromol 2020; 147:513-520. [PMID: 31931065 DOI: 10.1016/j.ijbiomac.2019.09.241] [Received: 06/27/2019] [Revised: 09/16/2019] [Accepted: 09/21/2019] [Indexed: 12/16/2022]
Abstract
Alternative splicing is a mechanism that increases the number of expressed proteins and the diversity of their functions. We identified the protein domains most frequently absent from or present in splice variants. Proteins represented by several isoforms participate in processes such as transcriptional regulation and the immune response. Our results revealed an association between alternative splicing and branched regulatory pathways. Based on published data on the proteins encoded by human chromosome 18, we noted that alternative products differ in several functional features, such as phosphorylation, subcellular localization, ligand specificity, and protein-protein interactions. Alternative variants affecting the protein kinase domain were investigated by comparing the alternative sequences with 3D structures. This showed that relatively large insertions/deletions can be compatible with the kinase fold if they fall between conserved secondary structure elements. Using 3D data on human proteins, we showed that conformational flexibility can accommodate fold alterations in splice variants. Investigating structural and functional differences between splice isoforms is required to distinguish isoforms expressed as functional proteins from transcripts that are not realized as proteins. Such studies help bridge the gap between genomic and proteomic data.