1
Chitkara P, Singh A, Gangwar R, Bhardwaj R, Zahra S, Arora S, Hamid F, Arya A, Sahu N, Chakraborty S, Ramesh M, Kumar S. The landscape of fusion transcripts in plants: a new insight into genome complexity. BMC PLANT BIOLOGY 2024; 24:1162. [PMID: 39627690 PMCID: PMC11616359 DOI: 10.1186/s12870-024-05900-0] [Received: 05/09/2024] [Accepted: 11/29/2024] [Indexed: 12/06/2024]
Abstract
BACKGROUND Fusion transcripts (FTs), generated by the fusion of genes at the DNA level or by RNA-level splicing events, contribute significantly to transcriptome diversity. FTs are usually considered unique features of neoplasia and serve as biomarkers and therapeutic targets for multiple cancers. Recent findings show that FTs are also present in normal human physiology. Several separate reports have described fusion transcripts in planta with important roles in stress responses, morphological alterations, or traits (e.g., seed size). RESULTS In this study, we identified 169,197 fusion transcripts in 2795 transcriptome datasets of Arabidopsis thaliana, Cicer arietinum, and Oryza sativa by using a combination of tools, and confirmed the translational activity of 150 fusion transcripts through proteomic datasets. Analysis of the FT junction sequences and their association with epigenetic factors, as revealed by ChIP-Seq datasets, demonstrated an organised process of fusion formation at the DNA level. We investigated the possible impact of three-dimensional chromatin conformation on intra-chromosomal fusion events by comparing Hi-C datasets with the incidence of fusion transcripts. We further used long-read RNA-Seq datasets to validate the most frequently recurring fusion transcripts in each plant species, followed by further authentication through RT-PCR and Sanger sequencing. CONCLUSIONS Our findings suggest that a significant portion of fusion events may be attributed to alternative splicing during transcription, accounting for numerous fusion events without a proportional increase in the number of RNA pairs. Even non-nuclear DNA transcripts from mitochondria and chloroplasts can participate in intra- and inter-chromosomal fusion formation. Genes in close spatial proximity are more prone to undergoing fusion formation, especially in intra-chromosomal FTs. Most of the fusion transcripts may not undergo translation and may instead serve as long non-coding RNAs. The low validation rate of FTs in plants indicates that fusion transcripts are expressed at very low levels, as in humans. FTs often originate from parental genes involved in essential biological processes, suggesting their relevance across diverse tissues and stress conditions. This study presents a comprehensive repository of fusion transcripts, offering valuable insights into their roles in vital physiological processes and stress responses.
Affiliation(s)
- Pragya Chitkara
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Ajeet Singh
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
  - Baylor College of Medicine, Houston, TX, USA
- Rashmi Gangwar
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Rohan Bhardwaj
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
  - Technical University of Munich, Freising, Germany
- Shafaque Zahra
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Simran Arora
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Fiza Hamid
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Ajay Arya
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Namrata Sahu
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Srija Chakraborty
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
  - University of Nottingham, Sutton Bonington Campus, Loughborough, UK
- Madhulika Ramesh
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Shailesh Kumar
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
2
Zhu XT, Sanz-Jimenez P, Ning XT, Tahir Ul Qamar M, Chen LL. Direct RNA sequencing in plants: Practical applications and future perspectives. PLANT COMMUNICATIONS 2024; 5:101064. [PMID: 39155503 PMCID: PMC11589328 DOI: 10.1016/j.xplc.2024.101064] [Received: 05/13/2024] [Revised: 07/17/2024] [Accepted: 08/14/2024] [Indexed: 08/20/2024]
Abstract
The transcriptome serves as a bridge that links genomic variation to phenotypic diversity. A vast number of studies using next-generation RNA sequencing (RNA-seq) over the last two decades have emphasized the essential roles of the plant transcriptome in response to developmental and environmental conditions, providing numerous insights into the dynamic changes, evolutionary traces, and elaborate regulation of the plant transcriptome. With substantial improvement in accuracy and throughput, direct RNA sequencing (DRS) has emerged as a new and powerful sequencing platform for precise detection of native and full-length transcripts, overcoming many limitations such as read length and PCR bias that are inherent to short-read RNA-seq. Here, we review recent advances in dissecting the complexity and diversity of plant transcriptomes using DRS as the main technological approach, covering many aspects of RNA metabolism, including novel isoforms, poly(A) tails, and RNA modification, and we propose a comprehensive workflow for processing of plant DRS data. Many challenges to the application of DRS in plants, such as the need for machine learning tools tailored to plant transcriptomes, remain to be overcome, and we outline future biological questions that can be addressed by DRS, such as allele-specific RNA modification. This technology provides a convenient basis for connecting distinct RNA features, steadily refining our understanding of the biological functions of the plant transcriptome.
Affiliation(s)
- Xi-Tong Zhu
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Pablo Sanz-Jimenez
  - National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xiao-Tong Ning
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
- Muhammad Tahir Ul Qamar
  - Integrative Omics and Molecular Modeling Laboratory, Department of Bioinformatics and Biotechnology, Government College University Faisalabad (GCUF), Faisalabad 38000, Pakistan
- Ling-Ling Chen
  - State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, College of Life Science and Technology, Guangxi University, Nanning 530004, China
3
Arya A, Arora S, Hamid F, Kumar S. PFusionDB: a comprehensive database of plant-specific fusion transcripts. 3 Biotech 2024; 14:282. [PMID: 39479298 PMCID: PMC11519250 DOI: 10.1007/s13205-024-04132-1] [Received: 06/11/2024] [Accepted: 10/20/2024] [Indexed: 11/02/2024]
Abstract
Fusion transcripts (FTs) are well-known cancer biomarkers but remain relatively understudied in plants. Here, we developed PFusionDB (www.nipgr.ac.in/PFusionDB), a novel plant-specific fusion-transcript database. It is a comprehensive repository of 80,170, 39,108, 83,330, and 11,500 unique fusions detected in 1280, 637, 697, and 181 RNA-Seq samples of Arabidopsis thaliana, Oryza sativa japonica, Oryza sativa indica, and Cicer arietinum, respectively. Of these, 76,599 (Arabidopsis thaliana), 35,480 (Oryza sativa japonica), 72,099 (Oryza sativa indica), and 9524 (Cicer arietinum) fusion transcripts are non-recurrent, i.e., found in only one sample. FTs were identified using five tools, viz. EricScript-Plants, STAR-Fusion, TrinityFusion, SQUID, and MapSplice. For each fusion event, PFusionDB provides the parental genes, junction sequence, expression level of the fusion transcript, breakpoint coordinates, strand information, tissue type, treatment information, fusion type, PFusionDB ID, and Sequence Read Archive (SRA) ID. Further, two search modules, 'Simple Search' and 'Advanced Search', along with a 'Browse' option for data download, are provided for ease of use. Three distinct modules, viz. 'BLASTN', 'SW Align', and 'Mapping', are also available for efficient mapping and alignment of query sequences to FTs. PFusionDB serves as a crucial resource for delving into the intricate world of fusion transcripts in plants, providing researchers with a foundation for further exploration and analysis. Database URL: www.nipgr.ac.in/PFusionDB. Supplementary Information: The online version contains supplementary material available at 10.1007/s13205-024-04132-1.
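The counts in this abstract imply that most catalogued fusions were seen in a single sample. The sketch below only redoes that arithmetic from the figures quoted above; the dictionary layout and the function name `nonrecurrent_share` are illustrative assumptions and are not part of PFusionDB.

```python
# Counts copied from the abstract: (unique fusions, non-recurrent fusions).
COUNTS = {
    "Arabidopsis thaliana": (80170, 76599),
    "Oryza sativa japonica": (39108, 35480),
    "Oryza sativa indica": (83330, 72099),
    "Cicer arietinum": (11500, 9524),
}


def nonrecurrent_share(unique: int, nonrecurrent: int) -> float:
    """Percentage of unique fusions that were found in only one sample."""
    return 100.0 * nonrecurrent / unique


# Hypothetical helper structure: species name -> percentage of non-recurrent fusions.
shares = {
    species: nonrecurrent_share(unique, nonrecurrent)
    for species, (unique, nonrecurrent) in COUNTS.items()
}

for species, share in shares.items():
    print(f"{species}: {share:.1f}% non-recurrent")
```

Under these figures, more than four in five fusions in every species are non-recurrent, which is consistent with the low expression and low validation rates reported elsewhere in this listing.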
Affiliation(s)
- Ajay Arya
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Simran Arora
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Fiza Hamid
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
- Shailesh Kumar
  - Bioinformatics Lab, National Institute of Plant Genome Research, Aruna Asaf Ali Marg, New Delhi, 110067, India
4
Cong J, Zhang S, Zhang Q, Yu X, Huang J, Wei X, Huang X, Qiu J, Zhou X. Conserved features and diversity attributes of chimeric RNAs across accessions in four plants. PLANT BIOTECHNOLOGY JOURNAL 2024; 22:3151-3163. [PMID: 39087631 PMCID: PMC11500992 DOI: 10.1111/pbi.14437] [Received: 09/10/2023] [Revised: 06/17/2024] [Accepted: 07/08/2024] [Indexed: 08/02/2024]
Abstract
As a non-collinear expression form of genetic information, chimeric RNAs increase the complexity of the transcriptome in diverse organisms. Although chimeric RNAs have been identified in plants, few common features have been revealed. Here, we systematically explored the landscape of chimeric RNAs across multiple accessions and tissues using pan-genome and transcriptome data of four plants: rice, maize, soybean, and Arabidopsis. Among the four species, conserved characteristics of breakpoints and parental genes were discovered. In each species, chimeric RNAs displayed a high level of diversity among accessions, and the clustering of accessions using chimeric events was generally concordant with clustering based on genomic variants, implying a general relationship between genetic variations and chimeric RNAs. Through mass spectrometry, we confirmed a fusion protein, OsNDC1-OsGID1L2, and observed its subcellular localization, which differed from that of the original proteins. Phenotypic cues in transgenic rice suggest the potential functions of OsNDC1-OsGID1L2. Moreover, an intriguing chimeric event, Os01g0216500-Os01g0216900, generated by a large deletion in basmati rice, also exists in another accession without the deletion, demonstrating its convergence in evolution. Our results illuminate the characteristics and hint at the evolutionary implications of plant chimeric RNAs, which serve as a supplement to genetic variations, thus expanding our understanding of genetic diversity.
Affiliation(s)
- Jia Cong
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Sinan Zhang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
  - CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
- Qi Zhang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xiting Yu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Jiazhi Huang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xin Wei
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xuehui Huang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Jie Qiu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
- Xiaoyi Zhou
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai, China
5
Hamid F, Arora S, Chitkara P, Kumar S. A Protocol for the Detection of Fusion Transcripts Using RNA-Sequencing Data. Methods Mol Biol 2024; 2812:243-258. [PMID: 39068367 DOI: 10.1007/978-1-0716-3886-6_14] [Indexed: 07/30/2024]
Abstract
Fusion transcripts are formed when two genes or their mRNAs fuse to produce a novel gene or chimeric transcript. Fusion genes are well-known cancer biomarkers used for cancer diagnosis and as therapeutic targets. Gene fusions are also found in normal physiology and lead to the evolution of novel genes that contribute to better survival and adaptation of an organism. Various in vitro approaches, such as FISH, PCR, RT-PCR, and chromosome banding techniques, have been used to detect gene fusions. However, all these approaches have low resolution and throughput. With the development of high-throughput next-generation sequencing technologies, the detection of fusion transcripts has become feasible using whole-genome sequencing, RNA-Seq data, and bioinformatics tools. This chapter gives an overview of the general computational protocol for fusion transcript detection from RNA-sequencing datasets.
Affiliation(s)
- Fiza Hamid
  - Bioinformatics Laboratory, National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Simran Arora
  - Bioinformatics Laboratory, National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Pragya Chitkara
  - Bioinformatics Laboratory, National Institute of Plant Genome Research (NIPGR), New Delhi, India
- Shailesh Kumar
  - Bioinformatics Laboratory, National Institute of Plant Genome Research (NIPGR), New Delhi, India
6
Liu Q, Hu Y, Stucky A, Fang L, Zhong JF, Wang K. LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing. BMC Genomics 2020; 21:793. [PMID: 33372596 PMCID: PMC7771079 DOI: 10.1186/s12864-020-07207-4] [Received: 10/21/2020] [Accepted: 10/29/2020] [Indexed: 12/17/2022]
Abstract
BACKGROUND Long-read RNA-Seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques, which typically generate reads of < 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. RESULTS In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF performed better than existing computational tools in detecting known gene fusions. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia and pinpointed, at base resolution, the exact location of a translocation previously known only at cytogenetic resolution, which was further validated by Sanger sequencing. CONCLUSIONS In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
Affiliation(s)
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Yu Hu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Andres Stucky
  - Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
- Li Fang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Jiang F Zhong
  - Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
7
Abstract
Chimeric RNAs are hybrid transcripts containing exons from two separate genes. Chimeric RNAs are traditionally considered to be transcribed from fusion genes caused by chromosomal rearrangement. These canonical chimeric RNAs are well characterized as being expressed in a cancer-unique pattern and/or acting as oncogene products. However, thanks to the development of advanced deep sequencing technologies, novel types of non-canonical chimeric RNAs have been found to be generated by intergenic splicing without genomic aberrations. They can be formed through trans-splicing or through cis-splicing between adjacent genes (cis-SAGe). Non-canonical chimeric RNAs are widely detected in normal physiology, although several have been shown to have a cancer-specific expression pattern. Further studies have indicated that some of them play fundamental roles in controlling cell growth and motility, and may have functions independent of the parental genes. These discoveries are unveiling a new layer of the functional transcriptome and are also raising the possibility of using non-canonical chimeric RNAs as cancer diagnostic markers and therapeutic targets. In this chapter, we give an overview of the different categories of chimeric RNAs and their expression in various types of cancerous and normal samples. Acknowledging that chimeric RNAs are not unique to cancer, we discuss both bioinformatic and biological methods to identify credible cancer-specific chimeric RNAs. Furthermore, we describe downstream methods to explore their molecular processing mechanisms and potential functions. A better understanding of the biogenesis mechanisms and functional products of cancer-specific chimeric RNAs will pave the way for the development of novel cancer biomarkers and therapeutic targets.
Affiliation(s)
- Xinrui Shi
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Sandeep Singh
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Emily Lin
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, United States
- Hui Li
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, United States
  - Department of Pathology, School of Medicine, University of Virginia, Charlottesville, VA, United States
8
Warthi G, Fournier PE, Seligmann H. Systematic Nucleotide Exchange Analysis of ESTs From the Human Cancer Genome Project Report: Origins of 347 Unknown ESTs Indicate Putative Transcription of Non-Coding Genomic Regions. Front Genet 2020; 11:42. [PMID: 32117454 PMCID: PMC7027195 DOI: 10.3389/fgene.2020.00042] [Received: 10/28/2019] [Accepted: 01/15/2020] [Indexed: 12/16/2022]
Abstract
Expressed sequence tags (ESTs) provide an imprint of cellular RNA diversity irrespective of sequence homology with template genomes. NCBI databases include many unknown RNAs from various normal and cancer cells. These are usually ignored as presumed sequencing artefacts or contamination because they lack sequence homology with template DNA. Here, we report the genomic origins of 347 ESTs from the FAPESP/LICR Human Cancer Genome Project that were previously assumed to be artefacts or of unknown origin. EST template detection uses systematic nucleotide exchange analyses called swinger transformations. Systematic nucleotide exchanges systematically replace particular nucleotides with different nucleotides. Among the 347 unknown ESTs, 51 match mitogenome transcription, and 17 and 2 are from nuclear chromosome non-coding regions and uncharacterized nuclear genes, respectively. Identified ESTs mapped to 205 protein-coding genes; 10 genes had swinger RNAs in several biosamples. Whole-cell transcriptome searches for the 17 ESTs mapping to non-coding regions confirmed their transcription. The 10 swinger-transcribed genes identified more than once are associated with cancer induction and progression, suggesting that swinger transformation occurs mainly in highly transcribed genes. Swinger transformation is a unique method for identifying noncanonical RNAs obtained from NGS, and it identifies putative ncRNA-transcribed regions. The results suggest that swinger transcription occurs in highly active genes in normal and genetically unstable cancer cells.
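The abstract describes systematic nucleotide exchanges only in words. As a minimal sketch, assume each exchange is a fixed one-to-one substitution among the four bases applied at every position of a sequence; there are then 4! - 1 = 23 non-identity exchanges. The function names `exchange_tables` and `transform` are hypothetical, and this is not the authors' pipeline, which also involves searching the transformed sequences against reference genomes.

```python
from itertools import permutations

BASES = "ACGT"


def exchange_tables() -> dict:
    """Build one translation table per non-identity permutation of A, C, G, T."""
    tables = {}
    for perm in permutations(BASES):
        target = "".join(perm)
        if target == BASES:
            continue  # skip the identity mapping, which changes nothing
        # The label (e.g. "ACGT->CAGT") records which base replaces which.
        tables[f"{BASES}->{target}"] = str.maketrans(BASES, target)
    return tables


def transform(seq: str, table: dict) -> str:
    """Apply one exchange to an upper-case DNA sequence.

    Symbols outside A, C, G, T (for example N) are left unchanged.
    """
    return seq.translate(table)


tables = exchange_tables()

# Example: the symmetric exchange A<->C leaves G and T untouched.
original = "ACGTTGCA"
swapped = transform(original, tables["ACGT->CAGT"])
restored = transform(swapped, tables["ACGT->CAGT"])
```

Applying a symmetric exchange twice returns the input sequence, so `restored` equals `original` here; asymmetric exchanges (three- or four-base cycles) need their inverse permutation instead.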
Affiliation(s)
- Ganesh Warthi
  - Aix Marseille Univ, IRD, APHM, SSA, VITROME, IHU-Méditerranée Infection, Marseille, France
  - IHU-Méditerranée Infection, Marseille, France
- Pierre-Edouard Fournier
  - Aix Marseille Univ, IRD, APHM, SSA, VITROME, IHU-Méditerranée Infection, Marseille, France
  - IHU-Méditerranée Infection, Marseille, France
- Hervé Seligmann
  - The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel
  - Université Grenoble Alpes, Faculty of Medicine, Laboratory AGEIS EA 7407, Team Tools for e-Gnosis Medical & Labcom CNRS/UGA/OrangeLabs Telecoms4Health, La Tronche, France