1
Zheng JY, Jiang G, Gao FH, Ren SN, Zhu CY, Xie J, Li Z, Yin W, Xia X, Li Y, Wang HL. MCTASmRNA: A deep learning framework for alternative splicing events classification. Int J Biol Macromol 2025; 300:139941. [PMID: 39842565 DOI: 10.1016/j.ijbiomac.2025.139941] [Received: 11/05/2024] [Revised: 01/07/2025] [Accepted: 01/14/2025] [Indexed: 01/24/2025]
Abstract
Alternative splicing (AS) plays crucial roles in the post-transcriptional regulation of gene function in eukaryotes. Despite progress in studying AS at the RNA level, existing methods for AS event identification face challenges such as inefficiency, lengthy processing times, and limitations in capturing the complexity of RNA sequences. To overcome these challenges, we evaluated 10 AS detection tools and selected rMATS for dataset construction. We then developed a multi-scale convolutional and Transformer-based model (MCTASmRNA) to classify AS events in mRNA sequences without relying on a reference genome. To handle the problem of large intra-class and small inter-class differences in AS event sequences, we incorporated an efficient channel attention mechanism and designed a new joint loss function to optimize MCTASmRNA training. MCTASmRNA outperformed baseline models in accuracy and exhibited enhanced cross-species generalizability. This model provides valuable support for AS research across different organisms. Future work will focus on optimizing and expanding the model to further explore the complex mechanisms underlying AS.
Affiliation(s)
- Juan-Yu Zheng
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China
- Gao Jiang
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China
- Fu-Hai Gao
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China
- Shu-Ning Ren
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Chen-Yu Zhu
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Jianbo Xie
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Zhonghai Li
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Weilun Yin
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Xinli Xia
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Yun Li
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China.
- Hou-Ling Wang
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China.
2
Felipe Benites L, Stephens TG, Van Etten J, James T, Christian WC, Barry K, Grigoriev IV, McDermott TR, Bhattacharya D. Hot springs viruses at Yellowstone National Park have ancient origins and are adapted to thermophilic hosts. Commun Biol 2024; 7:312. [PMID: 38594478 PMCID: PMC11003980 DOI: 10.1038/s42003-024-05931-1] [Received: 11/15/2023] [Accepted: 02/16/2024] [Indexed: 04/11/2024]
Abstract
Geothermal springs house unicellular red algae in the class Cyanidiophyceae that dominate the microbial biomass at these sites. Little is known about host-virus interactions in these environments. We analyzed the virus community associated with red algal mats in three neighboring habitats (creek, endolithic, soil) at Lemonade Creek, Yellowstone National Park (YNP), USA. We find that despite proximity, each habitat houses a unique collection of viruses, with the giant viruses, Megaviricetes, dominant in all three. The early branching phylogenetic position of genes encoded on metagenome assembled virus genomes (vMAGs) suggests that the YNP lineages are of ancient origin and not due to multiple invasions from mesophilic habitats. The existence of genomic footprints of adaptation to thermophily in the vMAGs is consistent with this idea. The Cyanidiophyceae at geothermal sites originated ca. 1.5 Bya and are therefore relevant to understanding biotic interactions on the early Earth.
Affiliation(s)
- L Felipe Benites
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Timothy G Stephens
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Julia Van Etten
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Graduate Program in Ecology and Evolution, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- Timeeka James
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA
- William C Christian
- Department of Land Resources and Environmental Sciences, Montana State University, Bozeman, Montana, USA
- Kerrie Barry
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Igor V Grigoriev
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
- Timothy R McDermott
- Department of Chemistry and Biochemistry, Montana State University, Bozeman, Montana, USA
- Debashish Bhattacharya
- Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, NJ, 08901, USA.
3
Ramoneda J, Hoffert M, Stallard-Olivera E, Casamayor EO, Fierer N. Leveraging genomic information to predict environmental preferences of bacteria. The ISME Journal 2024; 18:wrae195. [PMID: 39361898 PMCID: PMC11488383 DOI: 10.1093/ismejo/wrae195] [Received: 07/12/2024] [Revised: 09/24/2024] [Accepted: 10/02/2024] [Indexed: 10/05/2024]
Abstract
Genomic information is now available for a broad diversity of bacteria, including uncultivated taxa. However, we have corresponding knowledge on environmental preferences (i.e. bacterial growth responses across gradients in oxygen, pH, temperature, salinity, and other environmental conditions) for a relatively narrow swath of bacterial diversity. These limits to our understanding of bacterial ecologies constrain our ability to predict how assemblages will shift in response to global change factors, design effective probiotics, or guide cultivation efforts. We need innovative approaches that take advantage of expanding genome databases to accurately infer the environmental preferences of bacteria and validate the accuracy of these inferences. By doing so, we can broaden our quantitative understanding of the environmental preferences of the majority of bacterial taxa that remain uncharacterized. With this perspective, we highlight why it is important to infer environmental preferences from genomic information and discuss the range of potential strategies for doing so. In particular, we highlight concrete examples of how both cultivation-independent and cultivation-dependent approaches can be integrated with genomic data to develop predictive models. We also emphasize the limitations and pitfalls of these approaches and the specific knowledge gaps that need to be addressed to successfully expand our understanding of the environmental preferences of bacteria.
Affiliation(s)
- Josep Ramoneda
- Department of Ecology and Complexity, Center of Advanced Studies of Blanes (CEAB), Spanish Research Council (CSIC), Blanes, Spain
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, Colorado, United States
- Michael Hoffert
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, United States
- Elias Stallard-Olivera
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, United States
- Emilio O Casamayor
- Department of Ecology and Complexity, Center of Advanced Studies of Blanes (CEAB), Spanish Research Council (CSIC), Blanes, Spain
- Noah Fierer
- Cooperative Institute for Research in Environmental Sciences (CIRES), University of Colorado, Boulder, Colorado, United States
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, United States
4
Lewin LE, Daniels KG, Hurst LD. Genes for highly abundant proteins in Escherichia coli avoid 5' codons that promote ribosomal initiation. PLoS Comput Biol 2023; 19:e1011581. [PMID: 37878567 PMCID: PMC10599525 DOI: 10.1371/journal.pcbi.1011581] [Received: 07/10/2023] [Accepted: 10/09/2023] [Indexed: 10/27/2023]
Abstract
In many species highly expressed genes (HEGs) over-employ the synonymous codons that match the more abundant iso-acceptor tRNAs. Bacterial transgene codon randomization experiments report, however, that enrichment with such "translationally optimal" codons has little to no effect on the resultant protein level. By contrast, consistent with the view that ribosomal initiation is rate limiting, synonymous codon usage following the 5' ATG greatly influences protein levels, at least in part by modifying RNA stability. For the design of bacterial transgenes, for simple codon based in silico inference of protein levels and for understanding selection on synonymous mutations, it would be valuable to computationally determine initiation optimality (IO) scores for codons for any given species. One attractive approach is to characterize the 5' codon enrichment of HEGs compared with the most lowly expressed genes, just as translational optimality scores of codons have been similarly defined employing the full gene body. Here we determine the viability of this approach employing a unique opportunity: for Escherichia coli there is both the most extensive protein abundance data for native genes and a unique large-scale transgene codon randomization experiment enabling objective definition of the 5' codons that cause, rather than just correlate with, high protein abundance (that we equate with initiation optimality, broadly defined). Surprisingly, the 5' ends of native genes that specify highly abundant proteins avoid such initiation optimal codons. We find that this is probably owing to conflicting selection pressures particular to native HEGs, including selection favouring low initiation rates, this potentially enabling high efficiency of ribosomal usage and low noise. 
While the classical HEG enrichment approach does not work, rendering simple prediction of native protein abundance from 5' codon content futile, we report evidence that initiation optimality scores derived from the transgene experiment may hold relevance for in silico transgene design for a broad spectrum of bacteria.
Affiliation(s)
- Loveday E. Lewin
- The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Kate G. Daniels
- The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom
- Laurence D. Hurst
- The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, United Kingdom