1. Zheng JY, Jiang G, Gao FH, Ren SN, Zhu CY, Xie J, Li Z, Yin W, Xia X, Li Y, Wang HL. MCTASmRNA: A deep learning framework for alternative splicing events classification. Int J Biol Macromol 2025; 300:139941. [PMID: 39842565 DOI: 10.1016/j.ijbiomac.2025.139941] [Received: 11/05/2024] [Revised: 01/07/2025] [Accepted: 01/14/2025] [Indexed: 01/24/2025]
Abstract
Alternative splicing (AS) plays crucial roles in the post-transcriptional regulation of gene function in eukaryotes. Despite progress in studying AS at the RNA level, existing methods for AS event identification face challenges such as inefficiency, lengthy processing times, and limitations in capturing the complexity of RNA sequences. To overcome these challenges, we evaluated 10 AS detection tools and selected rMATS for dataset construction. We then developed a multi-scale convolutional and Transformer-based model (MCTASmRNA) to classify AS events in mRNA sequences without relying on a reference genome. To handle the large intra-class and small inter-class differences in AS event sequences, we incorporated an efficient channel attention mechanism and designed a new joint loss function to optimize MCTASmRNA training. MCTASmRNA outperformed baseline models in accuracy and exhibited enhanced cross-species generalizability. This model provides valuable support for AS research across different organisms. Future work will focus on optimizing and expanding the model to further explore the complex mechanisms underlying AS.
Affiliation(s)
- Juan-Yu Zheng
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China
- Gao Jiang
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China
- Fu-Hai Gao
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China
- Shu-Ning Ren
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Chen-Yu Zhu
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Jianbo Xie
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Zhonghai Li
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Weilun Yin
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Xinli Xia
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China
- Yun Li
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing 100083, People's Republic of China.
- Hou-Ling Wang
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing 100083, People's Republic of China.
2. Ciccolella S, Cozzi D, Della Vedova G, Kuria SN, Bonizzoni P, Denti L. Differential quantification of alternative splicing events on spliced pangenome graphs. PLoS Comput Biol 2024; 20:e1012665. [PMID: 39652592 DOI: 10.1371/journal.pcbi.1012665] [Received: 04/10/2024] [Revised: 12/19/2024] [Accepted: 11/21/2024] [Indexed: 12/21/2024] Open
Abstract
Pangenomes are becoming a powerful framework to perform many bioinformatics analyses taking into account the genetic variability of a population, thus reducing the bias introduced by a single reference genome. With the wider diffusion of pangenomes, integrating genetic variability with transcriptome diversity is becoming a natural extension that demands specific methods for its exploration. In this work, we extend the notion of spliced pangenomes to that of annotated spliced pangenomes; this allows us to introduce a formal definition of Alternative Splicing (AS) events on a graph structure. To investigate the usage of graph pangenomes for the quantification of AS events across conditions, we developed pantas, the first pangenomic method for the detection and differential analysis of AS events from short RNA-Seq reads. A comparison with state-of-the-art linear reference-based approaches proves that pantas achieves competitive accuracy, making spliced pangenomes effective for conducting AS events quantification and opening future directions for the analysis of population-based transcriptomes.
Affiliation(s)
- Simone Ciccolella
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Davide Cozzi
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Paola Bonizzoni
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Luca Denti
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
3. Miyokawa R, Sasaki E. The role of FIONA1 in alternative splicing and its effects on flowering regulation in Arabidopsis thaliana. The New Phytologist 2024; 243:2055-2060. [PMID: 39056273 DOI: 10.1111/nph.19995] [Received: 03/28/2024] [Accepted: 07/03/2024] [Indexed: 07/28/2024]
Affiliation(s)
- Ryo Miyokawa
- Faculty of Science, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka, 819-0395, Japan
- Eriko Sasaki
- Faculty of Science, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka, 819-0395, Japan
4. Jiang G, Zheng JY, Ren SN, Yin W, Xia X, Li Y, Wang HL. A comprehensive workflow for optimizing RNA-seq data analysis. BMC Genomics 2024; 25:631. [PMID: 38914930 PMCID: PMC11197194 DOI: 10.1186/s12864-024-10414-y] [Received: 02/26/2024] [Accepted: 05/15/2024] [Indexed: 06/26/2024] Open
Abstract
BACKGROUND: Current RNA-seq analysis software tends to use similar parameters across different species without considering species-specific differences. However, the suitability and accuracy of these tools may vary when analyzing data from different species, such as humans, animals, plants, fungi, and bacteria. For most laboratory researchers lacking a background in information science, determining how to construct an analysis workflow that meets their specific needs from the array of complex analytical tools available poses a significant challenge. RESULTS: Using RNA-seq data from plants, animals, and fungi, we observed that different analytical tools show some variation in performance when applied to different species. A comprehensive experiment was conducted specifically for analyzing plant pathogenic fungal data, with differential gene analysis as the ultimate goal. In this study, 288 pipelines using different tools were applied to analyze five fungal RNA-seq datasets, and the performance of their results was evaluated based on simulation. This led to the establishment of a relatively universal and superior fungal RNA-seq analysis pipeline that can serve as a reference, together with certain standards for selecting analysis tools. Additionally, we compared various tools for alternative splicing analysis. The results based on simulated data indicated that rMATS remained the optimal choice, although consideration could be given to supplementing it with tools such as SpliceWiz. CONCLUSION: The experimental results demonstrate that, compared with the default software parameter configurations, the tuned analysis combinations can provide more accurate biological insights. It is beneficial to carefully select suitable analysis software based on the data, rather than indiscriminately choosing tools, in order to achieve high-quality analysis results more efficiently.
Affiliation(s)
- Gao Jiang
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing, 100083, People's Republic of China
- Juan-Yu Zheng
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing, 100083, People's Republic of China
- Shu-Ning Ren
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China
- Weilun Yin
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China
- Xinli Xia
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China
- Yun Li
- School of Information Science and Technology, School of Artificial Intelligence, Beijing Forestry University, Beijing, 100083, People's Republic of China.
- Hou-Ling Wang
- State Key Laboratory of Tree Genetics and Breeding, National Engineering Research Center of Tree Breeding and Ecological Restoration, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, People's Republic of China.
5. Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. Brief Bioinform 2024; 25:bbae164. [PMID: 38605641 PMCID: PMC11009461 DOI: 10.1093/bib/bbae164] [Received: 07/12/2023] [Revised: 01/26/2024] [Accepted: 03/26/2024] [Indexed: 04/13/2024] Open
Abstract
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length messenger RNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM or BAM format, with the SAM and BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in polymerase chain reaction (PCR) amplification, barcode read errors and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Dimitra Sarantopoulou
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Soumyashant Nayak
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
- Amruta Naik
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shaon Sengupta
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Peter S Choi
- Division of Cancer Pathobiology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
6. Lio CT, Düz T, Hoffmann M, Willruth LL, Baumbach J, List M, Tsoy O. Comprehensive benchmark of differential transcript usage analysis for static and dynamic conditions. bioRxiv 2024:2024.01.14.575548. [PMID: 38313260 PMCID: PMC10836064 DOI: 10.1101/2024.01.14.575548] [Indexed: 02/06/2024]
Abstract
RNA sequencing offers unique insights into transcriptome diversity, and a plethora of tools have been developed to analyze alternative splicing. One important task is to detect changes in the relative transcript abundance in differential transcript usage (DTU) analysis. The choice of the right analysis tool is non-trivial and depends on experimental factors such as the availability of single- or paired-end and bulk or single-cell data. To help users select the most promising tool for their task, we performed a comprehensive benchmark of DTU detection tools. We cover a wide array of experimental settings, using simulated bulk and single-cell RNA-seq data as well as real transcriptomics datasets, including time-series data. Our results suggest that DEXSeq, edgeR, and LimmaDS are better choices for paired-end data, while DSGseq and DEXSeq can be used for single-end data. In single-cell simulation settings, we showed that satuRn performs better than DTUrtle. In addition, we showed that Spycone is optimal for time series DTU/IS analysis based on the evidence provided using GO terms enrichment analysis.
Affiliation(s)
- Chit Tong Lio
- Data Science in Systems Biology, Technical University of Munich, 85354 Freising, Germany
- Tolga Düz
- Chair of Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Markus Hoffmann
- Data Science in Systems Biology, Technical University of Munich, 85354 Freising, Germany
- Institute for Advanced Study, Technical University of Munich, Garching D-85748, Germany
- National Institute of Diabetes, Digestive, and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Lina-Liv Willruth
- Data Science in Systems Biology, Technical University of Munich, 85354 Freising, Germany
- Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Institute of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5000 Odense, Denmark
- Markus List
- Data Science in Systems Biology, Technical University of Munich, 85354 Freising, Germany
- Olga Tsoy
- Chair of Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
7. Fenn A, Tsoy O, Faro T, Rößler FM, Dietrich A, Kersting J, Louadi Z, Lio CT, Völker U, Baumbach J, Kacprowski T, List M. Alternative splicing analysis benchmark with DICAST. NAR Genom Bioinform 2023; 5:lqad044. [PMID: 37260511 PMCID: PMC10227362 DOI: 10.1093/nargab/lqad044] [Received: 07/04/2022] [Revised: 04/13/2023] [Accepted: 05/05/2023] [Indexed: 06/02/2023] Open
Abstract
Alternative splicing is a major contributor to transcriptome and proteome diversity in health and disease. A plethora of tools have been developed for studying alternative splicing in RNA-seq data. Previous benchmarks focused on isoform quantification and mapping. They neglected event detection tools, which arguably provide the most detailed insights into the alternative splicing process. DICAST offers a modular and extensible framework for analysing alternative splicing integrating eleven splice-aware mapping and eight event detection tools. We benchmark all tools extensively on simulated as well as whole blood RNA-seq data. STAR and HISAT2 demonstrated the best balance between performance and run time. The performance of event detection tools varies widely with no tool outperforming all others. DICAST allows researchers to employ a consensus approach to consider the most successful tools jointly for robust event detection. Furthermore, we propose the first reporting standard to unify existing formats and to guide future tool development.
Affiliation(s)
- Tim Faro
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Fanny L M Rößler
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Alexander Dietrich
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Johannes Kersting
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Zakaria Louadi
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Chit Tong Lio
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Uwe Völker
- Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Felix-Hausdorff-Straße 8, D-17475 Greifswald, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Greifswald, Greifswald, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Institute of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5000 Odense, Denmark
- Markus List
- To whom correspondence should be addressed. Tel: +49 8161 71 2761;
8. Brooks TG, Lahens NF, Mrčela A, Sarantopoulou D, Nayak S, Naik A, Sengupta S, Choi PS, Grant GR. BEERS2: RNA-Seq simulation through high fidelity in silico modeling. bioRxiv 2023:2023.04.21.537847. [PMID: 37162982 PMCID: PMC10168222 DOI: 10.1101/2023.04.21.537847] [Indexed: 05/11/2023]
Abstract
Simulation of RNA-seq reads is critical in the assessment, comparison, benchmarking, and development of bioinformatics tools. Yet the field of RNA-seq simulators has progressed little in the last decade. To address this need we have developed BEERS2, which combines a flexible and highly configurable design with detailed simulation of the entire library preparation and sequencing pipeline. BEERS2 takes input transcripts (typically full-length mRNA transcripts with polyA tails) from either customizable input or from CAMPAREE simulated RNA samples. It produces realistic reads of these transcripts in FASTQ, SAM, or BAM format, with the SAM and BAM output containing the true alignment to the reference genome. It also produces true transcript-level quantification values. BEERS2 is designed to include the effects of polyA selection and RiboZero for ribosomal depletion, hexamer priming sequence biases, GC-content biases in PCR amplification, barcode read errors, and errors during PCR amplification. These characteristics combine to make BEERS2 the most complete simulation of RNA-seq to date. Finally, we demonstrate the use of BEERS2 by measuring the effect of several settings on the popular Salmon pseudoalignment algorithm.
Affiliation(s)
- Thomas G Brooks
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Nicholas F Lahens
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Antonijo Mrčela
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Dimitra Sarantopoulou
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: National Institute on Aging, National Institutes of Health, Baltimore, MD, USA
- Soumyashant Nayak
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Current address: Statistics and Mathematics Unit, Indian Statistical Institute, Bengaluru, Karnataka, India
- Amruta Naik
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Shaon Sengupta
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Peter S Choi
- Division of Cancer Pathobiology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology & Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Gregory R Grant
- Institute for Translational Medicine and Therapeutics, University of Pennsylvania, PA, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
9. Using machine learning to detect the differential usage of novel gene isoforms. BMC Bioinformatics 2022; 23:45. [PMID: 35042461 PMCID: PMC8764765 DOI: 10.1186/s12859-022-04576-3] [Received: 03/11/2021] [Accepted: 01/10/2022] [Indexed: 11/24/2022] Open
Abstract
Background: Differential isoform usage is an important driver of inter-individual phenotypic diversity and is linked to various diseases and traits. However, accurately detecting the differential usage of different gene transcripts between groups can be difficult, in particular in less well annotated genomes where the spectrum of transcript isoforms is largely unknown. Results: We investigated whether machine learning approaches can detect differential isoform usage based purely on the distribution of reads across a gene region. We illustrate that gradient boosting and elastic net approaches can successfully identify large numbers of genes showing potential differential isoform usage between Europeans and Africans; these genes are enriched among relevant biological pathways and significantly overlap those identified by previous approaches. We demonstrate that diversity at the 3′ and 5′ ends of genes is a primary driver of these differences between populations. Conclusion: Machine learning methods can effectively detect differential isoform usage from read fraction data, and can provide novel insights into the biological differences between groups.