1
Ciccolella S, Cozzi D, Della Vedova G, Kuria SN, Bonizzoni P, Denti L. Differential quantification of alternative splicing events on spliced pangenome graphs. PLoS Comput Biol 2024; 20:e1012665. [PMID: 39652592; DOI: 10.1371/journal.pcbi.1012665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 12/19/2024] [Accepted: 11/21/2024] [Indexed: 12/21/2024]
Abstract
Pangenomes are becoming a powerful framework to perform many bioinformatics analyses taking into account the genetic variability of a population, thus reducing the bias introduced by a single reference genome. With the wider diffusion of pangenomes, integrating genetic variability with transcriptome diversity is becoming a natural extension that demands specific methods for its exploration. In this work, we extend the notion of spliced pangenomes to that of annotated spliced pangenomes; this allows us to introduce a formal definition of Alternative Splicing (AS) events on a graph structure. To investigate the usage of graph pangenomes for the quantification of AS events across conditions, we developed pantas, the first pangenomic method for the detection and differential analysis of AS events from short RNA-Seq reads. A comparison with state-of-the-art linear reference-based approaches proves that pantas achieves competitive accuracy, making spliced pangenomes effective for conducting AS events quantification and opening future directions for the analysis of population-based transcriptomes.
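Differential quantification of AS events across conditions is commonly summarized as a percent spliced-in (PSI) value per event and sample, compared between conditions as a difference in PSI. The sketch below uses only that generic definition: inclusion reads divided by inclusion plus exclusion reads, with no effective-length normalization. It is not pantas's actual statistic. The function names and read counts are hypothetical.

```python
from statistics import mean
from typing import Optional


def psi(inclusion: int, exclusion: int) -> Optional[float]:
    """Percent spliced-in: fraction of event-informative reads supporting inclusion.
    Returns None when no read informs the event."""
    total = inclusion + exclusion
    if total == 0:
        return None
    return inclusion / total


def delta_psi(cond_a: list[tuple[int, int]], cond_b: list[tuple[int, int]]) -> Optional[float]:
    """Mean PSI of condition B minus mean PSI of condition A.
    Each condition lists (inclusion, exclusion) read counts, one pair per replicate;
    replicates with no informative reads are skipped."""
    psi_a = [p for p in (psi(i, e) for i, e in cond_a) if p is not None]
    psi_b = [p for p in (psi(i, e) for i, e in cond_b) if p is not None]
    if not psi_a or not psi_b:
        return None
    return mean(psi_b) - mean(psi_a)


# Hypothetical exon-skipping event, two replicates per condition.
control = [(30, 10), (27, 9)]  # PSI 0.75 in both replicates
treated = [(10, 30), (5, 15)]  # PSI 0.25 in both replicates
print(delta_psi(control, treated))  # -0.5
```

Real tools also test whether such a difference is significant across replicates. This sketch stops at the point estimate.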
Affiliation(s)
- Simone Ciccolella
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Davide Cozzi
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Paola Bonizzoni
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Luca Denti
- Department of Computer Science, University of Milano-Bicocca, Milan, Italy
- Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University in Bratislava, Bratislava, Slovakia
2
Avila Cartes J, Bonizzoni P, Ciccolella S, Della Vedova G, Denti L, Didelot X, Monti DC, Pirola Y. RecGraph: recombination-aware alignment of sequences to variation graphs. Bioinformatics 2024; 40:btae292. [PMID: 38676570; PMCID: PMC11256948; DOI: 10.1093/bioinformatics/btae292] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 02/23/2024] [Accepted: 04/25/2024] [Indexed: 04/29/2024]
Abstract
MOTIVATION Bacterial genomes present more variability than human genomes, which requires important adjustments in computational tools that are developed for human data. In particular, bacteria exhibit a mosaic structure due to homologous recombination, but this fact is not sufficiently captured by standard read mappers that align against linear reference genomes. The recent introduction of pangenomics provides some insights in that context, as a pangenome graph can represent the variability within a species. However, the concept of sequence-to-graph alignment that captures the presence of recombinations has not been previously investigated. RESULTS In this paper, we extend the notion of sequence-to-graph alignment to a variation graph that incorporates recombinations, so that the latter are explicitly represented and evaluated in an alignment. Moreover, we present a dynamic programming approach for the special case where there is at most one recombination; we implement this case as RecGraph. From a modelling point of view, a recombination corresponds to identifying a new path of the variation graph, where the new path is composed of two halves, each extracted from an original path, possibly joined by a new arc. Our experiments show that RecGraph accurately aligns simulated recombinant bacterial sequences that have at most one recombination, providing evidence for the presence of recombination events. AVAILABILITY AND IMPLEMENTATION Our implementation is open source and available at https://github.com/AlgoLab/RecGraph.
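A minimal dynamic-programming sketch can make the "at most one recombination" case concrete. It rests on several simplifying assumptions and does not reproduce RecGraph's scoring on variation graphs:

- The graph is reduced to two linear paths, `a` and `b`, given as plain strings.
- All edit operations cost 1.
- The read may start anywhere in `a` and end anywhere.
- A single fixed `penalty` is charged for jumping from any position of `a` to any position of `b`.

All names are hypothetical.

```python
def align_with_one_recombination(read: str, a: str, b: str, penalty: int = 2) -> tuple[int, int]:
    """Unit-cost alignment of `read` against path `a` alone, and against a
    recombinant path that follows `a` and then jumps once into `b`.

    Returns (best cost on `a` only, best cost with exactly one a->b jump).
    The whole read is aligned; its start in `a` and its end are free
    (semi-global alignment).
    """
    # d0[i]: best cost of the read prefix so far, ending after a[:i], no jump yet.
    # For the empty prefix every start position in `a` is free.
    d0 = [0] * (len(a) + 1)
    # The jump cost ignores positions, so the best entry into `b` is the
    # row minimum of d0 plus the penalty.
    jump = min(d0) + penalty
    # d1[j]: best cost ending after b[:j], after the single jump.
    d1 = [jump] * (len(b) + 1)
    for k in range(1, len(read) + 1):
        c = read[k - 1]
        new0 = [d0[0] + 1] + [0] * len(a)  # column 0: read char is an insertion
        for i in range(1, len(a) + 1):
            new0[i] = min(d0[i - 1] + (c != a[i - 1]),  # match or mismatch
                          d0[i] + 1,                    # insertion (read char not on path)
                          new0[i - 1] + 1)              # deletion (path char skipped)
        jump = min(new0) + penalty
        new1 = [min(jump, d1[0] + 1)] + [0] * len(b)
        for j in range(1, len(b) + 1):
            new1[j] = min(jump,                         # jump into `b` at offset j
                          d1[j - 1] + (c != b[j - 1]),  # match or mismatch
                          d1[j] + 1,                    # insertion
                          new1[j - 1] + 1)              # deletion
        d0, d1 = new0, new1
    return min(d0), min(d1)


print(align_with_one_recombination("AAAATTTT", "AAAACCCC", "GGGGTTTT"))  # (4, 2)
```

The jump cost does not depend on where the jump happens. The best entry cost into `b` after each read prefix is therefore one row minimum plus the penalty. This keeps the sketch at O(|read| · (|a| + |b|)) time instead of enumerating every pair of positions.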
Affiliation(s)
- Jorge Avila Cartes
- Department of Informatics, Systems and Communication, University of Milano – Bicocca. Viale Sarca 336, Milano 20126, Italy
- Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano – Bicocca. Viale Sarca 336, Milano 20126, Italy
- Simone Ciccolella
- Department of Informatics, Systems and Communication, University of Milano – Bicocca. Viale Sarca 336, Milano 20126, Italy
- Gianluca Della Vedova
- Department of Informatics, Systems and Communication, University of Milano – Bicocca. Viale Sarca 336, Milano 20126, Italy
- Luca Denti
- Department of Informatics, Systems and Communication, University of Milano – Bicocca. Viale Sarca 336, Milano 20126, Italy
- Xavier Didelot
- Department of Statistics and School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom
- Davide Cesare Monti
- Department of Informatics, Systems and Communication, University of Milano – Bicocca. Viale Sarca 336, Milano 20126, Italy
- Yuri Pirola
- Department of Informatics, Systems and Communication, University of Milano – Bicocca. Viale Sarca 336, Milano 20126, Italy
3
Bendik J, Kalavacherla S, Webster N, Califano J, Fertig EJ, Ochs MF, Carter H, Guo T. OutSplice: A Novel Tool for the Identification of Tumor-Specific Alternative Splicing Events. BioMedInformatics 2023; 3:853-868. [PMID: 40236985; PMCID: PMC11997874; DOI: 10.3390/biomedinformatics3040053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 04/17/2025]
Abstract
Protein variation that occurs during alternative splicing has been shown to play a major role in disease onset and oncogenesis. For this reason, we developed OutSplice, a user-friendly algorithm to classify splicing outliers in tumor samples compared to a distribution of normal samples. Several tools have previously been developed to help uncover splicing events. Each has its own methodology, complexity, and features, which can make it difficult for a new researcher to use them or to decide which tool to use. Therefore, we benchmarked several algorithms to determine which may be best for a particular user's needs and demonstrate how OutSplice differs from these methodologies. We find that despite detecting fewer genes with significant aberrant events, OutSplice is able to identify those that are biologically impactful. Additionally, we identify 17 genes that contain significant splicing alterations in tumor tissue that were discovered across at least 5 of the tested algorithms, making them good candidates for future studies. Overall, researchers should consider a combined use of OutSplice with other splicing software to help provide additional validation for aberrant splicing events and to narrow down biologically relevant events.
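The idea of calling a tumor value an outlier relative to a distribution of normal samples can be shown with a generic z-score rule. The abstract does not describe OutSplice's own statistics, so this is only an illustration:

- A tumor junction-usage value is flagged when it lies more than `z` sample standard deviations from the mean of the normal samples.
- The threshold of 3, the function name and the example values are assumptions.

```python
from statistics import mean, stdev


def is_splicing_outlier(tumor_value: float, normal_values: list[float], z: float = 3.0) -> bool:
    """Flag `tumor_value` if it lies more than `z` sample standard deviations
    from the mean of `normal_values` (at least two normal samples needed)."""
    mu = mean(normal_values)
    sd = stdev(normal_values)
    if sd == 0:
        # Degenerate normal distribution: any deviation counts as an outlier.
        return tumor_value != mu
    return abs(tumor_value - mu) / sd > z


normals = [0.10, 0.12, 0.11, 0.09, 0.10]  # hypothetical junction usage in normal tissue
print(is_splicing_outlier(0.50, normals))  # True
print(is_splicing_outlier(0.10, normals))  # False
```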
Affiliation(s)
- Joseph Bendik
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Sandhya Kalavacherla
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Nicholas Webster
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Joseph Califano
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Gleiberman Head and Neck Cancer Center, University of California, San Diego, CA 92037, USA
- Department of Otolaryngology-Head and Neck Surgery, University of California San Diego, San Diego, CA 92037, USA
- Elana J. Fertig
- Quantitative Sciences Division and Convergence Institute, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, Baltimore, MD 21224, USA
- Department of Oncology, Johns Hopkins University, Baltimore, MD 21224, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21224, USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21224, USA
- Michael F. Ochs
- Department of Mathematics and Statistics, The College of New Jersey, Ewing, NJ 08628, USA
- Hannah Carter
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Division of Medical Genetics, Department of Medicine, University of California San Diego, San Diego, CA 92093, USA
- Theresa Guo
- Moores Cancer Center, University of California San Diego, San Diego, CA 92037, USA
- Gleiberman Head and Neck Cancer Center, University of California, San Diego, CA 92037, USA
- Department of Otolaryngology-Head and Neck Surgery, University of California San Diego, San Diego, CA 92037, USA
4
Borozan L, Rojas Ringeling F, Kao SY, Nikonova E, Monteagudo-Mesas P, Matijević D, Spletter ML, Canzar S. Counting pseudoalignments to novel splicing events. Bioinformatics 2023; 39:btad419. [PMID: 37432342; PMCID: PMC10348833; DOI: 10.1093/bioinformatics/btad419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 04/21/2023] [Accepted: 07/10/2023] [Indexed: 07/12/2023]
Abstract
MOTIVATION Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods. RESULTS Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto's equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align and count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila. AVAILABILITY AND IMPLEMENTATION fortuna source code is available at https://github.com/canzarlab/fortuna.
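fortuna derives counts of elementary splicing units from kallisto's equivalence classes (ECs), but the abstract does not spell out the rule. The sketch below therefore assumes a simple hypothetical one:

- Each EC is a non-empty set of transcript-fragment IDs with a read count.
- Each fragment is annotated with the splicing units it contains, for example junctions.
- An EC's reads are credited to a unit only when every fragment in the class contains that unit, so reads are never credited to a unit that only some compatible fragments contain.

Fragment and unit names are invented.

```python
from collections import Counter


def unit_counts(ec_counts: dict[frozenset[str], int],
                fragment_units: dict[str, set[str]]) -> Counter:
    """Credit each equivalence class's reads to the splicing units shared by
    all fragments in that class (ECs are assumed to be non-empty)."""
    counts: Counter = Counter()
    for ec, n_reads in ec_counts.items():
        shared = set.intersection(*(fragment_units[f] for f in ec))
        for unit in shared:
            counts[unit] += n_reads
    return counts


fragment_units = {"f1": {"J1", "J2"}, "f2": {"J1", "J3"}}
ec_counts = {frozenset({"f1"}): 10, frozenset({"f1", "f2"}): 5, frozenset({"f2"}): 3}
print(sorted(unit_counts(ec_counts, fragment_units).items()))
# [('J1', 18), ('J2', 10), ('J3', 3)]
```

The 5 reads compatible with both fragments count only toward J1, the one junction both fragments share.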
Affiliation(s)
- Luka Borozan
- Department of Mathematics, Josip Juraj Strossmayer University of Osijek, Osijek 31000, Croatia
- Francisca Rojas Ringeling
- Gene Center, Ludwig-Maximilians-Universität München, Munich 81377, Germany
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, United States
- Shao-Yen Kao
- Biomedical Center, Department of Physiological Chemistry, Ludwig-Maximilians-Universität München, Planegg-Martinsried 82152, Germany
- Elena Nikonova
- Biomedical Center, Department of Physiological Chemistry, Ludwig-Maximilians-Universität München, Planegg-Martinsried 82152, Germany
- Domagoj Matijević
- Department of Mathematics, Josip Juraj Strossmayer University of Osijek, Osijek 31000, Croatia
- Maria L Spletter
- Biomedical Center, Department of Physiological Chemistry, Ludwig-Maximilians-Universität München, Planegg-Martinsried 82152, Germany
- School of Science and Engineering, Division of Biological & Biomedical Systems, University of Missouri Kansas City, Kansas City, MO 64110, United States
- Stefan Canzar
- Gene Center, Ludwig-Maximilians-Universität München, Munich 81377, Germany
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, United States
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, United States
5
Fenn A, Tsoy O, Faro T, Rößler FM, Dietrich A, Kersting J, Louadi Z, Lio CT, Völker U, Baumbach J, Kacprowski T, List M. Alternative splicing analysis benchmark with DICAST. NAR Genom Bioinform 2023; 5:lqad044. [PMID: 37260511; PMCID: PMC10227362; DOI: 10.1093/nargab/lqad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 04/13/2023] [Accepted: 05/05/2023] [Indexed: 06/02/2023]
Abstract
Alternative splicing is a major contributor to transcriptome and proteome diversity in health and disease. A plethora of tools have been developed for studying alternative splicing in RNA-seq data. Previous benchmarks focused on isoform quantification and mapping. They neglected event detection tools, which arguably provide the most detailed insights into the alternative splicing process. DICAST offers a modular and extensible framework for analysing alternative splicing integrating eleven splice-aware mapping and eight event detection tools. We benchmark all tools extensively on simulated as well as whole blood RNA-seq data. STAR and HISAT2 demonstrated the best balance between performance and run time. The performance of event detection tools varies widely with no tool outperforming all others. DICAST allows researchers to employ a consensus approach to consider the most successful tools jointly for robust event detection. Furthermore, we propose the first reporting standard to unify existing formats and to guide future tool development.
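The simplest form of a consensus approach is a vote: keep an event only if at least `min_support` tools report it. This sketch is not DICAST's implementation. DICAST's unified reporting format is not reproduced here, and the event key (type, chromosome, start, end, strand), the tool names and the threshold are assumptions.

```python
from collections import Counter

# Hypothetical event key: (event type, chromosome, start, end, strand).
Event = tuple[str, str, int, int, str]


def consensus_events(calls_by_tool: dict[str, set[Event]], min_support: int) -> set[Event]:
    """Keep events reported by at least `min_support` tools.
    Each tool's calls are a set, so a tool supports an event at most once."""
    support = Counter(ev for calls in calls_by_tool.values() for ev in calls)
    return {ev for ev, n in support.items() if n >= min_support}


es = ("ES", "chr1", 1200, 1350, "+")
ir = ("IR", "chr2", 500, 900, "-")
a5 = ("A5", "chr3", 70, 95, "+")
calls = {"tool_1": {es, ir}, "tool_2": {es, a5}, "tool_3": {es, ir}}
print(sorted(consensus_events(calls, min_support=2)))
# [('ES', 'chr1', 1200, 1350, '+'), ('IR', 'chr2', 500, 900, '-')]
```

Exact-coordinate matching is the strictest choice. Real tools often report slightly different boundaries for the same event, so a practical consensus may need a tolerance window.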
Affiliation(s)
- Tim Faro
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Fanny L M Rößler
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Alexander Dietrich
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Johannes Kersting
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Zakaria Louadi
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Chit Tong Lio
- Chair of Experimental Bioinformatics, Technical University of Munich, 85354 Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Uwe Völker
- Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Felix-Hausdorff-Straße 8, D-17475 Greifswald, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Greifswald, Greifswald, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Notkestrasse 9, 22607 Hamburg, Germany
- Institute of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5000 Odense, Denmark
- Markus List
- To whom correspondence should be addressed. Tel: +49 8161 71 2761;
6
Oreper D, Klaeger S, Jhunjhunwala S, Delamarre L. The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023; 67:101758. [PMID: 37027981; DOI: 10.1016/j.smim.2023.101758] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Harnessing the patient's immune system to control a tumor is a proven avenue for cancer therapy. T cell therapies as well as therapeutic vaccines, which target specific antigens of interest, are being explored as treatments in conjunction with immune checkpoint blockade. For these therapies, selecting the best suited antigens is crucial. Most of the focus has thus far been on neoantigens that arise from tumor-specific somatic mutations. Although there is clear evidence that T-cell responses against mutated neoantigens are protective, the large majority of these mutations are not immunogenic. In addition, most somatic mutations are unique to each individual patient and their targeting requires the development of individualized approaches. Therefore, novel antigen types are needed to broaden the scope of such treatments. We review high throughput approaches for discovering novel tumor antigens and some of the key challenges associated with their detection, and discuss considerations when selecting tumor antigens to target in the clinic.
Affiliation(s)
- Daniel Oreper
- Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
- Susan Klaeger
- Genentech, 1 DNA Way, South San Francisco, CA 94080, USA
7
Wang Y, Li S, Nong B, Zhou W, Xu S, Songyang Z, Xiong Y. Comprehensive RNA-Seq Analysis Pipeline for Non-Model Organisms and Its Application in Schmidtea mediterranea. Genes (Basel) 2023; 14:genes14050989. [PMID: 37239350; DOI: 10.3390/genes14050989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/17/2023] [Accepted: 04/19/2023] [Indexed: 05/28/2023]
Abstract
RNA sequencing (RNA-seq) is a high-throughput technology that provides in-depth information on the transcriptome. Advances in RNA sequencing and its falling costs, together with the growing number of reference genomes for different species, make transcriptome analysis in non-model organisms possible. Current obstacles in analyzing RNA-seq data include a lack of functional annotation, which may complicate the process of linking genes to corresponding functions. Here, we provide a one-stop RNA-seq analysis pipeline, PipeOne-NM, for transcriptome functional annotation, non-coding RNA identification, and transcript alternative splicing analysis of non-model organisms, intended for use with Illumina platform-based RNA-seq data. We ran PipeOne-NM on 237 Schmidtea mediterranea RNA-seq runs and assembled a transcriptome with 84,827 sequences from 49,320 genes, identifying 64,582 mRNAs from 35,485 genes, 20,217 lncRNAs from 17,084 genes, and 3481 circRNAs from 1103 genes. In addition, we performed a co-expression analysis of lncRNA and mRNA and found that 1319 lncRNAs co-express with at least one mRNA. Further analysis of samples from S. mediterranea sexual and asexual strains revealed the role of sexual reproduction in gene expression profiles. Samples from different parts of asexual S. mediterranea revealed that differential expression profiles of different body parts correlated with nerve impulse conduction. In conclusion, PipeOne-NM has the potential to provide comprehensive transcriptome information for non-model organisms on a single platform.
Affiliation(s)
- Yanzhi Wang
- Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Sijun Li
- Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Baoting Nong
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Breast Tumor Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510006, China
- Weiping Zhou
- Maternal and Child Health Research Institute, Translational Medicine Center, Guangdong Women and Children Hospital, Guangzhou 511400, China
- Shuhua Xu
- School of Life Sciences, Fudan University, Shanghai 200433, China
- Zhou Songyang
- Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou 510006, China
- Yuanyan Xiong
- Key Laboratory of Gene Engineering of the Ministry of Education, Institute of Healthy Aging Research, School of Life Sciences, Sun Yat-sen University, Guangzhou 510006, China
8
Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. Nat Methods 2023; 20:239-247. [PMID: 36646895; DOI: 10.1038/s41592-022-01731-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/18/2021] [Accepted: 11/28/2022] [Indexed: 01/18/2023]
Abstract
Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our toolchain, which consists of additions to the VG toolkit and a standalone tool, RPVG, can construct spliced pangenome graphs, map RNA sequencing data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. We show that this workflow improves accuracy over state-of-the-art RNA sequencing mapping methods, and that it can efficiently quantify haplotype-specific transcript expression without needing to characterize the haplotypes of a sample beforehand.
9
Sibbesen JA, Eizenga JM, Novak AM, Sirén J, Chang X, Garrison E, Paten B. Haplotype-aware pantranscriptome analyses using spliced pangenome graphs. Nat Methods 2023; 20:239-247. [PMID: 36646895; DOI: 10.1101/2021.03.26.437240] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/18/2021] [Accepted: 11/28/2022] [Indexed: 05/24/2023]
Abstract
Pangenomics is emerging as a powerful computational paradigm in bioinformatics. This field uses population-level genome reference structures, typically consisting of a sequence graph, to mitigate reference bias and facilitate analyses that were challenging with previous reference-based methods. In this work, we extend these methods into transcriptomics to analyze sequencing data using the pantranscriptome: a population-level transcriptomic reference. Our toolchain, which consists of additions to the VG toolkit and a standalone tool, RPVG, can construct spliced pangenome graphs, map RNA sequencing data to these graphs, and perform haplotype-aware expression quantification of transcripts in a pantranscriptome. We show that this workflow improves accuracy over state-of-the-art RNA sequencing mapping methods, and that it can efficiently quantify haplotype-specific transcript expression without needing to characterize the haplotypes of a sample beforehand.
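Haplotype-aware quantification must decide how to split reads that are compatible with several haplotype-specific transcripts. A standard way to do this in many RNA-seq quantifiers is expectation-maximization (EM) over read compatibility classes. The sketch below shows that generic EM on toy data and ignores transcript lengths and alignment probabilities. It is not RPVG's model, and the transcript names and counts are hypothetical.

```python
def em_abundances(classes: list[tuple[tuple[str, ...], int]], n_iter: int = 200) -> dict[str, float]:
    """Relative transcript abundances from (compatible transcripts, read count)
    classes, estimated by expectation-maximization. Lengths are ignored."""
    transcripts = sorted({t for ts, _ in classes for t in ts})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    total_reads = sum(n for _, n in classes)
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: split each class's reads in proportion to the current abundances.
        for ts, n in classes:
            norm = sum(theta[t] for t in ts)
            for t in ts:
                expected[t] += n * theta[t] / norm
        # M-step: new abundances are the expected fractions of all reads.
        theta = {t: expected[t] / total_reads for t in transcripts}
    return theta


# Hypothetical haplotype-specific copies of one transcript.
classes = [(("tx1_hapA",), 30), (("tx1_hapB",), 10), (("tx1_hapA", "tx1_hapB"), 60)]
est = em_abundances(classes)
print({t: round(v, 3) for t, v in est.items()})  # {'tx1_hapA': 0.75, 'tx1_hapB': 0.25}
```

The uniquely assigned reads (30 vs 10) show that haplotype A is three times as abundant. EM converges to splitting the 60 shared reads in that same 3:1 ratio.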
Affiliation(s)
- Adam M Novak
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Jouni Sirén
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Xian Chang
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Erik Garrison
- University of Tennessee Health Science Center, Memphis, TN, USA
10
Therapeutic Vaccines Targeting Neoantigens to Induce T-Cell Immunity against Cancers. Pharmaceutics 2022; 14:pharmaceutics14040867. [PMID: 35456701; PMCID: PMC9029780; DOI: 10.3390/pharmaceutics14040867] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2022] [Revised: 04/11/2022] [Accepted: 04/13/2022] [Indexed: 12/12/2022]
Abstract
Cancer immunotherapy has achieved multiple clinical benefits and has become an indispensable component of cancer treatment. Targeting tumor-specific antigens, also known as neoantigens, plays a crucial role in cancer immunotherapy. T cells of adaptive immunity that recognize neoantigens, but do not induce unwanted off-target effects, have demonstrated high efficacy and low side effects in cancer immunotherapy. Tumor neoantigens derived from accumulated genetic instability can be characterized using emerging technologies, such as high-throughput sequencing, bioinformatics, predictive algorithms, mass-spectrometry analyses, and immunogenicity validation. Neoepitopes with a higher affinity for major histocompatibility complexes can be identified and further applied to the field of cancer vaccines. Therapeutic vaccines composed of tumor lysates or cells and DNA, mRNA, or peptides of neoantigens have evoked adaptive immunity to kill cancer cells in clinical trials. Broad clinical applicability of these therapeutic cancer vaccines has emerged. In this review, we discuss recent progress in neoantigen identification and applications for cancer vaccines and the results of ongoing trials.
11
Baaijens JA, Bonizzoni P, Boucher C, Della Vedova G, Pirola Y, Rizzi R, Sirén J. Computational graph pangenomics: a tutorial on data structures and their applications. Natural Computing 2022; 21:81-108. [PMID: 36969737; PMCID: PMC10038355; DOI: 10.1007/s11047-022-09882-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 02/14/2022] [Indexed: 05/08/2023]
Abstract
Computational pangenomics is an emerging research field that is changing the way computer scientists face challenges in biological sequence analysis. In past decades, contributions from combinatorics, stringology, graph theory and data structures were essential in the development of a plethora of software tools for the analysis of the human genome. These tools allowed computational biologists to approach ambitious projects at population scale, such as the 1000 Genomes Project. A major contribution of the 1000 Genomes Project is the characterization of a broad spectrum of genetic variations in the human genome, including the discovery of novel variations in the South Asian, African and European populations, thus enhancing the catalogue of variability within the reference genome. Currently, the need to take into account the high variability in population genomes as well as the specificity of an individual genome in a personalized approach to medicine is rapidly pushing the abandonment of the traditional paradigm of using a single reference genome. A graph-based representation of multiple genomes, or a graph pangenome, is replacing the linear reference genome. This means completely rethinking well-established procedures to analyze, store, and access information from genome representations. Properly addressing these challenges is crucial to face the computational tasks of ambitious healthcare projects aiming to characterize human diversity by sequencing 1 million individuals (Stark et al. 2019). This tutorial aims to introduce readers to the most recent advances in the theory of data structures for the representation of graph pangenomes. We discuss efficient representations of haplotypes and the variability of genotypes in graph pangenomes, and highlight applications in solving computational problems in human and microbial (viral) pangenomes.
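A minimal illustration of the basic objects in a graph pangenome follows. It is a forward-strand-only sequence graph with:

- nodes labelled by DNA strings,
- directed edges,
- haplotypes stored as walks through the nodes, each of which can be spelled back into a sequence.

This is a hypothetical toy structure, not one of the compressed representations covered by the tutorial. Real formats such as GFA also track node orientation, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Forward-strand-only sequence graph with haplotypes stored as node walks."""
    nodes: dict[int, str] = field(default_factory=dict)        # node id -> DNA label
    edges: set[tuple[int, int]] = field(default_factory=set)   # directed edges (from, to)
    paths: dict[str, list[int]] = field(default_factory=dict)  # haplotype name -> walk

    def add_path(self, name: str, walk: list[int]) -> None:
        """Register a haplotype, checking that every step follows an existing edge."""
        for u, v in zip(walk, walk[1:]):
            if (u, v) not in self.edges:
                raise ValueError(f"{name}: missing edge {u}->{v}")
        self.paths[name] = walk

    def spell(self, name: str) -> str:
        """Concatenate the node labels along a haplotype's walk."""
        return "".join(self.nodes[n] for n in self.paths[name])


# A SNP bubble: nodes 2 (A) and 3 (G) are alternative alleles between nodes 1 and 4.
g = SequenceGraph(nodes={1: "ACGT", 2: "A", 3: "G", 4: "TTCA"},
                  edges={(1, 2), (1, 3), (2, 4), (3, 4)})
g.add_path("hap1", [1, 2, 4])
g.add_path("hap2", [1, 3, 4])
print(g.spell("hap1"), g.spell("hap2"))  # ACGTATTCA ACGTGTTCA
```

Storing each haplotype as an explicit list of node IDs is the naive baseline. Compressed haplotype indexes exist because this list representation grows with the number of haplotypes.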
Affiliation(s)
- Jasmijn A. Baaijens
- Department of Intelligent Systems, Delft University of Technology, Van Mourik Broekmanweg 6, 2628XE Delft, The Netherlands
- Department of Biomedical Informatics, Harvard University, 10 Shattuck St, Boston, MA 02115, USA
- Paola Bonizzoni
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, 432 Newell Dr, Gainesville, FL 32603, USA
- Gianluca Della Vedova
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Yuri Pirola
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Raffaella Rizzi
- Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, V.le Sarca, 336, 20126 Milan, Italy
- Jouni Sirén
- Genomics Institute, University of California, 1156 High St., Santa Cruz, CA 95064, USA
12
Ebrahimie E, Rahimirad S, Tahsili M, Mohammadi-Dehcheshmeh M. Alternative RNA splicing in stem cells and cancer stem cells: Importance of transcript-based expression analysis. World J Stem Cells 2021; 13:1394-1416. [PMID: 34786151; PMCID: PMC8567453; DOI: 10.4252/wjsc.v13.i10.1394] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/18/2021] [Revised: 06/21/2021] [Accepted: 09/14/2021] [Indexed: 02/06/2023]
Abstract
Alternative ribonucleic acid (RNA) splicing can lead to the assembly of different protein isoforms with distinctive functions. The outcome of alternative splicing (AS) can be a complete loss of function or the acquisition of new functions. There is a gap in knowledge of abnormal RNA splice variants promoting cancer stem cells (CSCs) and of their potential contribution to cancer progression. AS directly regulates the self-renewal features of stem cells (SCs) and stem-like cancer cells. Notably, octamer-binding transcription factor 4A, a spliced variant of octamer-binding transcription factor 4, contributes to maintaining stemness properties in both SCs and CSCs. The epithelial to mesenchymal transition pathway regulates the AS events in CSCs to maintain stemness. The alternatively spliced variants of CSC markers, including cluster of differentiation 44, aldehyde dehydrogenase, doublecortin-like kinase, and α6β1 integrin, have pivotal roles in increasing self-renewal properties and maintaining the pluripotency of CSCs. Various splicing analysis tools are considered in this study. LeafCutter can be considered the best tool for differential splicing analysis and identification of the type of splicing events. Additionally, LeafCutter can be used to efficiently map splicing quantitative trait loci. Altogether, the accumulating evidence reinforces the fact that gene and protein expression need to be investigated in parallel with alternative splice variants.
Affiliation(s)
- Esmaeil Ebrahimie
- School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide 5005, South Australia, Australia
- La Trobe Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne 3086, Australia
- School of Biosciences, The University of Melbourne, Melbourne 3010, Australia
- Samira Rahimirad
- Department of Medical Genetics, National Institute of Genetic Engineering and Biotechnology, Tehran 1497716316, Iran
- Division of Urology, Department of Surgery, McGill University and the Research Institute of the McGill University Health Centre, Montreal H4A 3J1, Quebec, Canada
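The abstract singles out LeafCutter for differential splicing analysis. LeafCutter works from split-read (intron) counts: it groups introns that share splice sites into clusters and compares each intron's share of reads within its cluster. The Python sketch below illustrates only that clustering-and-ratio idea under simplifying assumptions. Introns are (start, end) tuples on a single chromosome and strand, and two introns are linked only when they share a start or an end coordinate. The names `cluster_introns` and `excision_ratios` are hypothetical. This is not LeafCutter's code, and it omits the tool's read filtering and its Dirichlet-multinomial test.

```python
from collections import defaultdict


def cluster_introns(introns):
    """Group introns (start, end) that share a donor or acceptor coordinate,
    transitively, using a small union-find (simplified clustering rule)."""
    parent = list(range(len(introns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Link every intron to the first intron seen with the same start or end.
    first_by_start, first_by_end = {}, {}
    for idx, (start, end) in enumerate(introns):
        if start in first_by_start:
            union(idx, first_by_start[start])
        else:
            first_by_start[start] = idx
        if end in first_by_end:
            union(idx, first_by_end[end])
        else:
            first_by_end[end] = idx

    clusters = defaultdict(list)
    for idx in range(len(introns)):
        clusters[find(idx)].append(introns[idx])
    return list(clusters.values())


def excision_ratios(counts):
    """counts maps (start, end) -> number of split reads supporting the intron.
    Returns each intron's fraction of the reads in its cluster."""
    ratios = {}
    for cluster in cluster_introns(sorted(counts)):
        total = sum(counts[intron] for intron in cluster)
        for intron in cluster:
            ratios[intron] = counts[intron] / total if total else 0.0
    return ratios
```

For example, with counts `{(100, 200): 30, (100, 300): 10, (250, 300): 20, (500, 600): 5}`, the first three introns form one cluster, linked through the shared start 100 and the shared end 300. Their ratios are 0.5, 1/6 and 1/3. The isolated intron gets 1.0.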
13
Alqassem I, Sonthalia Y, Klitzke-Feser E, Shim H, Canzar S. McSplicer: a probabilistic model for estimating splice site usage from RNA-seq data. Bioinformatics 2021; 37:2004-2011. [PMID: 33515239 PMCID: PMC8337008 DOI: 10.1093/bioinformatics/btab050] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/05/2020] [Revised: 01/20/2021] [Accepted: 01/21/2021] [Indexed: 11/23/2022]
Abstract
Motivation: Alternative splicing removes intronic sequences from pre-mRNAs in alternative ways to produce different forms (isoforms) of mature mRNA. The composition of expressed transcripts gives specific functionalities to cells in a particular condition or developmental stage. In addition, a large fraction of human disease mutations affect splicing and lead to aberrant mRNA and protein products. Current methods that interrogate the transcriptome based on RNA-seq either suffer from short read length when trying to infer full-length transcripts or are restricted to predefined units of alternative splicing that they quantify from local read evidence. Results: Instead of attempting to quantify individual outcomes of the splicing process, such as local splicing events or full-length transcripts, we propose to quantify alternative splicing using a simplified probabilistic model of the underlying splicing process. Our model is based on the usage of individual splice sites and can generate arbitrarily complex types of splicing patterns. In our implementation, McSplicer, we estimate the parameters of our model using all read data at once, and we demonstrate in our experiments that this yields more accurate estimates than competing methods. Our model is able to describe multiple effects of splicing mutations using few, easy-to-interpret parameters, as we illustrate in an experiment on RNA-seq data from autism spectrum disorder patients. Availability: McSplicer source code is available at https://github.com/canzarlab/McSplicer and has been deposited in archived format at https://doi.org/10.5281/zenodo.4449881. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Israa Alqassem
- Gene Center, Ludwig-Maximilians-Universität München, Munich, 81377, Germany
- Heejung Shim
- Melbourne Integrative Genomics (MIG), School of Mathematics and Statistics, University of Melbourne, Melbourne, 3010, Australia
- Stefan Canzar
- Gene Center, Ludwig-Maximilians-Universität München, Munich, 81377, Germany
14
Zea DJ, Laskina S, Baudin A, Richard H, Laine E. Assessing conservation of alternative splicing with evolutionary splicing graphs. Genome Res 2021; 31:1462-1473. [PMID: 34266979 PMCID: PMC8327911 DOI: 10.1101/gr.274696.120] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/27/2020] [Accepted: 06/11/2021] [Indexed: 12/29/2022]
Abstract
Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalizing the definition of splicing graph. We provide a practical solution to construct parsimonious evolutionary splicing graphs where each node is a minimal transcript building block defined across species. We show a clear link between the functional relevance, tissue regulation, and conservation of alternative transcripts on a set of 50 genes. By scaling up to the whole human protein-coding genome, we identify a few thousand genes where alternative splicing modulates the number and composition of pseudorepeats. We have implemented our approach in ThorAxe, an efficient, versatile, robust, and freely available computational tool.
Affiliation(s)
- Diego Javier Zea
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Sofya Laskina
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Alexis Baudin
- Sorbonne Université, CNRS, LIP6, F-75005 Paris, France
- Hugues Richard
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
15
Denti L, Pirola Y, Previtali M, Ceccato T, Della Vedova G, Rizzi R, Bonizzoni P. Shark: fishing relevant reads in an RNA-Seq sample. Bioinformatics 2021; 37:464-472. [PMID: 32926128 PMCID: PMC8088329 DOI: 10.1093/bioinformatics/btaa779] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/01/2020] [Revised: 08/17/2020] [Accepted: 09/02/2020] [Indexed: 11/19/2022]
Abstract
Motivation: Recent advances in high-throughput RNA-Seq technologies make it possible to produce massive datasets. When a study focuses only on a handful of genes, most reads are not relevant and degrade the performance of the tools used to analyze the data. Removing irrelevant reads from the input dataset leads to improved efficiency without compromising the results of the study. Results: We introduce a novel computational problem, called gene assignment, and we propose an efficient alignment-free approach to solve it. Given an RNA-Seq sample and a panel of genes, a gene assignment consists of extracting from the sample the reads that were most probably sequenced from those genes. The problem becomes more complicated when the sample exhibits evidence of novel alternative splicing events. We implemented our approach in a tool called Shark and assessed its effectiveness in speeding up differential splicing analysis pipelines. This evaluation shows that Shark is able to significantly improve the performance of RNA-Seq analysis tools without having any impact on the final results. Availability and implementation: The tool is distributed as a stand-alone module, and the software is freely available at https://github.com/AlgoLab/shark. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Luca Denti
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy
- Yuri Pirola
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy
- Marco Previtali
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy
- Tamara Ceccato
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy
- Gianluca Della Vedova
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy
- Raffaella Rizzi
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy
- Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano 20126, Italy
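Shark solves gene assignment without aligning reads. The Python sketch below shows one simple alignment-free way to pose the problem, assuming exact k-mer sets. Each read is assigned to the gene that shares the largest fraction of the read's k-mers, provided that fraction reaches a threshold. The functions `kmers`, `build_index` and `assign_read`, as well as the parameters `k` and `min_fraction`, are hypothetical. The sketch ignores reverse complements and breaks ties arbitrarily. It is not Shark's actual index or scoring scheme.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genes, k):
    """genes maps gene name -> sequence; returns k-mer -> set of gene names."""
    index = {}
    for name, seq in genes.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def assign_read(read, index, k, min_fraction=0.5):
    """Return the gene sharing the largest fraction of the read's k-mers,
    or None when no gene reaches min_fraction (read deemed irrelevant)."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    hits = {}
    for km in read_kmers:
        for gene in index.get(km, ()):
            hits[gene] = hits.get(gene, 0) + 1
    if not hits:
        return None
    best = max(hits, key=hits.get)
    return best if hits[best] / len(read_kmers) >= min_fraction else None
```

Reads for which `assign_read` returns `None` would be discarded before the downstream analysis. This mirrors the filtering goal described in the abstract, although here the filtering rule is an assumption.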
16
Mahadani P, Hazra A. Expression and splicing dynamics of WRKY family genes along physiological exigencies of tea plant (Camellia sinensis). Biologia (Bratisl) 2021. [DOI: 10.1007/s11756-021-00784-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/07/2023]
17
Su Z, Huang D. Alternative Splicing of Pre-mRNA in the Control of Immune Activity. Genes (Basel) 2021; 12:genes12040574. [PMID: 33921058 PMCID: PMC8071365 DOI: 10.3390/genes12040574] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/13/2021] [Revised: 04/12/2021] [Accepted: 04/14/2021] [Indexed: 02/07/2023]
Abstract
The human immune response is a complex process that responds to numerous exogenous antigens to prevent infection by microorganisms, as well as to endogenous components in the surveillance of tumors and autoimmune diseases, and a great number of molecules are necessary to carry out the functional complexity of immune activity. Alternative splicing of pre-mRNA plays an important role in immune cell development and in the regulation of immune activity by yielding diverse transcript isoforms that supplement the function of the limited number of genes associated with the immune reaction. In addition, multiple factors have been identified as being involved in the control of alternative splicing at the cis, trans, or co-transcriptional level, and aberrant RNA splicing leads to abnormal modulation of immune activity in infections, immune diseases, and tumors. In this review, we summarize recent discoveries on the generation of immune-associated alternative splice variants, clinical disorders, and possible regulatory mechanisms. We also discuss the immune responses to neoantigens produced by alternative splicing, and, finally, we raise some open questions on the relationship between alternative splicing and immunity.
Affiliation(s)
- Zhongjing Su
- Department of Histology and Embryology, Shantou University Medical College, No. 22, Xinling Road, Shantou 515041, China
- Dongyang Huang
- Department of Cell Biology, Shantou University Medical College, No. 22, Xinling Road, Shantou 515041, China
18
Zheng JT, Lin CX, Fang ZY, Li HD. Intron Retention as a Mode for RNA-Seq Data Analysis. Front Genet 2020; 11:586. [PMID: 32733531 PMCID: PMC7358572 DOI: 10.3389/fgene.2020.00586] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/13/2019] [Accepted: 05/14/2020] [Indexed: 12/16/2022]
Abstract
Intron retention (IR) is an alternative splicing mode whereby introns, rather than being spliced out as usual, are retained in mature mRNAs. It was previously considered a consequence of mis-splicing and received very limited attention. Only recently has IR become of interest for transcriptomic data analysis owing to its recognized roles in gene expression regulation and associations with complex diseases. In this article, we first review the function of IR in regulating gene expression in a number of biological processes, such as neuron differentiation and activation of CD4+ T cells. Next, we briefly review its association with diseases, such as Alzheimer's disease and cancers. Then, we describe state-of-the-art methods for IR detection, including RNA-seq analysis tools IRFinder and iREAD, highlighting their underlying principles and discussing their advantages and limitations. Finally, we discuss the challenges for IR detection and potential ways in which IR detection methods could be improved.
Affiliation(s)
- Jian-Tao Zheng
- Hunan Provincial Key Lab on Bioinformatics, Center for Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Cui-Xiang Lin
- Hunan Provincial Key Lab on Bioinformatics, Center for Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
- Zhao-Yu Fang
- School of Mathematics and Statistics, Central South University, Changsha, China
- Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, Center for Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
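The review discusses IR detection tools such as IRFinder and iREAD. A quantity commonly used in this setting is an IR ratio, which compares intronic read depth with the number of reads that splice the intron out. The sketch below assumes one such formulation: intronic depth divided by intronic depth plus the larger of the two flanking spliced-read counts. Intronic depth is taken here as the median per-base coverage. Both choices are assumptions for illustration, and actual tools differ in how they estimate each term. The function names are hypothetical.

```python
from statistics import median


def intron_depth(per_base_coverage):
    """Robust intronic depth estimate: median per-base coverage (assumed choice)."""
    return median(per_base_coverage) if per_base_coverage else 0.0


def ir_ratio(depth, splice_left, splice_right):
    """Assumed IR ratio: depth / (depth + max(spliced reads at the two
    flanking junctions)). Returns NaN when there is no evidence at all."""
    spliced = max(splice_left, splice_right)
    denom = depth + spliced
    return depth / denom if denom > 0 else float("nan")
```

For example, an intron with per-base coverage `[18, 20, 22, 0, 25]` has a median depth of 20. With 60 and 80 spliced reads at its two junctions, `ir_ratio(20, 60, 80)` gives 20 / (20 + 80) = 0.2. The median keeps a single zero-coverage position from dragging the depth estimate down.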
19
Gunady MK, Mount SM, Corrada Bravo H. Yanagi: Fast and interpretable segment-based alternative splicing and gene expression analysis. BMC Bioinformatics 2019; 20:421. [PMID: 31409274 PMCID: PMC6693274 DOI: 10.1186/s12859-019-2947-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/27/2018] [Accepted: 06/12/2019] [Indexed: 12/13/2022]
Abstract
Background: Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct use of pseudo-alignment for other expression analyses, including alternative splicing or differential gene expression analysis, without including a non-essential transcript quantification step. Results: In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. We propose an efficient algorithm to generate maximal disjoint segments from a transcriptome reference library, on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses (alternative splicing and gene differential expression) without the need for a transcript quantification step. Our experiments on simulated and experimental data showed that the use of segment counts, like other methods that rely on local coverage statistics, provides an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing when transcript annotations are incomplete. Conclusions: The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2947-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mohamed K Gunady
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
- Stephen M Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Héctor Corrada Bravo
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
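Yanagi decouples pseudo-alignment from transcript quantification by first segmenting the transcriptome into maximal disjoint segments. The Python sketch below conveys that idea in a deliberately simplified setting. Each transcript is given as an ordered list of hypothetical exonic bin IDs. A segment is a maximal run of bins that every transcript traverses as a whole: there is no branching inside the run, and no transcript starts or ends inside it. Yanagi's actual algorithm operates on transcript sequences and is more involved, so treat this only as an illustration of "maximally unambiguous" counting units. The function `segment_transcriptome` is a hypothetical name.

```python
from collections import defaultdict


def segment_transcriptome(transcripts):
    """transcripts maps name -> list of bin ids in 5'->3' order (no repeats).
    Returns (sorted list of segments, per-transcript list of segments)."""
    succ, pred = defaultdict(set), defaultdict(set)
    starts, ends = set(), set()
    for path in transcripts.values():
        starts.add(path[0])
        ends.add(path[-1])
        for a, b in zip(path, path[1:]):
            succ[a].add(b)
            pred[b].add(a)

    def mergeable(a, b):
        # The edge a->b lies inside a segment only if it is the sole way out of a,
        # the sole way into b, and no transcript ends at a or starts at b.
        return (len(succ[a]) == 1 and len(pred[b]) == 1
                and a not in ends and b not in starts)

    segments, per_transcript = set(), {}
    for name, path in transcripts.items():
        pieces, current = [], [path[0]]
        for a, b in zip(path, path[1:]):
            if mergeable(a, b):
                current.append(b)
            else:
                pieces.append(tuple(current))
                current = [b]
        pieces.append(tuple(current))
        per_transcript[name] = pieces
        segments.update(pieces)
    return sorted(segments), per_transcript
```

For two isoforms `[1, 2, 3, 5]` and `[1, 2, 4, 5]`, bins 1 and 2 merge into one shared segment. Bins 3 and 4 become isoform-specific segments, and bin 5 forms its own segment because two paths converge on it. Because every transcript decomposes into whole segments, per-segment read counts can feed splicing or gene-level analyses without first estimating transcript abundances.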
20
Smith CC, Selitsky SR, Chai S, Armistead PM, Vincent BG, Serody JS. Alternative tumour-specific antigens. Nat Rev Cancer 2019; 19:465-478. [PMID: 31278396 PMCID: PMC6874891 DOI: 10.1038/s41568-019-0162-4] [Citation(s) in RCA: 238] [Impact Index Per Article: 39.7] [Accepted: 05/29/2019] [Indexed: 12/20/2022]
Abstract
The study of tumour-specific antigens (TSAs) as targets for antitumour therapies has accelerated within the past decade. The most commonly studied class of TSAs comprises those derived from non-synonymous single-nucleotide variants (SNVs), or SNV neoantigens. However, to increase the repertoire of available therapeutic TSA targets, 'alternative TSAs', defined here as high-specificity tumour antigens arising from non-SNV genomic sources, have recently been evaluated. Among these alternative TSAs are antigens derived from mutational frameshifts, splice variants, gene fusions, endogenous retroelements and other processes. Unlike SNV neoantigens, which are patient-specific, some alternative TSAs may have the advantage of being widely shared by multiple tumours, allowing for universal, off-the-shelf therapies. In this Opinion article, we outline the biology, available computational tools, preclinical and/or clinical studies and relevant cancers for each alternative TSA class, and we discuss both the current challenges preventing the therapeutic application of alternative TSAs and potential solutions to aid their clinical translation.
Affiliation(s)
- Christof C Smith
- Department of Microbiology and Immunology, UNC School of Medicine, Marsico Hall, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Sara R Selitsky
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lineberger Bioinformatics Core, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Marsico Hall, Chapel Hill, NC, USA
- Shengjie Chai
- Department of Microbiology and Immunology, UNC School of Medicine, Marsico Hall, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Paul M Armistead
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Division of Hematology/Oncology, Department of Medicine, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Benjamin G Vincent
- Department of Microbiology and Immunology, UNC School of Medicine, Marsico Hall, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Division of Hematology/Oncology, Department of Medicine, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Program in Computational Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jonathan S Serody
- Department of Microbiology and Immunology, UNC School of Medicine, Marsico Hall, Chapel Hill, NC, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Division of Hematology/Oncology, Department of Medicine, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Program in Computational Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
21
Bonizzoni P, Della Vedova G, Pirola Y, Previtali M, Rizzi R. Multithread Multistring Burrows-Wheeler Transform and Longest Common Prefix Array. J Comput Biol 2019; 26:948-961. [PMID: 31140836 DOI: 10.1089/cmb.2018.0230] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
Indexing huge collections of strings, such as those produced by widespread sequencing technologies, relies heavily on multistring generalizations of the Burrows-Wheeler transform (BWT) and the longest common prefix (LCP) array, since computing both structures efficiently is an essential ingredient of several algorithms on string collections, such as those for genome assembly. In this article, we explore a multithread computational strategy for building the BWT and LCP array. Our algorithm applies a divide-and-conquer approach that leads to parallel computation of the multistring BWT and LCP array.
Affiliation(s)
- Paola Bonizzoni
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milan, Italy
- Gianluca Della Vedova
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milan, Italy
- Yuri Pirola
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milan, Italy
- Marco Previtali
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milan, Italy
- Raffaella Rizzi
- Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milan, Italy
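To make the two structures concrete, the sketch below builds a multistring BWT and LCP array by brute force. Each string gets its own end marker, and all suffixes of all strings are sorted together. Each BWT symbol is the character preceding a suffix; for a suffix starting at position 0, it is the string's own end marker. The markers sort below every character and among themselves by string index. This is one common convention, chosen here as an assumption. The construction is quadratic and only illustrates the definitions. It is not the paper's divide-and-conquer parallel algorithm, and `multistring_bwt_lcp` is a hypothetical name.

```python
def multistring_bwt_lcp(strings):
    """Naive multistring BWT and LCP array; end markers are printed as '$'."""
    # Characters become (1, c); the end marker of string i becomes (0, i),
    # so markers sort before all characters and are distinct per string.
    texts = [[(1, c) for c in s] + [(0, i)] for i, s in enumerate(strings)]
    suffixes = [(t[j:], i, j) for i, t in enumerate(texts) for j in range(len(t))]
    suffixes.sort(key=lambda entry: entry[0])  # lexicographic order of suffixes

    bwt, lcp, prev = [], [], None
    for suf, i, j in suffixes:
        t = texts[i]
        # Symbol preceding the suffix, cyclically within its own string.
        before = t[j - 1] if j > 0 else t[-1]
        bwt.append("$" if before[0] == 0 else before[1])
        # LCP with the previous suffix in sorted order (0 for the first one).
        n = 0
        if prev is not None:
            while n < len(suf) and n < len(prev) and suf[n] == prev[n]:
                n += 1
        lcp.append(n)
        prev = suf
    return "".join(bwt), lcp
```

For a single string, this reduces to the classic BWT. `multistring_bwt_lcp(["banana"])` returns `("annb$aa", [0, 0, 1, 3, 0, 0, 2])`. For the collection `["AB", "B"]`, it returns `("BB$A$", [0, 0, 0, 0, 1])`: the two suffixes starting with `B` share exactly one character before their distinct end markers.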