1
Huo J, Liu Z, Zhang X, Li C, Xiang D, Fu G, Lin W, Wu L, Gong S, Zhao J, Wang Z, Wang X, Xiao Z, Hao F, Ren Y, Sun YH, Zhao G. Comprehensive visceral transcriptome profiling of three pig breeds along altitudinal gradients in Yunnan. Sci Data 2025; 12:735. [PMID: 40319063 PMCID: PMC12049544 DOI: 10.1038/s41597-025-05070-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2025] [Accepted: 04/24/2025] [Indexed: 05/07/2025]
Abstract
High-altitude hypoxia acclimatization requires comprehensive physiological regulation in highland immigrants, yet the genetic mechanisms remain unclear. Yunnan's vertical zoning, with pig breeds distributed across varying elevations, provides an excellent model for investigating hypoxic adaptation. Here, we examined three indigenous Yunnan pig breeds: Diannan small-ear pigs (DSE, 500 m), Baoshan pigs (BS, 1500 m), and Diqing Tibetan pigs (DT, 3200 m). Using PacBio Iso-Seq, we obtained comprehensive full-length transcriptomes from five tissues (heart, kidney, liver, lung, and spleen). We identified 51,774 transcripts, including 34,813 novel ones, and 74,843 alternative splicing (AS) events across 10,686 AS genes, and we pinpointed five actin-binding genes through weighted gene co-expression network analysis (WGCNA). Our research significantly improved porcine genome annotation and provided a high-quality transcriptome resource for investigating the genetic mechanisms of high-altitude hypoxia adaptation. This work lays a solid foundation for future studies in genetics, evolutionary biology, and environmental adaptation.
Affiliation(s)
- Jinlong Huo
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Zhipeng Liu
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Xia Zhang
- Department of Biological and Food Engineering, Lyuliang University, Lvliang, 033001, Shanxi, China
- Changyao Li
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Decai Xiang
- Yunnan Academy of Animal Husbandry and Veterinary Sciences, Kunming, 650224, Yunnan, China
- Guowen Fu
- College of Veterinary Medicine, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Wan Lin
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Lingxiang Wu
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
- Shaorong Gong
- Baoshan Pig Research Institute, Baoshan, 678200, Yunnan, China
- Jiading Zhao
- Baoshan Pig Research Institute, Baoshan, 678200, Yunnan, China
- Zhen Wang
- Institute of Animal Husbandry and Veterinary Science of Diqing Tibetan Autonomous Prefecture, Diqing, 674499, Yunnan, China
- Xiaohong Wang
- Animal Health Supervision Institute, Bureau of Agriculture and Rural Affairs of Shangri-la, Shangri-la, 674499, Yunnan, China
- Zhiping Xiao
- Pure Land Agricultural Development Co., Ltd., Shangri-la, 674401, Yunnan, China
- Fanfan Hao
- School of Medicine and Dentistry, University of Rochester Medical Center, Rochester, New York, 14642, USA
- Yue Ren
- School of Medicine and Dentistry, University of Rochester Medical Center, Rochester, New York, 14642, USA
- Yu H Sun
- Department of Biology, University of Rochester, Rochester, New York, 14627, USA
- Guiying Zhao
- College of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, Yunnan, China
2
Castric V, Batista RA, Carré A, Mousavi S, Mazoyer C, Godé C, Gallina S, Ponitzki C, Theron A, Bellec A, Marande W, Santoni S, Mariotti R, Rubini A, Legrand S, Billiard S, Vekemans X, Vernet P, Saumitou-Laprade P. The homomorphic self-incompatibility system in Oleaceae is controlled by a hemizygous genomic region expressing a gibberellin pathway gene. Curr Biol 2024; 34:1967-1976.e6. [PMID: 38626763 DOI: 10.1016/j.cub.2024.03.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Revised: 02/29/2024] [Accepted: 03/25/2024] [Indexed: 04/18/2024]
Abstract
In flowering plants, outcrossing is commonly ensured by self-incompatibility (SI) systems. These can be homomorphic (typically with many different allelic specificities) or can accompany flower heteromorphism (mostly with just two specificities and corresponding floral types). The SI system of the Oleaceae family is unusual, with the long-term maintenance of only two specificities but often without flower morphology differences. To elucidate the genomic architecture and molecular basis of this SI system, we obtained chromosome-scale genome assemblies of Phillyrea angustifolia individuals and related them to a genetic map. The S-locus region proved to have a segregating 543-kb indel unique to one specificity, suggesting a hemizygous region, as observed in all distylous systems so far studied at the genomic level. Only one of the predicted genes in this indel region is found in the olive tree, Olea europaea, genome, also within a segregating indel. We describe complete association between the presence/absence of this gene and the SI types determined for individuals of seven distantly related Oleaceae species. This gene is predicted to be involved in catabolism of the gibberellic acid (GA) hormone, and experimental manipulation of GA levels in developing buds modified the male and female SI responses of the two specificities in different ways. Our results provide a unique example of a homomorphic SI system, where a single conserved gibberellin-related gene in a hemizygous indel underlies the long-term maintenance of two groups of reproductive compatibility.
Affiliation(s)
- Vincent Castric
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Rita A Batista
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Amélie Carré
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Soraya Mousavi
- CNR, Institute of Biosciences and Bioresources (IBBR), 06128 Perugia, Italy
- Clément Mazoyer
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Cécile Godé
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Sophie Gallina
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Chloé Ponitzki
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Anthony Theron
- INRAE, CNRGV French Plant Genomic Resource Center, F-31326 Castanet Tolosan, France
- Arnaud Bellec
- INRAE, CNRGV French Plant Genomic Resource Center, F-31326 Castanet Tolosan, France
- William Marande
- INRAE, CNRGV French Plant Genomic Resource Center, F-31326 Castanet Tolosan, France
- Sylvain Santoni
- UMR DIAPC Diversité et adaptation des plantes cultivées, F-34398 Montpellier, France
- Roberto Mariotti
- CNR, Institute of Biosciences and Bioresources (IBBR), 06128 Perugia, Italy
- Andrea Rubini
- CNR, Institute of Biosciences and Bioresources (IBBR), 06128 Perugia, Italy
- Sylvain Legrand
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Sylvain Billiard
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Xavier Vekemans
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
- Philippe Vernet
- Univ. Lille, CNRS, UMR 8198, Evo-Eco-Paleo, F-59000 Lille, France
3
Shabbir M, Mithani A. Roast: a tool for reference-free optimization of supertranscriptome assemblies. BMC Bioinformatics 2024; 25:2. [PMID: 38166712 PMCID: PMC10763045 DOI: 10.1186/s12859-023-05614-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 12/12/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND Transcriptomic studies involving organisms for which reference genomes are not available typically start by generating a de novo transcriptome or supertranscriptome assembly from the raw RNA-seq reads. Assembling a supertranscriptome is, however, a challenging task due to significantly varying abundance of mRNA transcripts, alternative splicing, and sequencing errors. As a result, popular de novo supertranscriptome assembly tools generate assemblies containing contigs that are partially assembled, fragmented, or false chimeras, or that have local mis-assemblies, leading to decreased assembly accuracy. Commonly available tools for assembly improvement rely primarily on running BLAST against closely related species, making their accuracy and reliability conditional on the availability of data for closely related organisms. RESULTS We present ROAST, a tool for optimizing supertranscriptome assemblies that uses paired-end RNA-seq data from the Illumina sequencing platform to iteratively identify and fix various assembly errors without performing BLAST searches against other organisms. ROAST relies solely on the error signatures generated by RNA-seq alignment tools, including soft-clips, unexpected expression coverage, and reads whose mates are unmapped or mapped to a different contig. Evaluation results using simulated as well as real datasets show that ROAST significantly improves assembly quality by identifying and fixing various assembly errors. CONCLUSION ROAST provides a reference-free approach to optimizing supertranscriptome assemblies, highlighting its utility in refining de novo supertranscriptome assemblies of non-model organisms.
Affiliation(s)
- Madiha Shabbir
- Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan
- Aziz Mithani
- Department of Life Sciences, Syed Babar Ali School of Science and Engineering, Lahore University of Management Sciences (LUMS), DHA, Lahore, 54792, Pakistan
4
Williams L, Tomescu AI, Mumey B. Flow Decomposition With Subpath Constraints. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:360-370. [PMID: 35104222 DOI: 10.1109/tcbb.2022.3147697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. But despite acknowledging their utility in practice, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPT algorithm. Experiments on RNA transcript datasets show that for instances with larger solution path sets, the addition of subpath constraints finds 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more ground truth solutions when minimal decompositions are found heuristically.
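To make the underlying (unconstrained) problem concrete, here is a minimal Python sketch of flow decomposition using the well-known greedy-width heuristic: repeatedly extract the source-to-sink path with the largest bottleneck and subtract its weight. This is not the authors' heuristic or FPT algorithm, and it ignores subpath constraints entirely. It assumes the flow is a dict of positive integer edge weights on a DAG that satisfies flow conservation; the function names and toy network are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def widest_path(flow, source, sink, order):
    """Return the source->sink path maximizing its minimum edge weight (bottleneck)."""
    out_edges = defaultdict(list)
    for (u, v), w in flow.items():
        if w > 0:
            out_edges[u].append(v)
    best = {source: float("inf")}  # best bottleneck width found so far for each node
    parent = {}
    for u in order:  # topological order: every predecessor of u is settled before u
        if u not in best:
            continue
        for v in out_edges[u]:
            width = min(best[u], flow[(u, v)])
            if width > best.get(v, 0):
                best[v] = width
                parent[v] = u
    if sink not in best:
        return None, 0
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], best[sink]


def greedy_width_decomposition(flow, source, sink):
    """Decompose an s-t flow on a DAG into weighted paths (greedy-width heuristic)."""
    flow = dict(flow)  # work on a copy so the caller's flow is not modified
    preds = defaultdict(set)
    for u, v in flow:
        preds[v].add(u)
        preds.setdefault(u, set())  # register nodes that have no predecessors
    order = list(TopologicalSorter(preds).static_order())
    paths = []
    while any(w > 0 for w in flow.values()):
        path, width = widest_path(flow, source, sink, order)
        if path is None:
            break  # leftover weight lies on no s-t path, so the input was not a valid flow
        for u, v in zip(path, path[1:]):
            flow[(u, v)] -= width
        paths.append((path, width))
    return paths


# Toy network formed by superimposing s->a->c->t (weight 5) and s->b->c->t (weight 3)
flow = {("s", "a"): 5, ("a", "c"): 5, ("s", "b"): 3, ("b", "c"): 3, ("c", "t"): 8}
print(greedy_width_decomposition(flow, "s", "t"))
# [(['s', 'a', 'c', 't'], 5), (['s', 'b', 'c', 't'], 3)]
```

Greedy-width does not guarantee a minimum decomposition, and on networks with many overlapping paths even a minimum decomposition need not match the ground-truth paths; that gap is what the subpath constraints described in the abstract are meant to narrow.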
5
Caceres M, Mumey B, Husic E, Rizzi R, Cairo M, Sahlin K, Tomescu AI. Safety in Multi-Assembly via Paths Appearing in All Path Covers of a DAG. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:3673-3684. [PMID: 34847041 DOI: 10.1109/tcbb.2021.3131203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
A multi-assembly problem asks to reconstruct multiple genomic sequences from mixed reads sequenced from all of them. Standard formulations of such problems model a solution as a path cover in a directed acyclic graph, namely a set of paths that together cover all vertices of the graph. Since multi-assembly problems admit multiple solutions in practice, we consider an approach commonly used in standard genome assembly: output only partial solutions (contigs, or safe paths) that appear in all path cover solutions. We study constrained path covers, a restriction on the path cover solution that incorporates practical constraints arising in multi-assembly problems. We give efficient algorithms for finding all maximal safe paths for constrained path covers. We compute the safe paths of splicing graphs constructed from transcript annotations of different species. Our algorithms run in less than 15 seconds per species and report RNA contigs that are over 99% precise and up to 8 times longer than unitigs. Moreover, RNA contigs cover over 70% of the transcripts and their coding sequences in most cases. With their greater length relative to unitigs, high precision, and fast construction time, maximal safe paths can provide a better base set of sequences for transcript assembly programs.
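The abstract measures maximal safe paths against unitigs, the maximal non-branching paths of a graph. As a baseline illustration only, the Python sketch below extracts unitigs from a DAG given as an edge list, using one common node-based definition. It does not compute safe paths for (constrained) path covers, which is the paper's contribution; the function name and example graph are hypothetical.

```python
from collections import defaultdict


def unitigs(edges):
    """Return the maximal non-branching paths (unitigs) of a DAG given as (u, v) edges."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    def continues_unitig(v):
        # v extends its predecessor's unitig only if the edge into v is both
        # v's sole in-edge and the predecessor's sole out-edge
        return len(pred[v]) == 1 and len(succ[pred[v][0]]) == 1

    result = []
    for v in sorted(nodes):
        if continues_unitig(v):
            continue  # v lies inside a unitig that starts at an earlier node
        path = [v]
        while len(succ[path[-1]]) == 1 and continues_unitig(succ[path[-1]][0]):
            path.append(succ[path[-1]][0])
        result.append(path)
    return result


# Toy splicing-graph-like DAG with one pair of alternative exons (c / d)
edges = [("s", "a"), ("a", "b"), ("b", "c"), ("b", "d"), ("c", "e"), ("d", "e"), ("e", "t")]
print(unitigs(edges))
# [['c'], ['d'], ['e', 't'], ['s', 'a', 'b']]
```

Unitigs stop at every branching node, which is why they are short on splicing graphs; safe paths can extend across such branches whenever every admissible path cover must contain the longer walk.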
6
Lee SG, Na D, Park C. Comparability of reference-based and reference-free transcriptome analysis approaches at the gene expression level. BMC Bioinformatics 2021; 22:310. [PMID: 34674628 PMCID: PMC8529712 DOI: 10.1186/s12859-021-04226-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/23/2021] [Accepted: 06/01/2021] [Indexed: 11/10/2022]
Abstract
Background Recently, high-throughput RNA sequencing has been extensively used to elucidate the transcriptome landscape and dynamics of cell types of different species. In particular, for most non-model organisms lacking complete reference genomes with high-quality annotation of genetic information, reference-free (RF) de novo transcriptome analyses, rather than reference-based (RB) approaches, are widely used, and RF analyses have substantially contributed toward understanding the mechanisms regulating key biological processes and functions. To date, numerous bioinformatics studies have been conducted for assessing the workflow, production rate, and completeness of transcriptome assemblies within and between RF and RB datasets. However, the degree of consistency and variability of results obtained by analyzing gene expression levels through these two different approaches has not been adequately documented. Results In the present study, we evaluated the differences in expression profiles obtained with RF and RB approaches and revealed that the former tends to be satisfactorily replaced by the latter with respect to transcriptome repertoires, as well as from a gene expression quantification perspective. In addition, we urge cautious interpretation of these findings. Several genes that are lowly expressed, have long coding sequences, or belong to large gene families must be validated carefully whenever gene expression levels are calculated using the RF method. Conclusions Our empirical results indicate important contributions toward addressing transcriptome-related biological questions in non-model organisms. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04226-0.
Affiliation(s)
- Sung-Gwon Lee
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Dokyun Na
- Department of Biomedical Engineering, Chung-Ang University, Seoul, 06974, Republic of Korea
- Chungoo Park
- School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea
7
Luo Y, Liao X, Wu FX, Wang J. Computational Approaches for Transcriptome Assembly Based on Sequencing Technologies. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190410155603] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genes in specific cells. It is also the basis of many downstream analyses. As sequencing speed increases and cost decreases, massive amounts of sequencing data continue to accumulate, and a large number of assembly strategies based on different computational methods and experiments have been developed. How to perform transcriptome assembly efficiently, with high sensitivity and accuracy, has become a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. Examples from different species are used to illustrate that long reads produced by third-generation sequencing technologies can cover full-length transcripts without assembly. In addition, transcriptome assemblies using Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assembly.
Affiliation(s)
- Yuwen Luo
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xingyu Liao
- School of Computer Science and Engineering, Central South University, Changsha, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, Canada
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, China
8
Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris. G3 (Bethesda) 2019; 9:3409-3421. [PMID: 31427456 PMCID: PMC6778806 DOI: 10.1534/g3.119.400357] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/28/2022]
Abstract
Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications, have not been fully investigated. Here, we assessed three assembly strategies for short-read data, including the use of haploid megagametophyte tissue as a single-allele guide during de novo assembly, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some level of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity.
9
Rey C, Veber P, Boussau B, Sémon M. CAARS: comparative assembly and annotation of RNA-Seq data. Bioinformatics 2019; 35:2199-2207. [PMID: 30452539 PMCID: PMC6596894 DOI: 10.1093/bioinformatics/bty903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/28/2017] [Revised: 09/13/2018] [Accepted: 11/16/2018] [Indexed: 02/05/2023]
Abstract
MOTIVATION RNA sequencing (RNA-Seq) is a widely used approach to obtain transcript sequences in non-model organisms, notably for performing comparative analyses. However, current bioinformatic pipelines do not take full advantage of pre-existing reference data in related species for improving RNA-Seq assembly, annotation and gene family reconstruction. RESULTS We built an automated pipeline named CAARS to combine novel data from RNA-Seq experiments with existing multi-species gene family alignments. RNA-Seq reads are assembled into transcripts by both de novo and assisted assemblies. Then, CAARS incorporates transcripts into gene families, builds gene alignments and trees and uses phylogenetic information to classify the genes as orthologs and paralogs of existing genes. We used CAARS to assemble and annotate RNA-Seq data in rodents and fishes using distantly related genomes as reference, a difficult case for this kind of analysis. We showed that CAARS assemblies are more complete and accurate than those assembled by a standard pipeline consisting of de novo assembly coupled with annotation by sequence similarity on a guide species. In addition to annotated transcripts, CAARS provides gene family alignments and trees, annotated with orthology relationships, directly usable for downstream comparative analyses. AVAILABILITY AND IMPLEMENTATION CAARS is implemented in Python and OCaml and is freely available at https://github.com/carinerey/caars. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carine Rey
- Univ Lyon, Université Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France
- Philippe Veber
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France
- Bastien Boussau
- Univ Lyon, Université Claude Bernard Lyon 1, CNRS, UMR, LBBE, F-69100, Villeurbanne, France
- Marie Sémon
- Univ Lyon, Université Claude Bernard Lyon 1, ENS de Lyon, CNRS UMR, INSERM U1210, LBMC, F-69007, Lyon, France
10
Fu S, Chang PL, Friesen ML, Teakle NL, Tarone AM, Sze SH. Identifying similar transcripts in a related organism from de Bruijn graphs of RNA-Seq data, with applications to the study of salt and waterlogging tolerance in Melilotus. BMC Genomics 2019; 20:425. [PMID: 31167652 PMCID: PMC6551239 DOI: 10.1186/s12864-019-5702-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Background A popular strategy to study alternative splicing in non-model organisms starts from sequencing the entire transcriptome, then assembling the reads by using de novo transcriptome assembly algorithms to obtain predicted transcripts. A similarity search algorithm is then applied to a related organism to infer possible function of these predicted transcripts. While some of these predictions may be inaccurate and transcripts with low coverage are often missed, we observe that it is possible to obtain a more complete set of transcripts to facilitate possible functional assignments by starting the search from the intermediate de Bruijn graph that contains all branching possibilities. Results We develop an algorithm to extract similar transcripts in a related organism by starting the search from the de Bruijn graph that represents the transcriptome instead of from predicted transcripts. We show that our algorithm is able to recover more similar transcripts than existing algorithms, with large improvements in obtaining longer transcripts and a finer resolution of isoforms. We apply our algorithm to study salt and waterlogging tolerance in two Melilotus species by constructing new RNA-Seq libraries. Conclusions We have developed an algorithm to identify paths in the de Bruijn graph that correspond to similar transcripts in a related organism directly. Our strategy bypasses the transcript prediction step in RNA-Seq data and makes use of support from evolutionary information. Electronic supplementary material The online version of this article (10.1186/s12864-019-5702-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Shuhua Fu
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, 77843, TX, USA
- Peter L Chang
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, 90089, CA, USA
- Maren L Friesen
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, 90089, CA, USA; Department of Crop and Soil Sciences, Washington State University, Pullman, 99164, WA, USA; Department of Plant Pathology, Washington State University, Pullman, 99164, WA, USA
- Natasha L Teakle
- Centre for Ecohydrology, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, WA, Australia; School of Plant Biology (M084), Faculty of Natural and Agricultural Sciences, The University of Western Australia, 35 Stirling Highway, Crawley, 6009, WA, Australia
- Aaron M Tarone
- Department of Entomology, Texas A&M University, College Station, 77843, TX, USA
- Sing-Hoi Sze
- Department of Biochemistry & Biophysics, Texas A&M University, College Station, 77843, TX, USA; Department of Computer Science and Engineering, Texas A&M University, College Station, 77843, TX, USA
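The central idea of the Fu et al. abstract is to search the de Bruijn graph, which keeps every branching possibility, rather than a single set of predicted transcripts. The Python sketch below illustrates that idea on a toy example under simplifying assumptions: it builds a de Bruijn graph from error-free reads (nodes are (k-1)-mers, edges are k-mers), enumerates the sequences spelled by all paths between two nodes, and ranks them by the fraction of k-mers shared with a transcript from a related organism. The shared-k-mer score is a crude stand-in chosen for brevity, not the authors' similarity search, and all function names, reads, and sequences are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_paths(graph, start, end, max_len=10_000):
    """Depth-first enumeration of all start->end paths, each spelled as a sequence."""
    results = []
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        if node == end:
            results.append(seq)
            continue
        if len(seq) > max_len:
            continue  # crude guard against endless walks around cycles
        for nxt in sorted(graph.get(node, ())):
            stack.append((nxt, seq + nxt[-1]))
    return sorted(results)


def shared_kmer_fraction(seq, reference, k):
    """Fraction of seq's distinct k-mers that also occur in reference."""
    seq_kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return len(seq_kmers & ref_kmers) / len(seq_kmers)


# Reads from two hypothetical isoforms sharing a first and last exon (ACGT / TTAA)
reads = ["ACGTGGCC", "TGGCCTTAA", "ACGTTTAA"]
k = 4
graph = build_de_bruijn(reads, k)
branch_nodes = [n for n, succ in graph.items() if len(succ) > 1]
candidates = spell_paths(graph, "ACG", "TAA")
related = "ACGTGGCATTAA"  # hypothetical transcript from a related organism
ranked = sorted(candidates, key=lambda s: shared_kmer_fraction(s, related, k), reverse=True)
print(branch_nodes)  # ['CGT']
print(candidates)    # ['ACGTGGCCTTAA', 'ACGTTTAA']
print(ranked[0])     # ACGTGGCCTTAA
```

Both isoforms survive in the graph even though no single read spans the longer one, which is the kind of information a pipeline that only keeps predicted transcripts may discard. Exhaustive path enumeration is exponential on real graphs, so a practical method would need to prune the search with the similarity signal rather than score paths afterwards.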
11
Lowe EK, Cuomo C, Arnone MI. Omics approaches to study gene regulatory networks for development in echinoderms. Brief Funct Genomics 2018; 16:299-308. [PMID: 28957458 DOI: 10.1093/bfgp/elx012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
Gene regulatory networks (GRNs) describe the interactions for a developmental process at a given time and space. Historically, perturbation experiments represent one of the key methods for analyzing and reconstructing a GRN, and the GRN governing early development in the sea urchin embryo stands as one of the more deeply dissected so far. As technology progresses, so do the methods used to address different biological questions. Next-generation sequencing (NGS) has become a standard experimental technique for genome and transcriptome sequencing and studies of protein-DNA interactions and DNA accessibility. While several efforts have been made toward the integration of different omics approaches for the study of the regulatory genome in many animals, in a few cases, these are applied with the purpose of reconstructing and experimentally testing developmental GRNs. Here, we review emerging approaches integrating multiple NGS technologies for the prediction and validation of gene interactions within echinoderm GRNs. These approaches can be applied to both 'model' and 'non-model' organisms. Although a number of issues still need to be addressed, advances in NGS applications, such as assay for transposase-accessible chromatin sequencing, combined with the availability of embryos belonging to different species, all separated by various evolutionary distances and accessible to experimental regulatory biology, place echinoderms in an unprecedented position for the reconstruction and evolutionary comparison of developmental GRNs. We conclude that sequencing technologies and integrated omics approaches allow the examination of GRNs on a genome-wide scale only if biological perturbation and cis-regulatory analyses are experimentally accessible, as in the case of echinoderm embryos.
12
Armero A, Baudouin L, Bocs S, This D. Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut. PLoS One 2017; 12:e0173300. [PMID: 28334050 PMCID: PMC5363918 DOI: 10.1371/journal.pone.0173300] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/31/2016] [Accepted: 02/17/2017] [Indexed: 01/20/2023]
Abstract
The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main palm species present different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq.) is the main protagonist of the oil market. In this study, we present a workflow that exploits comparative genomics between a target species (coconut) and a reference species (oil palm) to improve transcriptomic data, providing a proteome useful for answering functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as reference species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of protein products in A. thaliana by 13%, often recovering 100% of the protein sequence length. In addition, redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of which derive from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and a large number of metabolic pathways related to secondary metabolism. The new sequences found with the BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to obtain a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).
Affiliation(s)
- Alix Armero
- Montpellier SupAgro, UMR AGAP, Montpellier, France
- Stéphanie Bocs
- CIRAD, UMR AGAP, Montpellier, France
- South Green Bioinformatics Platform, Montpellier, France
13
Kong F, Saldarriaga OA, Spratt H, Osorio EY, Travi BL, Luxon BA, Melby PC. Transcriptional Profiling in Experimental Visceral Leishmaniasis Reveals a Broad Splenic Inflammatory Environment that Conditions Macrophages toward a Disease-Promoting Phenotype. PLoS Pathog 2017; 13:e1006165. [PMID: 28141856 PMCID: PMC5283737 DOI: 10.1371/journal.ppat.1006165] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 06/09/2016] [Accepted: 01/03/2017] [Indexed: 11/23/2022]
Abstract
Visceral Leishmaniasis (VL), caused by the intracellular protozoan Leishmania donovani, is characterized by relentlessly increasing visceral parasite replication, cachexia, massive splenomegaly, pancytopenia and ultimately death. Progressive disease is considered to be due to impaired effector T cell function and/or failure of macrophages to be activated to kill the intracellular parasite. In previous studies, we used the Syrian hamster (Mesocricetus auratus) as a model because it mimics the progressive nature of active human VL. We demonstrated previously that mixed expression of macrophage-activating (IFN-γ) and regulatory (IL-4, IL-10, IL-21) cytokines, parasite-induced expression of macrophage arginase 1 (Arg1), and decreased production of nitric oxide are key immunopathologic factors. Here we examined global changes in gene expression to define the splenic environment and phenotype of splenic macrophages during progressive VL. We used RNA sequencing coupled with de novo transcriptome assembly, because the Syrian hamster does not have a fully sequenced and annotated reference genome. Differentially expressed transcripts identified a highly inflammatory spleen environment with abundant expression of type I and type II interferon response genes. However, high IFN-γ expression was ineffective in directing exclusive M1 macrophage polarization, suppressing M2-associated gene expression, and restraining parasite replication and disease. While many IFN-inducible transcripts were upregulated in the infected spleen, fewer were induced in splenic macrophages in VL. Paradoxically, IFN-γ enhanced parasite growth and induced the counter-regulatory molecules Arg1, Ido1 and Irg1 in splenic macrophages. 
This was mediated, at least in part, through IFN-γ-induced activation of STAT3 and expression of IL-10, which suggests that splenic macrophages in VL are conditioned to respond to macrophage activation signals with a counter-regulatory response that is ineffective and even disease-promoting. Accordingly, inhibition of STAT3 activation led to a reduced parasite load in infected macrophages. Thus, the STAT3 pathway offers a rational target for adjunctive host-directed therapy to interrupt the pathogenesis of VL.
Visceral leishmaniasis (VL) is a neglected parasitic disease that is caused by the intracellular protozoan Leishmania donovani. Patients with this disease suffer from muscle wasting, enlargement of the spleen and reduced blood counts, and will ultimately die without treatment. Progressive disease is considered to be due to impaired cellular immunity, with T cell or macrophage dysfunction, or both. We studied the Syrian hamster as an infection model because it mimics the progressive nature of human disease. We examined global changes in gene expression in the spleen and splenic macrophages during experimental VL and identified a highly inflammatory spleen environment with abundant expression of interferon and interferon-response genes that would be expected to control the infection. However, the high level of IFN-γ expression was ineffective in mediating a protective macrophage response, restraining parasite replication and halting progression of disease. We found that IFN-γ itself stimulated parasite growth in splenic macrophages and induced expression of counter-regulatory molecules, which may paradoxically make the host more susceptible. These data give insights into the nature of the immune response that promotes the infection and identify potential targets for therapeutic intervention.
Affiliation(s)
- Fanping Kong
- Bioinformatics Program, University of Texas Medical Branch, Galveston, Texas, United States of America
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas, United States of America
- Omar A. Saldarriaga
- Department of Internal Medicine, Division of Infectious Diseases, University of Texas Medical Branch, Galveston, Texas, United States of America
- Heidi Spratt
- Bioinformatics Program, University of Texas Medical Branch, Galveston, Texas, United States of America
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas, United States of America
- Department of Preventive Medicine and Community Health, University of Texas Medical Branch, Galveston, Texas, United States of America
- E. Yaneth Osorio
- Department of Internal Medicine, Division of Infectious Diseases, University of Texas Medical Branch, Galveston, Texas, United States of America
- Bruno L. Travi
- Department of Internal Medicine, Division of Infectious Diseases, University of Texas Medical Branch, Galveston, Texas, United States of America
- Department of Microbiology and Immunology, University of Texas Medical Branch, Galveston, Texas, United States of America
- Center for Tropical Diseases and Institute for Human Infection and Immunity, University of Texas Medical Branch, Galveston, Texas, United States of America
- Bruce A. Luxon
- Bioinformatics Program, University of Texas Medical Branch, Galveston, Texas, United States of America
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, Texas, United States of America
- Peter C. Melby
- Department of Internal Medicine, Division of Infectious Diseases, University of Texas Medical Branch, Galveston, Texas, United States of America
- Department of Microbiology and Immunology, University of Texas Medical Branch, Galveston, Texas, United States of America
- Center for Tropical Diseases and Institute for Human Infection and Immunity, University of Texas Medical Branch, Galveston, Texas, United States of America
- Department of Pathology, University of Texas Medical Branch, Galveston, Texas, United States of America
14
Huang X, Chen XG, Armbruster PA. Comparative performance of transcriptome assembly methods for non-model organisms. BMC Genomics 2016; 17:523. [PMID: 27464550 PMCID: PMC4964045 DOI: 10.1186/s12864-016-2923-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 09/28/2015] [Accepted: 07/07/2016] [Indexed: 12/19/2022]
Abstract
Background: The technological revolution in next-generation sequencing has brought unprecedented opportunities to study any organism of interest at the genomic or transcriptomic level. Transcriptome assembly is a crucial first step for studying the molecular basis of phenotypes of interest using RNA-Sequencing (RNA-Seq). However, the optimal strategy for assembling vast amounts of short RNA-Seq reads remains unresolved, especially for organisms without a sequenced genome. This study compared four transcriptome assembly methods: a widely used de novo assembler (Trinity), two transcriptome re-assembly strategies utilizing proteomic and genomic resources from closely related species (reference-based re-assembly and TransPS), and a genome-guided assembler (Cufflinks).
Results: These four assembly strategies were compared using a comprehensive transcriptomic database of Aedes albopictus, for which a genome sequence has recently been completed. The quality of the various assemblies was assessed by the number of contigs generated, contig length distribution, percent paired-end read mapping, and gene model representation via BLASTX. Our results reveal that de novo assembly generates a number of gene models similar to that of genome-guided assembly with a fragmented reference, but produces the highest level of redundancy and requires the most computational power. Using a closely related reference genome to guide transcriptome assembly can generate biased contig sequences. Increasing the number of reads used in the transcriptome assembly tends to increase the redundancy within the assembly and decrease both median contig length and percent identity between contigs and reference protein sequences.
Conclusions: This study provides general guidance for transcriptome assembly of RNA-Seq data from organisms with or without a sequenced genome. The optimal transcriptome assembly strategy will depend upon the subsequent downstream analyses. However, our results emphasize the efficacy of de novo assembly, which can be as effective as genome-guided assembly when the reference genome assembly is fragmented. If a genome assembly and sufficient computational resources are available, it can be beneficial to combine de novo and genome-guided assemblies. Caution should be taken when using a closely related reference genome to guide transcriptome assembly. The quantity of read pairs used in the transcriptome assembly does not necessarily correlate with the quality of the assembly.
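Several of the quality measures used in this comparison (number of contigs, contig length distribution, median contig length) can be computed directly from contig lengths. The Python sketch below is an illustration under assumptions rather than the authors' pipeline: it also reports N50, a common assembly statistic that the study does not list, and it does not cover read-mapping rates or BLASTX-based gene model representation, which require external tools. The input lengths are made up.

```python
from statistics import median


def assembly_stats(contig_lengths):
    """Summarise an assembly from its contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("assembly has no contigs")
    lengths = sorted(contig_lengths, reverse=True)  # longest first
    total = sum(lengths)
    # N50: the contig length at which the running total of bases,
    # accumulated from the longest contig down, first reaches half the total.
    running = 0
    n50 = lengths[-1]
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "n_contigs": len(lengths),
        "total_bp": total,
        "median_len": median(lengths),
        "n50": n50,
    }


print(assembly_stats([150, 200, 300, 400, 500, 650]))
# {'n_contigs': 6, 'total_bp': 2200, 'median_len': 350.0, 'n50': 500}
```

Running the same function on assemblies built from increasing numbers of reads is one way to track the drop in median contig length that the study reports.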
Affiliation(s)
- Xin Huang
- Department of Biology, Georgetown University, 37th and O Streets NW, Washington, DC, 20057, USA.
- Xiao-Guang Chen
- Key Laboratory of Prevention and Control for Emerging Infectious Diseases of Guangdong Higher Institutes, Department of Pathogen Biology, School of Public Health and Tropical Medicine, Southern Medical University, Guangzhou, China
- Peter A Armbruster
- Department of Biology, Georgetown University, 37th and O Streets NW, Washington, DC, 20057, USA
15
Bonizzoni P, Dondi R, Klau GW, Pirola Y, Pisanti N, Zaccaria S. On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes. J Comput Biol 2016; 23:718-36. [PMID: 27280382 DOI: 10.1089/cmb.2015.0220] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
In diploid genomes, haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum error correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims to reconstruct the two haplotypes by applying the minimum number of base corrections. MEC is computationally hard to solve, but some approximation-based or fixed-parameter approaches have been shown to obtain accurate results on real data. In this work, we expand the current characterization of the computational complexity of MEC from the approximation and the fixed-parameter tractability points of view. In particular, we show that MEC is not approximable within a constant factor, whereas it is approximable within a logarithmic factor in the size of the input. Furthermore, we answer open questions on the fixed-parameter tractability for parameters of classical or practical interest: the total number of corrections and the fragment length. In addition, we present a direct 2-approximation algorithm for a variant of the problem that has also been applied in the framework of clustering data. Finally, since polyploid genomes, such as those of plants and fishes, are composed of more than two copies of the chromosomes, we introduce a novel formulation of MEC, namely the k-ploid MEC problem, that extends the traditional problem to deal with polyploid genomes. We show that the novel formulation is still both computationally hard and hard to approximate. Nonetheless, from the parameterized point of view, we prove that the problem is tractable for parameters of practical interest such as the number of haplotypes and the coverage, or the number of haplotypes and the fragment length.
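To make the MEC objective concrete, the following Python sketch computes it exactly by exhaustive search, which is feasible only for a handful of fragments because the problem is computationally hard. It assumes fragments are equal-length strings over `0`/`1`, with `-` marking uncovered SNP positions (an encoding assumed here for illustration), and it relies on the observation that once fragments are assigned to haplotypes, the best haplotype for each group is the column-wise majority allele. Passing `k > 2` gives a brute-force k-ploid variant; none of this reflects the approximation or fixed-parameter algorithms studied in the paper, and the fragments are invented.

```python
from itertools import product

GAP = "-"  # position not covered by the fragment


def assignment_cost(fragments, labels, k):
    """Corrections needed when fragment i is assigned to haplotype labels[i].

    For a fixed assignment, the optimal haplotype of each group is the
    column-wise majority allele, so the cost is the number of minority calls.
    """
    cost = 0
    for group in range(k):
        members = [f for f, lab in zip(fragments, labels) if lab == group]
        for col in range(len(fragments[0])):
            calls = [f[col] for f in members if f[col] != GAP]
            if calls:
                cost += len(calls) - max(calls.count("0"), calls.count("1"))
    return cost


def brute_force_mec(fragments, k=2):
    """Exact k-ploid MEC by enumerating all k**n assignments (tiny inputs only)."""
    best = None
    for labels in product(range(k), repeat=len(fragments)):
        cost = assignment_cost(fragments, labels, k)
        if best is None or cost < best[0]:
            best = (cost, labels)
    return best


fragments = ["00-1", "001-", "-110", "1100", "0011"]
print(brute_force_mec(fragments))          # (1, (0, 0, 1, 1, 0))
print(brute_force_mec(fragments, k=3)[0])  # 0
```

With two haplotypes, the conflicting fragments `-110` and `1100` force one correction; allowing a third haplotype removes the conflict, which shows how the objective depends on the ploidy `k`.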
Affiliation(s)
- Paola Bonizzoni
- Department of Computer Science (DISCO), University of Milano-Bicocca, Milan, Italy
- Riccardo Dondi
- Department of Social and Human Sciences, University of Bergamo, Bergamo, Italy
- Gunnar W Klau
- Life Sciences Group, Centrum Wiskunde & Informatica (CWI), Amsterdam, The Netherlands
- ERABLE Team, INRIA, Lyon, France
- Yuri Pirola
- Department of Computer Science (DISCO), University of Milano-Bicocca, Milan, Italy
- Nadia Pisanti
- ERABLE Team, INRIA, Lyon, France
- Department of Computer Science, University of Pisa, Pisa, Italy
- Simone Zaccaria
- Department of Computer Science (DISCO), University of Milano-Bicocca, Milan, Italy
16
Bar I, Cummins S, Elizur A. Transcriptome analysis reveals differentially expressed genes associated with germ cell and gonad development in the Southern bluefin tuna (Thunnus maccoyii). BMC Genomics 2016; 17:217. [PMID: 26965070 PMCID: PMC4785667 DOI: 10.1186/s12864-016-2397-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 11/06/2015] [Accepted: 01/14/2016] [Indexed: 12/12/2022]
Abstract
BACKGROUND: Controlling and managing the breeding of bluefin tuna (Thunnus spp.) in captivity is an imperative step towards obtaining a sustainable supply of these fish in aquaculture production systems. Germ cell transplantation (GCT) is an innovative technology for the production of inter-species surrogates, in which undifferentiated germ cells derived from a donor species are transplanted into larvae of a host species. The transplanted surrogates then grow and mature to produce donor-derived seed, thus providing a simpler alternative to maintaining large-bodied broodstock such as the bluefin tuna. Implementation of GCT for new species requires the development of molecular tools to follow the fate of the transplanted germ cells. These tools are based on key reproductive and germ cell-specific genes. RNA-Sequencing (RNA-Seq) provides a rapid, cost-effective method for high-throughput gene identification in non-model species. This study utilized RNA-Seq to identify key genes expressed in the gonads of Southern bluefin tuna (Thunnus maccoyii, SBT) and their specific expression patterns in male and female gonad cells.
RESULTS: Key genes involved in the reproductive molecular pathway, and specifically in germ cell development in gonads, were identified by analysing RNA-Seq transcriptomes of male and female SBT gonad cells. Expression profiles of transcripts from ovary and testis cells were compared, as well as a testis germ cell-enriched fraction prepared with a Percoll gradient, as used in GCT studies. Ovary cells demonstrated over-expression of genes related to stem cell maintenance, while in testis cells, transcripts encoding reproduction-associated receptors, as well as sex steroid and hormone synthesis and signaling genes, were over-expressed. Within the testis cells, the Percoll-enriched fraction showed over-expression of genes related to post-meiotic germ cell populations.
CONCLUSIONS: Gonad development and germ cell related genes were identified from SBT gonads and their expression patterns in ovary and testis cells were determined. These expression patterns correlate with the reproductive developmental stage of the sampled fish. The majority of the genes described in this study were sequenced for the first time in T. maccoyii. The wealth of SBT gonadal and germ cell-related gene sequences made publicly available by this study provides an extensive resource for further GCT and reproductive molecular biology studies of this commercially valuable fish.
Affiliation(s)
- Ido Bar
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
- Scott Cummins
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
- Abigail Elizur
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
17
Liu J, Li G, Chang Z, Yu T, Liu B, McMullen R, Chen P, Huang X. BinPacker: Packing-Based De Novo Transcriptome Assembly from RNA-seq Data. PLoS Comput Biol 2016; 12:e1004772. [PMID: 26894997 PMCID: PMC4760927 DOI: 10.1371/journal.pcbi.1004772] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 03/19/2015] [Accepted: 01/18/2016] [Indexed: 02/06/2023]
Abstract
High-throughput RNA-seq technology has provided an unprecedented opportunity to reveal the very complex structures of transcriptomes. However, assembling vast amounts of short RNA-seq reads into transcriptomes with alternative splicing isoforms is an important and highly challenging task. In this study, we present a novel de novo assembler, BinPacker, which models the transcriptome assembly problem as tracking a set of trajectories of items, whose sizes represent the coverage of their corresponding isoforms, by solving a series of bin-packing problems. This approach, which subtly integrates coverage information into the procedure, has two distinctive features: 1) only splicing junctions are involved in the assembly procedure; 2) massive numbers of unordered reads are assembled as if by moving a comb along junction edges on a splicing graph. Tested on both real and simulated RNA-seq datasets, it outperforms almost all the existing de novo assemblers on all the tested datasets, and even outperforms the ab initio assemblers on the real dog dataset. In addition, it runs substantially faster and requires less memory than most of the assemblers. BinPacker is published under the GNU General Public License and the source is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_1.0.tar.gz/download. A quick-installation version is available from: http://sourceforge.net/projects/transcriptomeassembly/files/BinPacker_binary.tar.gz/download.
The availability of RNA-seq technology drives the development of algorithms for transcriptome assembly from very short RNA sequences. However, the problem of how to assemble a transcriptome (de novo) from RNA-seq datasets has not been modeled well; e.g. sequence coverage information has not even been accurately and effectively integrated into the assembly procedure, leading to a bottleneck that all the existing (de novo) strategies have encountered. We present a novel approach that remodels the problem as tracking a set of trajectories of items, whose sizes represent the coverage of their corresponding isoforms, by solving a series of bin-packing problems. This approach, which subtly integrates the coverage information into the procedure, has two distinctive features: 1) only splicing junctions are involved in the assembly procedure; 2) massive numbers of unordered reads are assembled as if by moving a comb along junction edges on a splicing graph. Tested on both real and simulated RNA-seq datasets, it outperforms almost all existing de novo assemblers on all the tested datasets, and even outperforms the ab initio assemblers on the dog dataset, in terms of commonly used comparison standards.
Affiliation(s)
- Juntao Liu
- School of Mathematics, Shandong University, Jinan, China
- Guojun Li
- School of Mathematics, Shandong University, Jinan, China
- Zheng Chang
- School of Mathematics, Shandong University, Jinan, China
- Ting Yu
- School of Mathematics, Shandong University, Jinan, China
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, China
- Rick McMullen
- High Performance Computing Center, University of Arkansas, Fayetteville, Arkansas, United States of America
- Pengyin Chen
- Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, Arkansas, United States of America
- Xiuzhen Huang
- Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, United States of America
18
Deng F, Chen SY. dbHT-Trans: An Efficient Tool for Filtering the Protein-Encoding Transcripts Assembled by RNA-Seq According to Search for Homologous Proteins. J Comput Biol 2015; 23:1-9. [PMID: 26484655 DOI: 10.1089/cmb.2015.0137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
In RNA-Seq studies, reliable transcript assembly still poses many challenges. Both genome-guided and de novo methods invariably produce too many false transcripts, owing to both known and unknown factors. Post-assembly quality filtering is therefore necessary before performing downstream analyses. Here, we present dbHT-Trans, an automatic and efficient tool for filtering the protein-encoding transcripts assembled from RNA-Seq data. For each candidate transcript, we first deduce all potential open reading frames and translate them into amino acid sequences. These sequences are then searched against a reference protein database, and a transcript is predicted to be false when it has no homologous sequence. This method is expected to filter out falsely assembled transcripts of protein-encoding genes. Application of dbHT-Trans to the annotated mouse transcriptome showed a sensitivity of almost 90% for recalling protein-encoding transcripts. After this quality filtering, the numbers of assembled genes became more consistent between Cufflinks and Trinity. To substantially reduce data storage, we transformed all intermediate data into descriptive metadata stored in a MySQL database, which downstream analyses can then use in real time. The source code, example data, and manual of dbHT-Trans are freely available in the GitHub repository.
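The ORF-deduction and homology-filtering steps can be sketched in plain Python. This is an illustration under stated assumptions, not dbHT-Trans's code: an ORF is taken to be an ATG-to-stop stretch in any of the six reading frames under the standard genetic code (NCBI translation table 1), `min_aa` is a hypothetical length cutoff, and the homology search is abstracted as a caller-supplied predicate `has_homolog` (in practice this would be a search of a reference protein database, for example with BLASTP). The MySQL metadata storage is omitted, and the toy sequences and "reference" peptide are invented.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons with N or other symbols become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_aa=30):
    """Return (strand, frame, peptide) for ATG-initiated ORFs closed by a stop codon."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            # Every piece except the last one ends at a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")  # first ATG gives the longest ORF here
                if start != -1 and len(segment) - start >= min_aa:
                    orfs.append((strand, frame, segment[start:]))
    return orfs


def filter_by_homology(transcripts, has_homolog, min_aa=30):
    """Keep transcripts with at least one ORF whose peptide has a homolog."""
    return {
        tx_id: seq
        for tx_id, seq in transcripts.items()
        if any(has_homolog(pep) for _, _, pep in find_orfs(seq, min_aa))
    }


# Toy data: the "reference database" is just a set of peptides (hypothetical).
reference = {"MKPGF"}
transcripts = {"tx1": "CCATGAAACCCGGGTTTTAGCC", "tx2": "CCCCCCCCCCCC"}
print(find_orfs(transcripts["tx1"], min_aa=5))  # [('+', 2, 'MKPGF')]
kept = filter_by_homology(transcripts, lambda pep: pep in reference, min_aa=5)
print(list(kept))  # ['tx1']
```

Passing the homology test as a predicate keeps the ORF logic independent of whichever search tool and reference database are actually used.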
Affiliation(s)
- Feilong Deng
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
- Shi-Yi Chen
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
19
Chang Z, Li G, Liu J, Zhang Y, Ashby C, Liu D, Cramer CL, Huang X. Bridger: a new framework for de novo transcriptome assembly using RNA-seq data. Genome Biol 2015; 16:30. [PMID: 25723335 PMCID: PMC4342890 DOI: 10.1186/s13059-015-0596-2] [Citation(s) in RCA: 191] [Impact Index Per Article: 19.1] [Received: 05/22/2014] [Accepted: 01/23/2015] [Indexed: 11/24/2022]
Abstract
We present a new de novo transcriptome assembler, Bridger, which takes advantage of techniques employed in Cufflinks to overcome limitations of the existing de novo assemblers. When tested on dog, human, and mouse RNA-seq data, Bridger assembled more full-length reference transcripts while reporting considerably fewer candidate transcripts, hence greatly reducing false positive transcripts in comparison with the state-of-the-art assemblers. It runs substantially faster and requires much less memory space than most assemblers. More interestingly, Bridger reaches a comparable level of sensitivity and accuracy with Cufflinks. Bridger is available at https://sourceforge.net/projects/rnaseqassembly/files/?source=navbar.
20
Legeai F, Derrien T. Identification of long non-coding RNAs in insects genomes. Curr Opin Insect Sci 2015; 7:37-44. [PMID: 32846672 DOI: 10.1016/j.cois.2015.01.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 11/30/2014] [Revised: 01/07/2015] [Accepted: 01/07/2015] [Indexed: 06/11/2023]
Abstract
The development of high-throughput sequencing (HTS) technologies has allowed researchers to better assess the complexity and diversity of the transcriptome. Among the many classes of non-coding RNAs (ncRNAs) identified in the last decade, long non-coding RNAs (lncRNAs) represent a diverse and numerous repertoire of important ncRNAs, reinforcing the view that they are of central importance to the cell machinery in all branches of life. Although lncRNAs have been implicated in essential biological processes such as imprinting, gene regulation and dosage compensation, especially in mammals, the repertoire of lncRNAs is poorly characterized for many non-model organisms. In this review, we first focus on what is known about experimentally validated lncRNAs in insects and then review bioinformatic methods to annotate lncRNAs in the genomes of hexapods.
Affiliation(s)
- Fabrice Legeai
- INRA, UMR1349, Institute of Genetics, Environment and Plant Protection, Domaine de la Motte, BP35327, 35653 Le Rheu cedex, France; IRISA/INRIA GenScale, Campus Beaulieu, 35000 Rennes, France.
- Thomas Derrien
- CNRS, UMR 6290, Institut de Génétique et Développement de Rennes, Université de Rennes 1, 2 Avenue du Pr. Léon Bernard, 35000 Rennes, France
21
Rizzi R, Tomescu AI, Mäkinen V. On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly. BMC Bioinformatics 2014; 15 Suppl 9:S5. [PMID: 25252805 PMCID: PMC4168716 DOI: 10.1186/1471-2105-15-s9-s5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/02/2022]
Abstract
BACKGROUND: Multi-assembly problems have gathered much attention in recent years, as Next-Generation Sequencing technologies have started being applied to mixed settings, such as reads from the transcriptome (RNA-Seq) or from viral quasi-species. One classical model that has resurfaced in many multi-assembly methods (e.g. in Cufflinks, ShoRAH, BRANCH, CLASS) is the Minimum Path Cover (MPC) Problem, which asks for the minimum number of directed paths that cover all the nodes of a directed acyclic graph. The MPC Problem is highly popular because the acyclicity of the graph ensures that it can be solved in polynomial time.
RESULTS: In this paper, we consider two generalizations of it that integrate constraints arising from long reads or paired-end reads; these extensions have also been considered by two recent methods, but not fully solved. More specifically, we study the two problems in which a set of subpaths, or pairs of subpaths, of the graph must also be entirely covered by some path in the MPC. We show that in the case of long reads (subpaths), the generalized problem can be solved in polynomial time by a reduction to the classical MPC Problem. We also consider the weighted case, and show that it can be solved in polynomial time by a reduction to a min-cost circulation problem. As a side result, we also improve the time complexity of the classical minimum weight MPC Problem. In the case of paired-end reads (pairs of subpaths), the generalized problem becomes NP-hard, but we show that it is fixed-parameter tractable (FPT) in the total number of constraints. This computational dichotomy between long reads and paired-end reads is also a general insight into multi-assembly problems.
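For the classical, unconstrained MPC Problem in a DAG, the covering paths may share nodes, and a textbook reduction (Dilworth's theorem, via Fulkerson's construction) gives the answer as the number of nodes minus the size of a maximum matching in the bipartite graph of the transitive closure. The Python sketch below implements that reduction with a simple augmenting-path matching. It is an illustration only: it does not handle the subpath or paired-subpath constraints, nor the weighted variants, studied in the paper, and the example graph is hypothetical.

```python
def transitive_closure(n, edges):
    """reach[u] = set of nodes reachable from u by a non-empty path."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    reach = [set() for _ in range(n)]
    for s in range(n):
        stack = list(adj[s])
        while stack:
            v = stack.pop()
            if v not in reach[s]:
                reach[s].add(v)
                stack.extend(adj[v])
    return reach


def max_bipartite_matching(n, reach):
    """Kuhn's augmenting-path algorithm on the bipartite graph u -> v for v in reach[u]."""
    match_right = [-1] * n  # match_right[v] = left node currently matched to v

    def try_augment(u, seen):
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(n))


def minimum_path_cover(n, edges):
    """Minimum number of (possibly overlapping) paths covering all nodes of a DAG."""
    return n - max_bipartite_matching(n, transitive_closure(n, edges))


# Two "exon" paths sharing node 2: 0 -> 2 -> 3 and 1 -> 2 -> 4.
print(minimum_path_cover(5, [(0, 2), (1, 2), (2, 3), (2, 4)]))  # 2
```

A node-disjoint cover of the same example would need three paths, which is why matching on the transitive closure, rather than on the original edges, suits multi-assembly, where transcripts can share exons.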
Affiliation(s)
- Romeo Rizzi
- Department of Computer Science, University of Verona, Italy
- Alexandru I Tomescu
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
- Veli Mäkinen
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
22
Abubucker S, McNulty SN, Rosa BA, Mitreva M. Identification and characterization of alternative splicing in parasitic nematode transcriptomes. Parasit Vectors 2014; 7:151. [PMID: 24690220 PMCID: PMC3997825 DOI: 10.1186/1756-3305-7-151] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/17/2014] [Accepted: 03/14/2014] [Indexed: 12/05/2022]
Abstract
Background: Alternative splicing (AS) of mRNA is a vital mechanism for enhancing genomic complexity in eukaryotes. Spliced isoforms of the same gene can have diverse molecular and biological functions and are often differentially expressed across various tissues, times, and conditions. Thus, AS has important implications for the study of parasitic nematodes with complex life cycles. Transcriptomic datasets are available for many species, but the data must be revisited with splice-aware assembly protocols to facilitate the study of AS in helminths.
Methods: We sequenced cDNA from the model worm Caenorhabditis elegans using 454/Roche technology for use as an experimental dataset. Reads were assembled with Newbler software, invoking the cDNA option. Several combinations of parameters were tested, and assembled transcripts were verified by comparison with previously reported C. elegans genes and transcript isoforms and with Illumina RNA-seq data.
Results: Careful adjustment of program parameters increased the percentage of assembled transcripts that matched known C. elegans sequences, decreased mis-assembly rates (i.e., cis- and trans-chimeras), and improved the coverage of the gene set. The optimized protocol was used to update de novo transcriptome assemblies from nine parasitic nematode species, including important pathogens of humans and domestic animals. Our assemblies indicated AS rates in the range of 20-30%, typically with 2-3 transcripts per AS locus, depending on the species. Transcript isoforms from the nine species were translated and searched for similarity to known proteins and functional domains. Some 21 InterPro domains, including several involved in nucleotide and chromatin binding, were statistically correlated with AS loci. In most cases, the Roche/454 data explored in this study are the only sequences available from the species in question; however, the recently published genome of the human hookworm Necator americanus provided an additional opportunity to validate our results.
Conclusions: Our optimized assembly parameters facilitated the first survey of AS among parasitic nematodes. The nine transcriptome assemblies, their protein translations, and basic annotations are available from Nematode.net as a resource for the research community. These should be useful for studies of specific genes and gene families of interest as well as for curating draft genome assemblies as they become available.
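Summary figures such as an AS rate and the number of transcripts per AS locus can be derived from a transcript-to-locus mapping of an assembly. The Python sketch below assumes, for illustration, that the AS rate is the fraction of loci with more than one assembled isoform; the study's exact definition may differ, multiple assembled isoforms can also reflect assembly artifacts rather than genuine splicing, and the transcript and gene identifiers are hypothetical.

```python
from collections import Counter


def splicing_summary(transcript_to_gene):
    """Summarise alternative splicing from a transcript -> gene (locus) mapping."""
    isoforms_per_locus = Counter(transcript_to_gene.values())
    if not isoforms_per_locus:
        raise ValueError("no transcripts given")
    # Assumption: a locus counts as alternatively spliced if it has >1 isoform.
    as_counts = [n for n in isoforms_per_locus.values() if n > 1]
    return {
        "loci": len(isoforms_per_locus),
        "as_loci": len(as_counts),
        "as_rate": len(as_counts) / len(isoforms_per_locus),
        "mean_isoforms_per_as_locus": sum(as_counts) / len(as_counts) if as_counts else 0.0,
    }


mapping = {"t1": "g1", "t2": "g1", "t3": "g2", "t4": "g3",
           "t5": "g3", "t6": "g3", "t7": "g4", "t8": "g5"}
print(splicing_summary(mapping))
# {'loci': 5, 'as_loci': 2, 'as_rate': 0.4, 'mean_isoforms_per_as_locus': 2.5}
```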
Affiliation(s)
- Makedonka Mitreva
- The Genome Institute, Washington University School of Medicine, 4444 Forest Park Boulevard, St. Louis, MO 63108, USA.