1
Shmakov NА. Improving the quality of barley transcriptome de novo assembling by using a hybrid approach for lines with varying spike and stem coloration. Vavilovskii Zhurnal Genet Selektsii 2021; 25:30-38. [PMID: 34901701 PMCID: PMC8627909 DOI: 10.18699/vj21.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/24/2020] [Revised: 01/15/2021] [Accepted: 01/15/2021] [Indexed: 11/19/2022] Open
Abstract
De novo transcriptome assembly is an important stage of RNA-seq data computational analysis. It allows researchers to obtain the sequences of transcripts present in the biological sample of interest. The availability of an accurate and complete transcriptome sequence of the organism of interest is, in turn, an indispensable condition for further analysis of RNA-seq data. Through years of transcriptomic research, the bioinformatics community has developed a number of assembler programs for transcriptome reconstruction from short reads of RNA-seq libraries. Different assemblers make it possible to conduct either a de novo or a genome-guided transcriptome reconstruction. The majority of the assemblers working with RNA-seq data are based on the De Bruijn graph method of sequence reconstruction. However, the specifics of their procedures can vary drastically, as do their results. A number of authors recommend a hybrid approach to transcriptome reconstruction based on combining the results of several assemblers in order to achieve a better transcriptome assembly. The advantage of this approach has been demonstrated in a number of studies, with RNA-seq experiments conducted on the Illumina platform. In this paper, we propose a hybrid approach for creating a transcriptome assembly of the barley Hordeum vulgare isogenic line Bowman and two nearly isogenic lines contrasting in spike pigmentation, based on the results of sequencing on the IonTorrent platform. This approach implements several de novo assemblers: Trinity, Trans-ABySS and rnaSPAdes. Several assembly metrics were examined: the percentage of reference transcripts observed in the assemblies, the percentage of RNA-seq reads involved, and BUSCO scores. It was shown that, based on the summation of these metrics, the transcriptome meta-assembly surpasses the individual transcriptome assemblies it consists of.
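The abstract above notes that most RNA-seq assemblers build on the De Bruijn graph method. As a rough illustration only, the following Python sketch builds such a graph from toy reads and walks one unambiguous path. It is not the procedure used by Trinity, Trans-ABySS or rnaSPAdes; the function names `de_bruijn` and `walk_unambiguous`, the toy reads and the value of `k` are all assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from `start` while the current node has exactly one successor.

    Simplification: only out-degree is checked, and the walk stops on a revisit
    so that cycles cannot loop forever.
    """
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical toy reads that overlap on "GTC".
toy_graph = de_bruijn(["ACGTC", "GTCAA"], k=3)
toy_contig = walk_unambiguous(toy_graph, "AC")
```

Real assemblers add error correction, coverage weighting and isoform resolution on top of this basic structure, which is one reason their outputs differ.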
Affiliation(s)
- N А Shmakov
- Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Kurchatov Genomics Center, Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
2
Modern Approaches for Transcriptome Analyses in Plants. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2021; 1346:11-50. [DOI: 10.1007/978-3-030-80352-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
3
Mora-Márquez F, Vázquez-Poletti JL, Chano V, Collada C, Soto Á, de Heredia UL. Hardware Performance Evaluation of De novo Transcriptome Assembly Software in Amazon Elastic Compute Cloud. Curr Bioinform 2020. [DOI: 10.2174/1574893615666191219095817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
Background:
Bioinformatics software for RNA-seq analysis has a high computational
requirement in terms of the number of CPUs, RAM size, and processor characteristics.
Specifically, de novo transcriptome assembly demands large computational infrastructure due to
the massive data size, and complexity of the algorithms employed. Comparative studies on the
quality of the transcriptome yielded by de novo assemblers have been previously published,
lacking, however, a hardware efficiency-oriented approach to help select the assembly hardware
platform in a cost-efficient way.
Objective:
We tested the performance of two popular de novo transcriptome assemblers, Trinity
and SOAPdenovo-Trans (SDNT), in terms of cost-efficiency and quality to assess limitations, and
provided troubleshooting and guidelines to run transcriptome assemblies efficiently.
Methods:
We built virtual machines with different hardware characteristics (CPU number, RAM
size) in the Amazon Elastic Compute Cloud of the Amazon Web Services. Using simulated and
real data sets, we measured the elapsed time, cost, CPU percentage and output size of small and
large data set assemblies.
Results:
For small data sets, SDNT outperformed Trinity by an order of magnitude, significantly
reducing the duration and cost of the assembly. For large data sets, Trinity performed better
than SDNT. Both assemblers provide good-quality transcriptomes.
Conclusion:
The selection of the optimal transcriptome assembler and provision of computational
resources depend on the combined effect of size and complexity of RNA-seq experiments.
Affiliation(s)
- Fernando Mora-Márquez
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
- José Luis Vázquez-Poletti
- GI Arquitectura de Sistemas Distribuidos, Dpto. Arquitectura de Computadores y Automatica, Facultad de Informatica, Universidad Complutense de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
- Víctor Chano
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
- Carmen Collada
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
- Álvaro Soto
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
- Unai López de Heredia
- GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politecnica de Madrid, Ciudad Universitaria, 28040 Madrid, Spain
4
Sharma G, Aminedi R, Saxena D, Gupta A, Banerjee P, Jain D, Chandran D. Effector mining from the Erysiphe pisi haustorial transcriptome identifies novel candidates involved in pea powdery mildew pathogenesis. MOLECULAR PLANT PATHOLOGY 2019; 20:1506-1522. [PMID: 31603276 PMCID: PMC6804345 DOI: 10.1111/mpp.12862] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 05/18/2023]
Abstract
Pea powdery mildew (PM) is an important fungal disease caused by an obligate biotroph, Erysiphe pisi (Ep), which significantly impacts pea production worldwide. The phytopathogen secretes a plethora of effectors, primarily through specialized infection structures termed haustoria, to establish a dynamic relationship with its host. To identify Ep effector candidates, a cDNA library of enriched haustoria from Ep-infected pea leaves was sequenced. The Ep transcriptome encodes 622 Ep candidate secreted proteins (CSPs), of which 167 were predicted to be candidate secreted effector proteins (CSEPs). Phylogenetic analysis indicates that Ep CSEPs are highly diverse, but, unlike cereal PM CSEPs, exhibit extensive sequence similarity with effectors from other PMs. Quantitative real-time PCR of a subset of EpCSEP/CSPs revealed that the majority are preferentially expressed in haustoria and exhibit infection stage-specific expression patterns. The functional roles of EpCSEP001, EpCSEP009 and EpCSP083 were probed by host-induced gene silencing (HIGS) via a double-stranded (ds) RNA-mediated RNAi approach. Foliar application of individual EpCSEP/CSP dsRNAs resulted in a marked reduction in PM disease symptoms. These findings were consistent with microscopic and molecular studies, suggesting that these Ep CSEP/CSPs play important roles in pea PM pathogenesis. Homology modelling revealed that EpCSEP001 and EpCSEP009 are analogous to fungal ribonucleases and belong to the RALPH family of effectors. This is the first study to identify and functionally validate candidate effectors from the agriculturally relevant pea PM, and highlights the utility of transcriptomics and HIGS to elucidate the key proteins associated with Ep pathogenesis.
Affiliation(s)
- Gunjan Sharma
- Laboratory of Plant–Microbe Interactions, Regional Centre for Biotechnology, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Raghavendra Aminedi
- Laboratory of Plant–Microbe Interactions, Regional Centre for Biotechnology, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Divya Saxena
- Laboratory of Plant–Microbe Interactions, Regional Centre for Biotechnology, NCR Biotech Science Cluster, Faridabad, Haryana, India
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Arunima Gupta
- Laboratory of Plant–Microbe Interactions, Regional Centre for Biotechnology, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Priyajit Banerjee
- Transcription Regulation Lab, Regional Centre for Biotechnology, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Kalinga Institute of Industrial Technology, Bhubaneswar, Orissa, India
- Deepti Jain
- Transcription Regulation Lab, Regional Centre for Biotechnology, NCR Biotech Science Cluster, Faridabad, Haryana, India
- Divya Chandran
- Laboratory of Plant–Microbe Interactions, Regional Centre for Biotechnology, NCR Biotech Science Cluster, Faridabad, Haryana, India
5
Ferreira NGC, Morgado RG, Cunha L, Novo M, Soares AMVM, Morgan AJ, Loureiro S, Kille P. Unravelling the molecular mechanisms of nickel in woodlice. ENVIRONMENTAL RESEARCH 2019; 176:108507. [PMID: 31203050 DOI: 10.1016/j.envres.2019.05.038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/14/2019] [Revised: 05/13/2019] [Accepted: 05/22/2019] [Indexed: 06/09/2023]
Abstract
During the last few years, there has been an alarming increase in the amount of nickel (Ni) being released into the environment, primarily due to its use in the production of stainless steel but also from other sources such as battery manufacturing and subsequent disposal. The established biotic ligand models provide precise estimates of Ni bioavailability; in contrast, studies describing the mechanisms underpinning the toxicological effects of Ni are scarce. This study exploits RNA-seq to determine the transcriptomic responses of isopods using Porcellionides pruinosus as an example of a terrestrial metal-resistant woodlouse. Furthermore, the recently proposed model for Ni adverse outcome pathways (Ni-AOP) presents an unprecedented opportunity to fit isopod responses to Ni toxicity and define Porcellionides pruinosus as a metalomic model. Prior to this study, P. pruinosus represented an important environmental sentinel, though lacking genetic/omic data. The reference transcriptome generated here thus represents a major advance and a novel resource. A detailed annotation of the transcripts obtained is presented together with the homology to genes/gene products from Metazoa and the phylum Arthropoda, Gene Ontology (GO) classification, clusters of orthologous groups (COG) and assignment to KEGG metabolic pathways. Differential gene expression was determined in response to Ni exposure and used to derive the enriched pathways and processes. It revealed a significant impact on ion trafficking and storage, oxidative stress, neurotoxicity, reproduction impairment, genetics and epigenetics. Many of the processes observed support the current Ni-AOP, although the data highlight that the current model can be improved by including epigenetic endpoints, which represent key chronic risks under a scenario of Ni toxicity.
Affiliation(s)
- Nuno G C Ferreira
- Department of Biology & CESAM, University of Aveiro, 3810-193, Aveiro, Portugal; Cardiff University, School of Biosciences, Museum Avenue, CF10 3AX Cardiff - Wales, UK; Centro Interdisciplinar De Investigação Marinha E Ambiental, Terminal de Cruzeiros do Porto de Leixões/Av, General Norton de Matos s/n, 4450-208, Matosinhos, Portugal.
- Rui G Morgado
- Department of Biology & CESAM, University of Aveiro, 3810-193, Aveiro, Portugal
- Luís Cunha
- School of Applied Sciences, Faculty of Computing, Engineering and Science, University of South Wales, Pontypridd Campus, CF37 4AT UK
- Marta Novo
- Biodiversidad, Ecología y Evolución. Facultad de Biología, Universidad Complutense de Madrid, José Antonio Nováis, 2, 28040, Madrid, Spain
- Amadeu M V M Soares
- Department of Biology & CESAM, University of Aveiro, 3810-193, Aveiro, Portugal
- Andrew J Morgan
- Cardiff University, UK; Cardiff University, School of Biosciences, Museum Avenue, CF10 3AX Cardiff - Wales, UK
- Susana Loureiro
- Department of Biology & CESAM, University of Aveiro, 3810-193, Aveiro, Portugal
- Peter Kille
- Cardiff University, School of Biosciences, Museum Avenue, CF10 3AX Cardiff - Wales, UK.
6
Morandin C, Pulliainen U, Bos N, Schultner E. De novo transcriptome assembly and its annotation for the black ant Formica fusca at the larval stage. Sci Data 2018; 5:180282. [PMID: 30561435 PMCID: PMC6298252 DOI: 10.1038/sdata.2018.282] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/24/2018] [Accepted: 10/23/2018] [Indexed: 11/09/2022] Open
Abstract
Communication and nutrition are major drivers of fitness in ants. While communication is paramount to colony cohesion, nutrition is decisive in regulating reproductive division of labor among colony members. However, neither of these has been studied from a molecular perspective in developing individuals. Here, we report the availability of the first transcriptome resources for larvae of the ant Formica fusca, a species with excellent discrimination abilities and thus the potential to become a model system for studying molecular mechanisms of communication. We generated a comprehensive, high-coverage RNA-seq data set using Illumina RNA-seq technology by sequencing 24 individual 1st - 2nd instar larvae collected from four experimental groups (6 samples per treatment, 49 million mean reads per sample, coverage between 194-253×). A total of 24,765 unigenes were generated using a combination of genome-guided and de novo transcriptome assembly. A comprehensive assembly pipeline and annotation lists are provided. This dataset adds valuable transcriptomic resources for further study of developmental gene expression, transcriptional regulation and functional gene activity in ant larvae.
Affiliation(s)
- Claire Morandin
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Unni Pulliainen
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Tvärminne Zoological Station, University of Helsinki, J.A. Palménin tie 260, FI-10900 Hanko, Finland
- Nick Bos
- Organismal and Evolutionary Biology Research Programme, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Eva Schultner
- Institut für Zoologie, Universität Regensburg, Regensburg, Germany
7
Singh KS, Beadle K, Troczka BJ, Field L, Davies E, Williamson M, Nauen R, Bass C. Extension of Partial Gene Transcripts by Iterative Mapping of RNA-Seq Raw Reads. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 16:1036-1041. [PMID: 30106739 DOI: 10.1109/tcbb.2018.2865309] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
Many non-model organisms lack reference genomes, and the sequencing and de novo assembly of an organism's transcriptome is an affordable means by which to characterize the coding component of its genome. Despite the advances that have made this possible, assembling a transcriptome without a known reference usually results in a collection of full-length and partial gene transcripts. The downstream analysis of genes represented as partial transcripts then often requires further experimental work in the laboratory in order to obtain full-length sequences. We have explored whether partial transcripts, encoding genes of interest present in de novo assembled transcriptomes of a model and a non-model insect species, could be further extended by iterative mapping against the raw transcriptome sequencing reads. Partial sequences encoding cytochrome P450s and carboxyl/cholinesterases were used in this analysis because they are large multigene families and exhibit significant variation in expression. We present an effective method to improve the continuity of partial transcripts in silico that, in the absence of a reference genome, may be a quick and cost-effective alternative to their extension by laboratory experimentation. Our approach resulted in the successful extension of incompletely assembled transcripts, often to full length. We validated these results in silico and experimentally using real-time PCR and sequencing.
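The idea of extending a partial transcript by repeatedly matching it against raw reads can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the authors' method: exact suffix-prefix overlap stands in for read mapping, only the 3' end is extended, and `extend_right`, `min_overlap` and `max_rounds` are hypothetical names and parameters.

```python
def extend_right(transcript, reads, min_overlap=5, max_rounds=100):
    """Greedily extend the 3' end of `transcript` using overlapping reads.

    In each round, every read is checked for its longest prefix (at least
    `min_overlap` bases) that exactly matches the current transcript suffix;
    the longest resulting extension is appended. The loop stops when no read
    extends the sequence or after `max_rounds` rounds.
    """
    for _ in range(max_rounds):
        best = ""
        for read in reads:
            longest = min(len(read) - 1, len(transcript))
            for k in range(longest, min_overlap - 1, -1):
                if transcript.endswith(read[:k]):
                    if len(read) - k > len(best):
                        best = read[k:]
                    break  # longest overlap for this read found
        if not best:
            return transcript
        transcript += best
    return transcript


# Hypothetical partial transcript and reads.
partial = "ACGTACGGA"
raw_reads = ["ACGGATTC", "ATTCGGCA", "TTTTTTTT"]
extended = extend_right(partial, raw_reads, min_overlap=4)
```

A practical pipeline would tolerate mismatches, extend both ends and guard against repeats and chimeric joins, which this sketch does not attempt.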
8
Orgeur M, Martens M, Börno ST, Timmermann B, Duprez D, Stricker S. A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model. Biol Open 2018; 7:bio.028498. [PMID: 29183907 PMCID: PMC5827264 DOI: 10.1242/bio.028498] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/20/2023] Open
Abstract
The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.
Affiliation(s)
- Mickael Orgeur
- Freie Universität Berlin, Institut für Chemie und Biochemie, Thielallee 63, 14195 Berlin, Germany; Max Planck Institute for Molecular Genetics, Development and Disease Group, Ihnestrasse 63-73, 14195 Berlin, Germany; Sorbonne Universités, UPMC Univ. Paris 06, CNRS UMR 7622, Inserm U1156, IBPS-Developmental Biology Laboratory, 9 Quai Saint-Bernard, 75252 Paris Cedex 05, France
- Marvin Martens
- Sorbonne Universités, UPMC Univ. Paris 06, CNRS UMR 7622, Inserm U1156, IBPS-Developmental Biology Laboratory, 9 Quai Saint-Bernard, 75252 Paris Cedex 05, France
- Stefan T Börno
- Max Planck Institute for Molecular Genetics, Development and Disease Group, Ihnestrasse 63-73, 14195 Berlin, Germany
- Bernd Timmermann
- Max Planck Institute for Molecular Genetics, Development and Disease Group, Ihnestrasse 63-73, 14195 Berlin, Germany
- Delphine Duprez
- Sorbonne Universités, UPMC Univ. Paris 06, CNRS UMR 7622, Inserm U1156, IBPS-Developmental Biology Laboratory, 9 Quai Saint-Bernard, 75252 Paris Cedex 05, France
- Sigmar Stricker
- Freie Universität Berlin, Institut für Chemie und Biochemie, Thielallee 63, 14195 Berlin, Germany; Max Planck Institute for Molecular Genetics, Development and Disease Group, Ihnestrasse 63-73, 14195 Berlin, Germany
9
Challenges and advances for transcriptome assembly in non-model species. PLoS One 2017; 12:e0185020. [PMID: 28931057 PMCID: PMC5607178 DOI: 10.1371/journal.pone.0185020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 01/30/2017] [Accepted: 09/04/2017] [Indexed: 12/28/2022] Open
Abstract
Analyses of high-throughput transcriptome sequences of non-model organisms are based on two main approaches: de novo assembly and genome-guided assembly using mapping to assign reads prior to assembly. Given the limits of mapping reads to a reference when it is highly divergent, as is frequently the case for non-model species, we evaluate whether using blastn would outperform mapping methods for read assignment in such situations (>15% divergence). We demonstrate its high performance by using simulated reads of lengths corresponding to those generated by the most common sequencing platforms, and over a realistic range of genetic divergence (0% to 30% divergence). Here we focus on gene identification and not on resolving the whole set of transcripts (i.e. the complete transcriptome). For simulated datasets, the transcriptome-guided assembly based on blastn recovers 94.8% of genes irrespective of read length at 0% divergence; however, assignment rate of reads is negatively correlated with both increasing divergence level and reducing read lengths. Nevertheless, we still observe 92.6% of recovered genes at 30% divergence irrespective of read length. This analysis also produces a categorization of genes relative to their assignment, and suggests guidelines for data processing prior to analyses of comparative transcriptomics and gene expression to minimize potential inferential bias associated with incorrect transcript assignment. We also compare the performances of de novo assembly alone vs in combination with a transcriptome-guided assembly based on blastn both via simulation and empirically, using data from a cyprinid fish species and from an oak species. For any simulated scenario, the transcriptome-guided assembly using blastn outperforms the de novo approach alone, including when the divergence level is beyond the reach of traditional mapping methods. 
Combining de novo assembly and a related reference transcriptome for read assignment also addresses the bias/error in contigs caused by the dependence on a related reference alone. Empirical data corroborate these findings when assembling transcriptomes from the two non-model organisms: Parachondrostoma toxostoma (fish) and Quercus pubescens (plant). For the fish species, out of the 31,944 genes known from D. rerio, the guided and de novo assemblies recover respectively 20,605 and 20,032 genes but the performance of the guided assembly approach is much higher for both the contiguity and completeness metrics. For the oak, out of the 29,971 genes known from Vitis vinifera, the transcriptome-guided and de novo assemblies display similar performance, but the new guided approach detects 16,326 genes where the de novo assembly only detects 9,385 genes.
10
Díez-Vives C, Moitinho-Silva L, Nielsen S, Reynolds D, Thomas T. Expression of eukaryotic-like protein in the microbiome of sponges. Mol Ecol 2017; 26:1432-1451. [PMID: 28036141 DOI: 10.1111/mec.14003] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 08/18/2016] [Revised: 12/08/2016] [Accepted: 12/09/2016] [Indexed: 01/04/2023]
Abstract
Eukaryotic-like proteins (ELPs) are classes of proteins that are found in prokaryotes, but have a likely evolutionary origin in eukaryotes. ELPs have been postulated to mediate host-microbiome interactions. Recent work has discovered that prokaryotic symbionts of sponges contain abundant and diverse genes for ELPs, which could modulate interactions with their filter-feeding and phagocytic host. However, the extent to which these ELP genes are actually used and expressed by the symbionts is poorly understood. Here, we use metatranscriptomics to investigate ELP expression in the microbiomes of three different sponges (Cymbastella concentrica, Scopalina sp. and Tedania anhelens). We developed a workflow with optimized rRNA removal and in silico subtraction of host sequences to obtain a reliable symbiont metatranscriptome. This showed that between 1.3% and 2.3% of all symbiont transcripts contain genes for ELPs. Two classes of ELPs (cadherin and tetratricopeptide repeats) were abundantly expressed in the C. concentrica and Scopalina sp. microbiomes, while ankyrin repeat ELPs were predominant in the T. anhelens metatranscriptome. Comparison with transcripts that do not encode ELPs indicated a constitutive expression of ELPs across a range of bacterial and archaeal symbionts. Expressed ELPs also contained domains involved in protein secretion and/or were co-expressed with proteins involved in extracellular transport. This suggests these ELPs are likely exported, which could allow for direct interaction with the sponge. Our study shows that ELP genes in sponge symbionts represent actively expressed functions that could mediate molecular interaction between symbiosis partners.
Affiliation(s)
- C Díez-Vives
- Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- L Moitinho-Silva
- Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- S Nielsen
- Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- D Reynolds
- Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
- T Thomas
- Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW, Australia
11
Single-cell TCRseq: paired recovery of entire T-cell alpha and beta chain transcripts in T-cell receptors from single-cell RNAseq. Genome Med 2016; 8:80. [PMID: 27460926 PMCID: PMC4962388 DOI: 10.1186/s13073-016-0335-7] [Citation(s) in RCA: 89] [Impact Index Per Article: 11.1] [Received: 08/01/2015] [Accepted: 07/11/2016] [Indexed: 11/24/2022] Open
Abstract
Accurate characterization of the repertoire of the T-cell receptor (TCR) alpha and beta chains is critical to understanding adaptive immunity. Such characterization has many applications across such fields as vaccine development and response, clone-tracking in cancer, and immunotherapy. Here we present a new methodology called single-cell TCRseq (scTCRseq) for the identification and assembly of full-length rearranged V(D)J T-cell receptor sequences from paired-end single-cell RNA sequencing reads. The method allows accurate identification of the V(D)J rearrangements for each individual T-cell and has the novel ability to recover paired alpha and beta segments. Source code is available at https://github.com/ElementoLab/scTCRseq.
12
Bar I, Cummins S, Elizur A. Transcriptome analysis reveals differentially expressed genes associated with germ cell and gonad development in the Southern bluefin tuna (Thunnus maccoyii). BMC Genomics 2016; 17:217. [PMID: 26965070 PMCID: PMC4785667 DOI: 10.1186/s12864-016-2397-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 11/06/2015] [Accepted: 01/14/2016] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Controlling and managing the breeding of bluefin tuna (Thunnus spp.) in captivity is an imperative step towards obtaining a sustainable supply of these fish in aquaculture production systems. Germ cell transplantation (GCT) is an innovative technology for the production of inter-species surrogates, by transplanting undifferentiated germ cells derived from a donor species into larvae of a host species. The transplanted surrogates will then grow and mature to produce donor-derived seed, thus providing a simpler alternative to maintaining large-bodied broodstock such as the bluefin tuna. Implementation of GCT for new species requires the development of molecular tools to follow the fate of the transplanted germ cells. These tools are based on key reproductive and germ cell-specific genes. RNA-Sequencing (RNA-Seq) provides a rapid, cost-effective method for high throughput gene identification in non-model species. This study utilized RNA-Seq to identify key genes expressed in the gonads of Southern bluefin tuna (Thunnus maccoyii, SBT) and their specific expression patterns in male and female gonad cells.
RESULTS: Key genes involved in the reproductive molecular pathway and specifically, germ cell development in gonads, were identified using analysis of RNA-Seq transcriptomes of male and female SBT gonad cells. Expression profiles of transcripts from ovary and testis cells were compared, as well as testis germ cell-enriched fraction prepared with Percoll gradient, as used in GCT studies. Ovary cells demonstrated over-expression of genes related to stem cell maintenance, while in testis cells, transcripts encoding for reproduction-associated receptors, sex steroids and hormone synthesis and signaling genes were over-expressed. Within the testis cells, the Percoll-enriched fraction showed over-expression of genes that are related to post-meiosis germ cell populations.
CONCLUSIONS: Gonad development and germ cell related genes were identified from SBT gonads and their expression patterns in ovary and testis cells were determined. These expression patterns correlate with the reproductive developmental stage of the sampled fish. The majority of the genes described in this study were sequenced for the first time in T. maccoyii. The wealth of SBT gonadal and germ cell-related gene sequences made publicly available by this study provides an extensive resource for further GCT and reproductive molecular biology studies of this commercially valuable fish.
Affiliation(s)
- Ido Bar
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
- Scott Cummins
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
- Abigail Elizur
- Genecology Research Centre, Faculty of Science, Health, Education and Engineering, University of the Sunshine Coast, 4558 Maroochydore DC, Queensland, Australia
13
|
Deng F, Chen SY. dbHT-Trans: An Efficient Tool for Filtering the Protein-Encoding Transcripts Assembled by RNA-Seq According to Search for Homologous Proteins. J Comput Biol 2015; 23:1-9. [PMID: 26484655 DOI: 10.1089/cmb.2015.0137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
In RNA-Seq studies, there are still many challenges for reliably assembling transcripts. Both genome-guided and de novo methods always produce too many false transcripts because of known and unknown factors. Therefore, postassembly quality filtering is necessary before performing downstream analyses. Here, we present dbHT-Trans, an automatic and efficient tool for filtering the protein-encoding transcripts assembled by RNA-Seq. For each candidate transcript, we first deduced all potential open reading frames and translated them into amino acid sequences. These sequences were then searched against the reference protein database, and a transcript was predicted to be false when it had no homologous sequence. This method is expected to filter out the falsely assembled transcripts of protein-encoding genes. Application of dbHT-Trans to the annotated transcriptome of mouse revealed that the sensitivity was almost 90% for recalling protein-encoding transcripts. After this quality filtering, the numbers of assembled genes became more consistent between the Cufflinks and Trinity tools. To significantly decrease data storage, we transformed all intermediate data into descriptive metadata and stored them in a MySQL database, which can be used by downstream analyses in real time. The source code, example data, and manual of dbHT-Trans are freely available on the GitHub repository.
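The first step described in this abstract (deducing open reading frames and translating them) can be sketched as follows. This is a minimal illustration, not the dbHT-Trans implementation: the function names, the `min_aa` length cutoff, the requirement of an ATG start and an in-frame stop codon, and the use of the standard genetic code are all assumptions made for the example. The subsequent homology search against a reference protein database is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here;
# the abstract does not say which translation table the tool uses).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(transcript, min_aa=30):
    """Collect peptides from all six reading frames (hypothetical helper).

    A peptide is kept if it starts at the first 'M' of a stop-delimited
    segment, is terminated by a stop codon, and has at least min_aa residues.
    """
    peptides = []
    seq = transcript.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # The last split segment has no stop codon after it, so skip it.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    peptides.append(segment[start:])
    return peptides


if __name__ == "__main__":
    toy = "CCATGGCTGCTGCTTAAG"  # toy transcript, not real data
    print(find_orfs(toy, min_aa=3))
```

In a filtering workflow of the kind the abstract describes, each returned peptide would then be compared against a protein database, and a transcript with no peptide finding a homolog would be flagged as a likely false assembly.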
Affiliation(s)
- Feilong Deng: Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
- Shi-Yi Chen: Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu, China
|
14
|
Kavembe GD, Franchini P, Irisarri I, Machado-Schiaffino G, Meyer A. Genomics of Adaptation to Multiple Concurrent Stresses: Insights from Comparative Transcriptomics of a Cichlid Fish from One of Earth’s Most Extreme Environments, the Hypersaline Soda Lake Magadi in Kenya, East Africa. J Mol Evol 2015; 81:90-109. [DOI: 10.1007/s00239-015-9696-6] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 03/20/2015] [Accepted: 08/29/2015] [Indexed: 11/29/2022]
|
15
|
Kornobis E, Cabellos L, Aguilar F, Frías-López C, Rozas J, Marco J, Zardoya R. TRUFA: A User-Friendly Web Server for de novo RNA-seq Analysis Using Cluster Computing. Evol Bioinform Online 2015; 11:97-104. [PMID: 26056424 PMCID: PMC4444131 DOI: 10.4137/ebo.s23873] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 01/09/2015] [Revised: 03/09/2015] [Accepted: 03/16/2015] [Indexed: 01/08/2023]
Abstract
Application of next-generation sequencing (NGS) methods for transcriptome analysis (RNA-seq) has become increasingly accessible in recent years and is of great interest to many biological disciplines including, e.g., evolutionary biology, ecology, biomedicine, and computational biology. Although virtually any research group can now obtain RNA-seq data, only a few have the bioinformatics knowledge and computation facilities required for transcriptome analysis. Here, we present TRUFA (TRanscriptome User-Friendly Analysis), an open informatics platform offering a web-based interface that generates the outputs commonly used in de novo RNA-seq analysis and comparative transcriptomics. TRUFA provides a comprehensive service that allows users to dynamically perform raw read cleaning, transcript assembly, annotation, and expression quantification. Due to the computationally intensive nature of such analyses, TRUFA is highly parallelized and benefits from accessing high-performance computing resources. The complete TRUFA pipeline was validated using four previously published transcriptomic data sets. TRUFA’s results for the example datasets were globally similar to those of the original studies, and it performed notably better when analyzing the green tea dataset. The platform permits analyzing RNA-seq data in a fast, robust, and user-friendly manner. Accounts on TRUFA are provided freely upon request at https://trufa.ifca.es.
Affiliation(s)
- Etienne Kornobis: Departamento de biodiversidad y biología evolutiva, Museo Nacional de Ciencias Naturales MNCN (CSIC), Madrid, Spain
- Luis Cabellos: Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Fernando Aguilar: Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Cristina Frías-López: Departament de Genètica and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Julio Rozas: Departament de Genètica and Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Jesús Marco: Instituto de Física de Cantabria, IFCA (CSIC-UC), Edificio Juan Jordá, Santander, Spain
- Rafael Zardoya: Departamento de biodiversidad y biología evolutiva, Museo Nacional de Ciencias Naturales MNCN (CSIC), Madrid, Spain
|
16
|
Du L, Li W, Fan Z, Shen F, Yang M, Wang Z, Jian Z, Hou R, Yue B, Zhang X. First insights into the giant panda (Ailuropoda melanoleuca) blood transcriptome: a resource for novel gene loci and immunogenetics. Mol Ecol Resour 2015; 15:1001-13. [DOI: 10.1111/1755-0998.12367] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 05/16/2014] [Revised: 12/22/2014] [Accepted: 12/26/2014] [Indexed: 12/20/2022]
Affiliation(s)
- Lianming Du: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- Wujiao Li: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- Zhenxin Fan: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- Fujun Shen: The Sichuan Key Laboratory for Conservation Biology of Endangered Wildlife, Chengdu Research Base of Giant Panda Breeding, Chengdu, Sichuan 610081, China
- Mingyu Yang: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- Zili Wang: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- Zuoyi Jian: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- Rong Hou: The Sichuan Key Laboratory for Conservation Biology of Endangered Wildlife, Chengdu Research Base of Giant Panda Breeding, Chengdu, Sichuan 610081, China
- Bisong Yue: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
- Xiuyue Zhang: Key Laboratory of Bio-resources and Eco-environment, Ministry of Education, College of Life Science, Sichuan University, Chengdu, Sichuan 610064, China
|
17
|
Metabolic modeling of common Escherichia coli strains in human gut microbiome. BioMed Research International 2014; 2014:694967. [PMID: 25126572 PMCID: PMC4122010 DOI: 10.1155/2014/694967] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 04/21/2014] [Revised: 06/11/2014] [Accepted: 06/13/2014] [Indexed: 12/01/2022]
Abstract
Recent high-throughput sequencing has enabled the composition of Escherichia coli strains in the human microbial community to be profiled en masse. However, there are two challenges to address: (1) exploring the genetic differences between E. coli strains in the human gut and (2) characterizing the dynamic responses of E. coli to diverse stress conditions. To address them, we investigated the E. coli strains in the human gut microbiome using deep sequencing data and reconstructed genome-wide metabolic networks for the three most common E. coli strains: E. coli HS, UTI89, and CFT073. The metabolic models show obvious strain-specific characteristics, both in network contents and in behaviors. We predicted optimal biomass production for the three models on four different carbon sources (acetate, ethanol, glucose, and succinate) and found that these stress-associated genes were involved in host-microbial interactions and increased in human obesity. In addition, the results show that the growth rates are similar among the models, but the flux distributions are different, even in E. coli core reactions. The correlations between human diabetes-associated metabolic reactions in the E. coli models were also predicted. The study provides a systems perspective on E. coli strains in the human gut microbiome and will be helpful in integrating diverse data sources in future studies.
|