1
Kang JN, Hur M, Kim CK, Yang SH, Lee SM. Enhancing transcriptome analysis in medicinal plants: multiple unigene sets in Astragalus membranaceus. Front Plant Sci 2024; 15:1301526. [PMID: 38384760 PMCID: PMC10879423 DOI: 10.3389/fpls.2024.1301526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024]
Abstract
Astragalus membranaceus is a medicinal plant mainly used in East Asia and contains abundant secondary metabolites. Despite the importance of this plant, the available genomic and genetic information is still limited. De novo transcriptome construction is recognized as an essential method for transcriptome research when reference genome information is incomplete. In this study, we constructed three individual transcriptome sets (unigene sets) for detailed analysis of phenylpropanoid biosynthesis, a major metabolic pathway in A. membranaceus. Set-1 consisted of circular consensus sequences (CCSs) generated using PacBio sequencing (PacBio-seq). Set-2 consisted of unigenes from a hybrid assembly of Illumina sequencing (Illumina-seq) reads and PacBio CCSs using rnaSPAdes. Set-3 unigenes were assembled from Illumina-seq reads using the Trinity software. Construction of multiple unigene sets provides several advantages for transcriptome analysis. First, it provides an appropriate expression filtering threshold for assembly-based unigenes: a threshold of transcripts per million (TPM) ≥ 5 removed more than 88% of assembly-based unigenes, which were mostly short and low-expressing unigenes. Second, assembly-based unigenes compensated for the incomplete length of PacBio CCSs: the ends of the 5′/3′ untranslated regions of phenylpropanoid-related unigenes derived from Set-1 were incomplete, which suggests that PacBio CCSs are unlikely to be full-length transcripts. Third, more isoform unigenes could be obtained from multiple unigene sets; isoform unigenes missing in Set-1 were detected in Set-2 and Set-3. Finally, gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses showed that phenylpropanoid biosynthesis and carbohydrate metabolism were highly activated in A. membranaceus roots. Various sequencing technologies and assemblers have been developed for de novo transcriptome analysis. However, no technique is perfect for de novo transcriptome analysis, suggesting the need to construct multiple unigene sets. This method enables efficient transcript filtering and detection of longer and more diverse transcripts.
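The TPM ≥ 5 expression filter mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the unigene IDs, and the toy read counts and lengths are all hypothetical, and TPM is computed with the standard length-normalised definition.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Compute transcripts per million from read counts and transcript lengths."""
    # Reads per kilobase of transcript for each unigene.
    rpk = {u: counts[u] / (lengths_bp[u] / 1000.0) for u in counts}
    # Scaling factor so that all TPM values sum to one million.
    scale = sum(rpk.values()) / 1_000_000.0
    return {u: (v / scale if scale > 0 else 0.0) for u, v in rpk.items()}


def filter_by_tpm(tpm_values: Dict[str, float], threshold: float = 5.0) -> Dict[str, float]:
    """Keep only unigenes whose TPM is at or above the threshold."""
    return {u: v for u, v in tpm_values.items() if v >= threshold}


# Hypothetical toy data: one highly expressed and one barely expressed unigene.
counts = {"high_expr": 999_996, "low_expr": 4}
lengths = {"high_expr": 1000, "low_expr": 1000}

values = tpm(counts, lengths)
kept = filter_by_tpm(values, threshold=5.0)
print(sorted(kept))
```

In this toy case the low-expression unigene has a TPM of 4 and is dropped, while the other is retained. Real data would supply counts from a quantification tool rather than hand-written dictionaries.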
Affiliation(s)
- Ji-Nam Kang
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Mok Hur
- Department of Herbal Crop Resources, National Institute of Horticultural & Herbal Science, Eumseong-gun, Chungcheongbuk-do, Republic of Korea
- Chang-Kug Kim
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- So-Hee Yang
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Si-Myung Lee
- Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
2
Alvarez RV, Landsman D. GTax: improving de novo transcriptome assembly by removing foreign RNA contamination. Genome Biol 2024; 25:12. [PMID: 38191464 PMCID: PMC10773103 DOI: 10.1186/s13059-023-03141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 12/08/2023] [Indexed: 01/10/2024] Open
Abstract
The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
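The idea of flagging foreign reads from BLAST results before assembly can be sketched as follows. This is a sketch under stated assumptions, not GTax's own tooling: it assumes tab-separated BLAST output with three columns (read ID, subject taxid, bitscore), such as BLAST+ can produce with `-outfmt "6 qseqid staxids bitscore"`, and one taxid per row. The function names, the toy rows, and the allowed taxid set are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hit_taxids(rows: Iterable[str]) -> Dict[str, int]:
    """Map each read ID to the taxid of its highest-bitscore hit."""
    best: Dict[str, Tuple[float, int]] = {}
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        read_id, taxid, bitscore = row.rstrip("\n").split("\t")
        score = float(bitscore)
        # Keep the hit only if it beats the best score seen so far.
        if read_id not in best or score > best[read_id][0]:
            best[read_id] = (score, int(taxid))
    return {read_id: taxid for read_id, (_, taxid) in best.items()}


def foreign_reads(best: Dict[str, int], allowed_taxids: Set[int]) -> Set[str]:
    """Return read IDs whose best hit falls outside the allowed taxa."""
    return {read_id for read_id, taxid in best.items() if taxid not in allowed_taxids}


# Hypothetical toy rows; 4081 stands for the target organism, 562 for a contaminant.
rows = [
    "read1\t4081\t180.0",
    "read1\t562\t60.0",
    "read2\t562\t200.0",
    "read3\t4081\t150.0",
]
best = best_hit_taxids(rows)
to_remove = foreign_reads(best, allowed_taxids={4081})
print(sorted(to_remove))
```

Reads without any hit never appear in `best`, so this sketch leaves them in the sample. A taxonomy-aware filter would typically accept a whole lineage rather than a single taxid.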
Affiliation(s)
- Roberto Vera Alvarez
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
- David Landsman
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
3
Bejerman N, Dietzgen R, Debat H. Novel Tri-Segmented Rhabdoviruses: A Data Mining Expedition Unveils the Cryptic Diversity of Cytorhabdoviruses. Viruses 2023; 15:2402. [PMID: 38140643 PMCID: PMC10747219 DOI: 10.3390/v15122402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 12/07/2023] [Accepted: 12/08/2023] [Indexed: 12/24/2023] Open
Abstract
Cytorhabdoviruses (genus Cytorhabdovirus, family Rhabdoviridae) are plant-infecting viruses with enveloped, bacilliform virions. Established members of the genus Cytorhabdovirus have unsegmented single-stranded negative-sense RNA genomes (ca. 10-16 kb) which encode four to ten proteins. Here, by exploring large publicly available metatranscriptomics datasets, we report the identification and genomic characterization of 93 novel viruses with genetic and evolutionary cues of cytorhabdoviruses. Strikingly, five unprecedented viruses with tri-segmented genomes were also identified. This finding represents the first tri-segmented viruses in the family Rhabdoviridae, and they should be classified in a novel genus within this family for which we suggest the name "Trirhavirus". Interestingly, the nucleocapsid and polymerase were the only typical rhabdoviral proteins encoded by those tri-segmented viruses, whereas in three of them, a protein similar to the emaravirus (family Fimoviridae) silencing suppressor was found, while the other predicted proteins had no matches in any sequence databases. Genetic distance and evolutionary insights suggest that all these novel viruses may represent members of novel species. Phylogenetic analyses, of both novel and previously classified plant rhabdoviruses, provide compelling support for the division of the genus Cytorhabdovirus into three distinct genera. This proposed reclassification not only enhances our understanding of the evolutionary dynamics within this group of plant rhabdoviruses but also illuminates the remarkable genomic diversity they encompass. This study not only represents a significant expansion of the genomics of cytorhabdoviruses that will enable future research on the evolutionary peculiarity of this genus but also shows the plasticity in the rhabdovirus genome organization with the discovery of tri-segmented members with a unique evolutionary trajectory.
Affiliation(s)
- Nicolas Bejerman
- Instituto de Patología Vegetal—Centro de Investigaciones Agropecuarias—Instituto Nacional de Tecnología Agropecuaria (IPAVE—CIAP—INTA), Camino 60 Cuadras Km 5,5, Córdoba X5020ICA, Argentina
- Unidad de Fitopatología y Modelización Agrícola, Consejo Nacional de Investigaciones Científicas y Técnicas, Camino 60 Cuadras Km 5,5, Córdoba X5020ICA, Argentina
- Ralf Dietzgen
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, QLD 4072, Australia
- Humberto Debat
- Instituto de Patología Vegetal—Centro de Investigaciones Agropecuarias—Instituto Nacional de Tecnología Agropecuaria (IPAVE—CIAP—INTA), Camino 60 Cuadras Km 5,5, Córdoba X5020ICA, Argentina
- Unidad de Fitopatología y Modelización Agrícola, Consejo Nacional de Investigaciones Científicas y Técnicas, Camino 60 Cuadras Km 5,5, Córdoba X5020ICA, Argentina
4
Farkas C, Recabal A, Mella A, Candia-Herrera D, Olivero MG, Haigh JJ, Tarifeño-Saldivia E, Caprile T. annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing. Gigascience 2022; 11:6874526. [PMID: 36472574 PMCID: PMC9724561 DOI: 10.1093/gigascience/giac099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/22/2022] [Revised: 07/22/2022] [Accepted: 09/28/2022] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. RESULTS: We developed an easy-to-use genome-guided transcriptome annotation pipeline that uses assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integration of several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons from the chicken SCO-spondin gene (containing more than 105 exons), including the identification of missing genes in the chicken reference annotations by homology assignments. CONCLUSIONS: Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker, is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas, helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes.
Affiliation(s)
- Antonia Recabal
- Departamento de Biología Celular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Andy Mella
- Instituto de Ciencias Naturales, Universidad de las Américas, Chile; Centro Integrativo de Biología y Química Aplicada (CIBQA), Universidad Bernardo O'Higgins, Santiago 8370854, Chile
- Daniel Candia-Herrera
- Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Maryori González Olivero
- Departamento de Biología Celular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Jody Jonathan Haigh
- CancerCare Manitoba Research Institute, Winnipeg, MB, Canada; Department of Pharmacology and Therapeutics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
5
da Silva EMG, Rebello KM, Choi YJ, Gregorio V, Paschoal AR, Mitreva M, McKerrow JH, Neves-Ferreira AGDC, Passetti F. Identification of Novel Genes and Proteoforms in Angiostrongylus costaricensis through a Proteogenomic Approach. Pathogens 2022; 11:1273. [PMID: 36365024 PMCID: PMC9694666 DOI: 10.3390/pathogens11111273] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/21/2022] [Revised: 10/15/2022] [Accepted: 10/20/2022] [Indexed: 07/22/2023] Open
Abstract
RNA sequencing (RNA-Seq) and mass-spectrometry-based proteomics data are often integrated in proteogenomic studies to assist in the prediction of eukaryote genome features, such as genes, splicing, single-nucleotide variants (SNVs), and single-amino-acid variants (SAAVs). Most genomes of parasite nematodes are draft versions that lack transcript- and protein-level information and whose gene annotations rely only on computational predictions. Angiostrongylus costaricensis is a roundworm species that causes an intestinal inflammatory disease known as abdominal angiostrongyliasis (AA). Currently, there is no drug available that acts directly on this parasite, mostly due to the sparse understanding of its molecular characteristics. The available genome of A. costaricensis, specific to the Costa Rica strain, is a draft version that is not supported by transcript- or protein-level evidence. This study used RNA-Seq and MS/MS data to perform an in-depth annotation of the A. costaricensis genome. Our prediction improved the reference annotation with (a) novel coding and non-coding genes; (b) evidence of alternative splicing generating new proteoforms; and (c) a list of SNVs between the Brazilian (Crissiumal) and the Costa Rica strains. To the best of our knowledge, this is the first time that a multi-omics approach has been used to improve the genome annotation of A. costaricensis. We hope this improved genome annotation can assist in the future development of drugs, kits, and vaccines to treat, diagnose, and prevent AA caused by either the Brazil strain (Crissiumal) or the Costa Rica strain.
Affiliation(s)
- Esdras Matheus Gomes da Silva
- Instituto Carlos Chagas, Fiocruz, Curitiba 81350-010, PR, Brazil
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-900, RJ, Brazil
- Karina Mastropasqua Rebello
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-900, RJ, Brazil
- Laboratory of Integrated Studies in Protozoology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-360, RJ, Brazil
- Young-Jun Choi
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Vitor Gregorio
- Bioinformatics and Pattern Recognition Group (Bioinfo-CP), Department of Computer Science (DACOM), Federal University of Technology-Parana (UTFPR), Cornélio Procópio 86300-000, PR, Brazil
- Alexandre Rossi Paschoal
- Bioinformatics and Pattern Recognition Group (Bioinfo-CP), Department of Computer Science (DACOM), Federal University of Technology-Parana (UTFPR), Cornélio Procópio 86300-000, PR, Brazil
- Makedonka Mitreva
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- James H. McKerrow
- Center for Discovery and Innovation in Parasitic Diseases, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA 92093, USA
- Fabio Passetti
- Instituto Carlos Chagas, Fiocruz, Curitiba 81350-010, PR, Brazil
6
Wang X, Liu G, Xie S, Pan L, Tan Q. Growth and Meat Quality of Grass Carp (Ctenopharyngodon idellus) Responded to Dietary Protein (Soybean Meal) Level Through the Muscle Metabolism and Gene Expression of Myosin Heavy Chains. Front Nutr 2022; 9:833924. [PMID: 35419399 PMCID: PMC8996190 DOI: 10.3389/fnut.2022.833924] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/12/2021] [Accepted: 01/31/2022] [Indexed: 01/23/2023] Open
Abstract
The aim of this study was to investigate the effect of dietary protein level (soybean meal) on the growth performance and flesh quality of grass carp, and the related molecular mechanisms. The results showed that appropriate dietary protein levels improved the growth performance, hardness, and pH of muscle while decreasing muscle crude lipid content and cooking loss and altering the antioxidant capacity and metabolic enzyme activities. In addition, appropriate dietary protein promoted the gene expression of myhc-1, myhc-4, myf5, myod, myog, and fgf6a, whereas it inhibited that of myhc-7, myhc-2, mrf4, and mstn. Transcriptome profiling of muscle revealed that the flesh quality-specific differences were related to tight junctions and intramuscular fat (IMF) accumulation. GSEA showed that fatty acid metabolism and oxidative phosphorylation were downregulated in SM5 compared with SM1. To conclude, appropriate protein levels improved the growth and flesh quality by regulating muscle antioxidant capacity and gene expression of myhcs and fat metabolism-related signaling molecules.
Affiliation(s)
- Xiaoyu Wang
- College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China; Key Laboratory of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Wuhan, China; Hubei Provincial Engineering Laboratory for Pond Aquaculture, Wuhan, China
- Guoqing Liu
- College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China; Key Laboratory of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Wuhan, China; Hubei Provincial Engineering Laboratory for Pond Aquaculture, Wuhan, China
- Shouqi Xie
- State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Lei Pan
- Faculty of Resources and Environmental Science, Hubei University, Wuhan, China
- Qingsong Tan
- College of Fisheries, Huazhong Agricultural University, Wuhan, China; Engineering Research Center of Green Development for Conventional Aquatic Biological Industry in the Yangtze River Economic Belt, Ministry of Education, Wuhan, China; Key Laboratory of Freshwater Animal Breeding, Ministry of Agriculture and Rural Affairs, Wuhan, China; Hubei Provincial Engineering Laboratory for Pond Aquaculture, Wuhan, China
7
Al Kadi M, Jung N, Okuzaki D. UNAGI: Yeast Transcriptome Reconstruction and Gene Discovery Using Nanopore Sequencing. Methods Mol Biol 2022; 2477:79-89. [PMID: 35524113 DOI: 10.1007/978-1-0716-2257-5_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Computational prediction is the main approach used in genome annotation, but its accuracy is low: untranslated regions are not identified, complex isoforms are not predicted correctly, and the discovery rate of noncoding RNA is low. RNA-seq has revolutionized transcriptome reconstruction over the last decade. However, the fragmentation involved in cDNA sequencing leads to information loss, requiring transcripts to be assembled and reconstructed, which affects the accuracy of the reconstructed transcriptome. Recently, long-read sequencing has been introduced with technologies such as Oxford Nanopore sequencing. cDNA is sequenced directly without fragmentation, producing long reads that do not need to be assembled, keeping the transcript structure intact and increasing the accuracy of transcriptome reconstruction. Here we present a protocol and a pipeline to reconstruct the transcriptome of compact genomes, including yeasts. It involves generating full-length cDNA and using the Oxford Nanopore ligation-based sequencing kit to sequence multiple samples in the same run. The pipeline (1) strands the generated long reads, (2) corrects the reads by mapping them to the reference genome, (3) identifies transcripts including 5'UTR and 3'UTR, (4) profiles the isoforms, filtering out artifacts resulting from low accuracy in sequencing, and (5) improves the accuracy of provided annotations. Using long reads improves the accuracy of transcriptome reconstruction and helps in discovering a significant number of novel RNAs.
Affiliation(s)
- Mohamad Al Kadi
- Department of Bacterial Infections, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Nicolas Jung
- Department of Infection Metagenomics, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
- Daisuke Okuzaki
- Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Osaka, Japan
- Single Cell Genomics, Human Immunology, WPI Immunology Frontier Research Center, Osaka University, Osaka, Japan
- Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, Osaka, Japan
8
Resolving the microalgal gene landscape at the strain level: A novel hybrid transcriptome of Emiliania huxleyi CCMP3266. Appl Environ Microbiol 2021; 88:e0141821. [PMID: 34757817 DOI: 10.1128/aem.01418-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022] Open
Abstract
Microalgae are key ecological players with a complex evolutionary history. Genomic diversity, in addition to the limited availability of high-quality genomes, challenges studies that aim to elucidate molecular mechanisms underlying microalgal ecophysiology. Here, we present a novel and comprehensive transcriptomic hybrid approach to generate a reference for genetic analyses and resolve the microalgal gene landscape at the strain level. The approach is demonstrated for a strain of the coccolithophore microalga Emiliania huxleyi, which is a species complex with considerable genome variability. The investigated strain is commonly studied as a model for algal-bacterial interactions, and was therefore sequenced in the presence of bacteria to elicit the expression of interaction-relevant genes. We applied complementary PacBio Iso-Seq full-length cDNA and poly(A)-independent Illumina total RNA sequencing, which resulted in a de novo assembled, near-complete hybrid transcriptome. In particular, hybrid sequencing improved the reconstruction of long transcripts and increased the recovery of full-length transcript isoforms. To use the resulting hybrid transcriptome as a reference for genetic analyses, we demonstrate a method that collapses the transcriptome into a genome-like dataset, termed "synthetic genome" (sGenome). We used the sGenome as a reference to visually confirm the robustness of the CCMP3266 gene assembly, to conduct differential gene expression analysis, and to characterize novel E. huxleyi genes. The newly identified genes contribute to our understanding of E. huxleyi genome diversification, and are predicted to play a role in microbial interactions. Our transcriptomic toolkit can be implemented in various microalgae to facilitate mechanistic studies on microalgal diversity and ecology.
Importance: Microalgae are key players in the ecology and biogeochemistry of our oceans. Efforts to implement genomic and transcriptomic tools in laboratory studies involving microalgae suffer from the lack of published genomes. In the case of coccolithophore microalgae, the problem has long been recognized; the model species Emiliania huxleyi is a species complex with genomes composed of a core and a large variable portion. To study the role of the variable portion in niche adaptation, and specifically in microbial interactions, strain-specific genetic information is required. Here we present a novel transcriptomic hybrid approach and generate strain-specific genome-like information. We demonstrate our approach on an E. huxleyi strain that is co-cultivated with bacteria. By constructing a "synthetic genome", we generated comprehensive gene annotations that enabled accurate analyses of gene expression patterns. Importantly, we unveiled novel genes in the variable portion of E. huxleyi that play putative roles in microbial interactions.
9
Comparative Analysis of PacBio and Oxford Nanopore Sequencing Technologies for Transcriptomic Landscape Identification of Penaeus monodon. Life (Basel) 2021; 11:862. [PMID: 34440606 PMCID: PMC8399832 DOI: 10.3390/life11080862] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/09/2021] [Revised: 08/07/2021] [Accepted: 08/17/2021] [Indexed: 12/16/2022] Open
Abstract
With the advantages that long-read sequencing platforms such as Pacific Biosciences (Menlo Park, CA, USA) (PacBio) and Oxford Nanopore Technologies (Oxford, UK) (ONT) can offer, various research fields such as genomics and transcriptomics can exploit their benefits. Selecting an appropriate sequencing platform is undoubtedly crucial for the success of the research outcome, thus there is a need to compare these long-read sequencing platforms and evaluate them for specific research questions. This study aims to compare the performance of PacBio and ONT platforms for transcriptomic analysis by utilizing transcriptome data from three different tissues (hepatopancreas, intestine, and gonads) of the juvenile black tiger shrimp, Penaeus monodon. We compared three important features: (i) main characteristics of the sequencing libraries and their alignment with the reference genome, (ii) transcript assembly features and isoform identification, and (iii) correlation of the quantification of gene expression levels for both platforms. Our analyses suggest that read-length bias and differences in sequencing throughput are highly influential factors when using long reads in transcriptome studies. These comparisons can provide a guideline when designing a transcriptome study utilizing these two long-read sequencing technologies.
10
Ramberg S, Høyheim B, Østbye TKK, Andreassen R. A de novo Full-Length mRNA Transcriptome Generated From Hybrid-Corrected PacBio Long-Reads Improves the Transcript Annotation and Identifies Thousands of Novel Splice Variants in Atlantic Salmon. Front Genet 2021; 12:656334. [PMID: 33986770 PMCID: PMC8110904 DOI: 10.3389/fgene.2021.656334] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/20/2021] [Accepted: 04/01/2021] [Indexed: 12/18/2022] Open
Abstract
Atlantic salmon (Salmo salar) is a major species produced in world aquaculture and an important vertebrate model organism for studying the process of rediploidization following whole genome duplication events (Ss4R, 80 mya). The current Salmo salar transcriptome is largely generated from genome sequence based in silico predictions supported by ESTs and short-read sequencing data. However, recent progress in long-read sequencing technologies now allows for full-length transcript sequencing from single RNA-molecules. This study provides a de novo full-length mRNA transcriptome from liver, head-kidney and gill materials. A pipeline was developed based on Iso-seq sequencing of long-reads on the PacBio platform (HQ reads) followed by error-correction of the HQ reads by short-reads from the Illumina platform. The pipeline successfully processed more than 1.5 million long-reads and more than 900 million short-reads into error-corrected HQ reads. A surprisingly high percentage (32%) represented expressed interspersed repeats, while the remaining were processed into 71 461 full-length mRNAs from 23 071 loci. Each transcript was supported by several single-molecule long-read sequences and at least three short-reads, assuring a high sequence accuracy. On average, each gene was represented by three isoforms. Comparisons to the current Atlantic salmon transcripts in the RefSeq database showed that the long-read transcriptome validated 25% of all known transcripts, while the remaining full-length transcripts were novel isoforms, but few were transcripts from novel genes. A comparison to the current genome assembly indicates that the long-read transcriptome may aid in improving transcript annotation as well as provide long-read linkage information useful for improving the genome assembly. 
More than 80% of transcripts were assigned GO terms and thousands of transcripts were from genes or splice-variants expressed in an organ-specific manner demonstrating that hybrid error-corrected long-read transcriptomes may be applied to study genes and splice-variants expressed in certain organs or conditions (e.g., challenge materials). In conclusion, this is the single largest contribution of full-length mRNAs in Atlantic salmon. The results will be of great value to salmon genomics research, and the pipeline outlined may be applied to generate additional de novo transcriptomes in Atlantic Salmon or applied for similar projects in other species.
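The statement that each gene was represented by about three isoforms on average can be checked against the two totals reported in this abstract. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Totals reported in the abstract above.
full_length_mrnas = 71_461
loci = 23_071

# Mean number of full-length transcripts (isoforms) per locus.
mean_isoforms = full_length_mrnas / loci
print(f"{mean_isoforms:.2f}")
```

The ratio is slightly above three, consistent with the rounded figure given in the abstract.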
Affiliation(s)
- Sigmund Ramberg
- Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet - Oslo Metropolitan University, Oslo, Norway
- Bjørn Høyheim
- Department of Preclinical Sciences and Pathology, Faculty of Veterinary Medicine, Norwegian University of Life Sciences, Ås, Norway
- Rune Andreassen
- Department of Life Sciences and Health, Faculty of Health Sciences, OsloMet - Oslo Metropolitan University, Oslo, Norway
11
Transcriptome Analysis of Ginkgo biloba L. Leaves across Late Developmental Stages Based on RNA-Seq and Co-Expression Network. Forests 2021. [DOI: 10.3390/f12030315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
The final size of plant leaves is strictly controlled by environmental and genetic factors, which coordinate cell expansion and cell cycle activity in space and time; however, the regulatory mechanisms of leaf growth are still poorly understood. Ginkgo biloba is a dioecious species native to China with medicinally and phylogenetically important characteristics, and its fan-shaped leaves are unique among gymnosperms, but the mechanism of G. biloba leaf development remains unclear. In this study, we examined the transcriptome of G. biloba leaves at three developmental stages using high-throughput RNA-seq technology. Approximately 4167 differentially expressed genes (DEGs) were obtained, the structures of 12,137 genes were optimized, and 732 new genes were identified. More than 50 growth-related factors and gene modules were identified based on DEG analysis and Weighted Gene Co-expression Network Analysis. These results could remarkably expand the existing transcriptome resources of G. biloba and provide references for subsequent analysis of ginkgo leaf development.
12
Huarte HR, Puglia GD, Prjibelski AD, Raccuia SA. Seed Transcriptome Annotation Reveals Enhanced Expression of Genes Related to ROS Homeostasis and Ethylene Metabolism at Alternating Temperatures in Wild Cardoon. Plants 2020; 9:1225. [PMID: 32961840 PMCID: PMC7570316 DOI: 10.3390/plants9091225] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/07/2020] [Revised: 09/09/2020] [Accepted: 09/13/2020] [Indexed: 12/20/2022]
Abstract
The association among environmental cues, ethylene response, ABA signaling, and reactive oxygen species (ROS) homeostasis in the process of seed dormancy release is nowadays well-established in many species. Alternating temperatures are recognized as one of the main environmental signals determining dormancy release, but their underlying mechanisms are scarcely known. Dry after-ripened wild cardoon achenes germinated poorly at a constant temperature of 20, 15, or 10 °C, whereas germination was stimulated by 80% at alternating temperatures of 20/10 °C. Using an RNA-Seq approach, we identified 23,640 and annotated 14,078 gene transcripts expressed in dry achenes and achenes exposed to constant or alternating temperatures. Transcriptional patterns identified in dry condition included seed reserve and response to dehydration stress genes (i.e., HSPs, peroxidases, and LEAs). At a constant temperature, we observed an upregulation of ABA biosynthesis genes (i.e., NCED9), ABA-responsive genes (i.e., ABI5 and TAP), as well as other genes previously related to physiological dormancy and inhibition of germination. However, the alternating temperatures were associated with the upregulation of ethylene metabolism (i.e., ACO1, 4, and ACS10) and signaling (i.e., EXPs) genes and ROS homeostasis regulators genes (i.e., RBOH and CAT). Accordingly, the ethylene production was twice as high at alternating than at constant temperatures. The presence in the germination medium of ethylene or ROS synthesis and signaling inhibitors reduced significantly, but not completely, germination at 20/10 °C. Conversely, the presence of methyl viologen and salicylhydroxamic acid (SHAM), a peroxidase inhibitor, partially increased germination at constant temperature. Taken together, the present study provides the first insights into the gene expression patterns and physiological response associated with dormancy release at alternating temperatures in wild cardoon (Cynara cardunculus var. sylvestris).
Affiliation(s)
- Hector R. Huarte
- CONICET/Faculty of Agricultural Sciences, National University of Lomas de Zamora, 1836 Llavallol, Argentina
- Giuseppe D. Puglia
- Institute for Agricultural and Forestry Systems in the Mediterranean (ISAFoM), Department of Biology, Agriculture and Food Science (DiSBA), National Research Council (CNR), Via Empedocle, 58, 95128 Catania, Italy
- Correspondence: ; Tel.: +39-0956139914
- Andrey D. Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, 199004 St. Petersburg, Russia
- Salvatore A. Raccuia
- Institute for Agricultural and Forestry Systems in the Mediterranean (ISAFoM), Department of Biology, Agriculture and Food Science (DiSBA), National Research Council (CNR), Via Empedocle, 58, 95128 Catania, Italy