1
Sun J, Okada M, Tameshige T, Shimizu-Inatsugi R, Akiyama R, Nagano A, Sese J, Shimizu K. A low-coverage 3' RNA-seq to detect homeolog expression in polyploid wheat. NAR Genom Bioinform 2023; 5:lqad067. [PMID: 37448590 PMCID: PMC10336777 DOI: 10.1093/nargab/lqad067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 06/12/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] Open
Abstract
Although allopolyploid species are common among natural and crop species, it is not easy to distinguish duplicated genes, known as homeologs, during their genomic analysis. Yet, cost-efficient RNA sequencing (RNA-seq) is to be developed for large-scale transcriptomic studies such as time-series analysis and genome-wide association studies in allopolyploids. In this study, we employed a 3' RNA-seq utilizing 3' untranslated regions (UTRs) containing frequent mutations among homeologous genes, compared to coding sequence. Among the 3' RNA-seq protocols, we examined a low-cost method Lasy-Seq using an allohexaploid bread wheat, Triticum aestivum. HISAT2 showed the best performance for 3' RNA-seq with the least mapping errors and quick computational time. The number of detected homeologs was further improved by extending 1 kb of the 3' UTR annotation. Differentially expressed genes in response to mild cold treatment detected by the 3' RNA-seq were verified with high-coverage conventional RNA-seq, although the latter detected more differentially expressed genes. Finally, downsampling showed that even a 2 million sequencing depth can still detect more than half of expressed homeologs identifiable by the conventional 32 million reads. These data demonstrate that this low-cost 3' RNA-seq facilitates large-scale transcriptomic studies of allohexaploid wheat and indicate the potential application to other allopolyploid species.
Affiliation(s)
- Jianqiang Sun
- Research Center for Agricultural Information Technology, National Agriculture and Food Research Organization, 3-1-1 Kannondai, Tsukuba, Ibaraki 305-8517, Japan
- Moeko Okada
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka, Totsuka-ward, Yokohama, Kanagawa 244-0813, Japan
- Toshiaki Tameshige
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka, Totsuka-ward, Yokohama, Kanagawa 244-0813, Japan
- Division of Biological Sciences, Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5, Takayama-cho, Ikoma, Nara 630-0192, Japan
- Rie Shimizu-Inatsugi
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Reiko Akiyama
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Atsushi J Nagano
- Faculty of Agriculture, Ryukoku University, Yokotani 1-5, Seta Ohe-cho, Otsu, Shiga 520-2194, Japan
- Institute for Advanced Biosciences, Keio University, 403-1 Nipponkoku, Daihouji, Tsuruoka, Yamagata 997-0017, Japan
- Jun Sese
- Humanome Lab, Inc., 2-4-10, Tsukiji, Chuo-ku, Tokyo 104-0045, Japan
2
Wang M, Ji Y, Feng S, Liu C, Xiao Z, Wang X, Wang Y, Xia G. The non-random patterns of genetic variation induced by asymmetric somatic hybridization in wheat. BMC Plant Biol 2018; 18:244. [PMID: 30332989 PMCID: PMC6192298 DOI: 10.1186/s12870-018-1474-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 04/30/2018] [Accepted: 10/05/2018] [Indexed: 06/08/2023]
Abstract
BACKGROUND Asymmetric somatic hybridization is an efficient crop breeding approach by introducing several exogenous chromatin fragments, which leads to genomic shock and therefore induces genome-wide genetic variation. However, the fundamental question concerning the genetic variation such as whether it occurs randomly and suffers from selection pressure remains unknown. RESULTS Here, we explored this issue by comparing expressed sequence tags of a common wheat cultivar and its asymmetric somatic hybrid line. Both nucleotide substitutions and indels (insertions and deletions) had lower frequencies in coding sequences than in un-translated regions. The frequencies of nucleotide substitutions and indels were both comparable between chromosomes with and without introgressed fragments. Nucleotide substitutions distributed unevenly and were preferential to indel-flanking sequences, and the frequency of nucleotide substitutions at 5'-flanking sequences of indels was obviously higher in chromosomes with introgressed fragments than in those without exogenous fragment. Nucleotide substitutions and indels both had various frequencies among seven groups of allelic chromosomes, and the frequencies of nucleotide substitutions were strongly negatively correlative to those of indels. Among three sets of genomes, the frequencies of nucleotide substitutions and indels were both heterogeneous, and the frequencies of nucleotide substitutions exhibited drastically positive correlation to those of indels. CONCLUSIONS Our work demonstrates that the genetic variation induced by asymmetric somatic hybridization is attributed to both whole genomic shock and local chromosomal shock, which is a predetermined and non-random genetic event being closely associated with selection pressure. Asymmetric somatic hybrids provide a worthwhile model to further investigate the nature of genomic shock induced genetic variation.
Affiliation(s)
- Mengcheng Wang
- The Key Laboratory of Plant Cell Engineering and Germplasm Innovation, Ministry of Education, School of Life Science, Shandong University, 27 Shandanan Road, Jinan, Shandong 250100 People’s Republic of China
- Yujie Ji
- College of Veterinary Medicine, Nanjing Agricultural University, Nanjing, 210095 China
- Shiting Feng
- The Key Laboratory of Plant Cell Engineering and Germplasm Innovation, Ministry of Education, School of Life Science, Shandong University, 27 Shandanan Road, Jinan, Shandong 250100 People’s Republic of China
- Chun Liu
- The Key Laboratory of Plant Cell Engineering and Germplasm Innovation, Ministry of Education, School of Life Science, Shandong University, 27 Shandanan Road, Jinan, Shandong 250100 People’s Republic of China
- Zhen Xiao
- The Key Laboratory of Plant Cell Engineering and Germplasm Innovation, Ministry of Education, School of Life Science, Shandong University, 27 Shandanan Road, Jinan, Shandong 250100 People’s Republic of China
- Xiaoping Wang
- The Key Laboratory of Plant Cell Engineering and Germplasm Innovation, Ministry of Education, School of Life Science, Shandong University, 27 Shandanan Road, Jinan, Shandong 250100 People’s Republic of China
- Yanxia Wang
- Shijiazhuang Academy of Agriculture and Forestry Sciences, Shijiazhuang, 050041 China
- Guangmin Xia
- The Key Laboratory of Plant Cell Engineering and Germplasm Innovation, Ministry of Education, School of Life Science, Shandong University, 27 Shandanan Road, Jinan, Shandong 250100 People’s Republic of China
3
Wajnberg G, Passetti F. Using high-throughput sequencing transcriptome data for INDEL detection: challenges for cancer drug discovery. Expert Opin Drug Discov 2016; 11:257-68. [PMID: 26787005 DOI: 10.1517/17460441.2016.1143813] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
Abstract
INTRODUCTION A cancer cell is a mosaic of genomic and epigenomic alterations. Distinct cancer molecular signatures can be observed depending on tumor type or patient genetic background. One type of genomic alteration is the insertion and/or deletion (INDEL) of nucleotides in the DNA sequence, which may vary in length, and may change the encoded protein or modify protein domains. INDELs are associated to a large number of diseases and their detection is done based on low-throughput techniques. However, high-throughput sequencing has also started to be used for detection of novel disease-causing INDELs. This search may identify novel drug targets. AREAS COVERED This review presents examples of using high-throughput sequencing (DNA-Seq and RNA-Seq) to investigate the incidence of INDELs in coding regions of human genes. Some of these examples successfully utilized RNA-Seq to identify INDELs associated to diseases. In addition, other studies have described small INDELs related to chemo-resistance or poor outcome of patients, while structural variants were associated with a better clinical outcome. EXPERT OPINION On average, there is twice as much RNA-Seq data available at the most used repositories for such data compared to DNA-Seq. Therefore, using RNA-Seq data is a promising strategy for studying cancer samples with unknown mechanisms of drug resistance, aiming at the discovery of proteins with potential as novel drug targets.
Affiliation(s)
- Gabriel Wajnberg
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
- Fabio Passetti
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
4
Abstract
Alternative mRNA splicing (AS) is a major mechanism for increasing regulatory complexity. A key concept in AS is the distinction between alternatively and constitutively spliced exons (ASEs and CSEs, respectively). ASEs and CSEs have been reported to be differentially regulated, and to have distinct biological properties. However, the recent flood of RNA-sequencing data has obscured the boundary between ASEs and CSEs. Researchers are beginning to question whether ‘authentic CSEs’ do exist, and whether the ASE/CSE distinction is biologically invalid. Here, I examine the influences of increasing transcriptome data on the human ASE/CSE classification and our past understanding of the properties of these two types of exons. Interestingly, although the percentage of human ASEs has increased dramatically in recent years, the overall distinction between ASEs and CSEs remain valid. For example, CSEs are longer, evolve more slowly, and less frequently correspond to intrinsically disordered protein regions than ASEs. In addition, only a relatively small number of human genes have their transcripts composed entirely of ASEs despite the large amount of high-throughput transcriptome information. Therefore, the ‘backbone’ concept of AS, in which CSEs constitute the invariant part and ASEs the flexible part of the transcript, appears to be generally true despite the increasing percentage of ASEs in the human exome.
5
Chen FC. Alternative RNA structure-coupled gene regulations in tumorigenesis. Int J Mol Sci 2014; 16:452-75. [PMID: 25551597 PMCID: PMC4307256 DOI: 10.3390/ijms16010452] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/31/2014] [Accepted: 12/16/2014] [Indexed: 12/11/2022] Open
Abstract
Alternative RNA structures (ARSs), or alternative transcript isoforms, are critical for regulating cellular phenotypes in humans. In addition to generating functionally diverse protein isoforms from a single gene, ARS can alter the sequence contents of 5'/3' untranslated regions (UTRs) and intronic regions, thus also affecting the regulatory effects of these regions. ARS may introduce premature stop codon(s) into a transcript, and render the transcript susceptible to nonsense-mediated decay, which in turn can influence the overall gene expression level. Meanwhile, ARS can regulate the presence/absence of upstream open reading frames and microRNA targeting sites in 5'UTRs and 3'UTRs, respectively, thus affecting translational efficiencies and protein expression levels. Furthermore, since ARS may alter exon-intron structures, it can influence the biogenesis of intronic microRNAs and indirectly affect the expression of the target genes of these microRNAs. The connections between ARS and multiple regulatory mechanisms underline the importance of ARS in determining cell fate. Accumulating evidence indicates that ARS-coupled regulations play important roles in tumorigenesis. Here I will review our current knowledge in this field, and discuss potential future directions.
Affiliation(s)
- Feng-Chi Chen
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli County 350, Taiwan.
6
Emerging evidence for functional peptides encoded by short open reading frames. Nat Rev Genet 2014; 15:193-204. [PMID: 24514441 DOI: 10.1038/nrg3520] [Citation(s) in RCA: 402] [Impact Index Per Article: 36.5] [Indexed: 12/19/2022]
Abstract
Short open reading frames (sORFs) are a common feature of all genomes, but their coding potential has mostly been disregarded, partly because of the difficulty in determining whether these sequences are translated. Recent innovations in computing, proteomics and high-throughput analyses of translation start sites have begun to address this challenge and have identified hundreds of putative coding sORFs. The translation of some of these has been confirmed, although the contribution of their peptide products to cellular functions remains largely unknown. This Review examines this hitherto overlooked component of the proteome and considers potential roles for sORF-encoded peptides.
7
Skarshewski A, Stanton-Cook M, Huber T, Al Mansoori S, Smith R, Beatson SA, Rothnagel JA. uPEPperoni: an online tool for upstream open reading frame location and analysis of transcript conservation. BMC Bioinformatics 2014; 15:36. [PMID: 24484385 PMCID: PMC3914846 DOI: 10.1186/1471-2105-15-36] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 08/07/2013] [Accepted: 01/11/2014] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Several small open reading frames located within the 5' untranslated regions of mRNAs have recently been shown to be translated. In humans, about 50% of mRNAs contain at least one upstream open reading frame representing a large resource of coding potential. We propose that some upstream open reading frames encode peptides that are functional and contribute to proteome complexity in humans and other organisms. We use the term uPEPs to describe peptides encoded by upstream open reading frames. RESULTS We have developed an online tool, termed uPEPperoni, to facilitate the identification of putative bioactive peptides. uPEPperoni detects conserved upstream open reading frames in eukaryotic transcripts by comparing query nucleotide sequences against mRNA sequences within the NCBI RefSeq database. The algorithm first locates the main coding sequence and then searches for open reading frames 5' to the main start codon which are subsequently analysed for conservation. uPEPperoni also determines the substitution frequency for both the upstream open reading frames and the main coding sequence. In addition, the uPEPperoni tool produces sequence identity heatmaps which allow rapid visual inspection of conserved regions in paired mRNAs. CONCLUSIONS uPEPperoni features user-nominated settings including, nucleotide match/mismatch, gap penalties, Ka/Ks ratios and output mode. The heatmap output shows levels of identity between any two sequences and provides easy recognition of conserved regions. Furthermore, this web tool allows comparison of evolutionary pressures acting on the upstream open reading frame against other regions of the mRNA. Additionally, the heatmap web applet can also be used to visualise the degree of conservation in any pair of sequences. uPEPperoni is freely available on an interactive web server at http://upep-scmb.biosci.uq.edu.au.
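The location step described in this abstract (find the main coding sequence, then search for open reading frames 5' to the main start codon) can be illustrated with a minimal sketch. This is not uPEPperoni's implementation: the function name `find_uorfs`, the 0-based coordinates, and the assumption that the main CDS start is already known are all hypothetical choices made for illustration, and the conservation and Ka/Ks analysis is not covered.

```python
# Minimal sketch of upstream ORF (uORF) location, assuming the position of
# the main start codon (cds_start, 0-based) is already known.
# Hypothetical helper for illustration only; not the uPEPperoni code.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna, cds_start):
    """Return a list of (start, end, overlaps_cds) tuples.

    start: 0-based index of an ATG located 5' of cds_start.
    end: index just past the first in-frame stop codon (end-exclusive).
    overlaps_cds: True if the uORF extends beyond the main start codon.
    ATGs with no in-frame stop codon in the transcript are skipped.
    """
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA alphabet
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this upstream ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                uorfs.append((start, end, end > cds_start))
                break
    return uorfs


if __name__ == "__main__":
    # Toy transcript: uORF ATG AAA TAG at 2..11, main CDS ATG GCT TAA from 13.
    print(find_uorfs("GGATGAAATAGCCATGGCTTAA", 13))  # [(2, 11, False)]
```

The `overlaps_cds` flag separates uORFs that terminate within the 5' untranslated region from those that run past the main start codon; a fuller tool would also record the reading frame relative to the main coding sequence.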
Affiliation(s)
- Joseph A Rothnagel
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia.
8
James D, Varga A, Jesperson GD, Navratil M, Safarova D, Constable F, Horner M, Eastwell K, Jelkmann W. Identification and complete genome analysis of a virus variant or putative new foveavirus associated with apple green crinkle disease. Arch Virol 2013; 158:1877-87. [PMID: 23553453 DOI: 10.1007/s00705-013-1678-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 01/11/2013] [Accepted: 02/13/2013] [Indexed: 11/30/2022]
Abstract
A virus identified as "apple green crinkle associated virus" (AGCaV) was isolated from Aurora Golden Gala apple showing severe symptoms of green crinkle disease. Evidence was obtained of a potential causal relationship to the disease. The viral genome consists of 9266 nucleotides, excluding the poly(A) tail at the 3'-terminus. It has a genome organization similar to that of members of the species Apple stem pitting virus (ASPV), the type species of the genus Foveavirus, family Betaflexiviridae. ORF1 of AGCaV encodes a replicase-complex polyprotein with a molecular mass of 247 kDa; the proteins of ORFs 2, 3, and 4 (TGB proteins) are estimated to be 25.1 kDa, 12.8 kDa, and 7.4 kDa, respectively; and ORF5 encodes the CP, with an estimated molecular mass of 43.3 kDa. Interestingly, AGCaV utilizes different stop codons for ORF1, ORF3, and ORF5 compared to the ASPV type isolate PA66, and between the two viruses, six distinct indel events were observed within ORF5. AGCaV has four non-coding regions (NCRs), including a 5'-NCR (60 nt), a 3'-NCR (134 nt), and two intergenic (IG) NCRs: IG-NCR1 (69 nt) and IG-NCR2 (91 nt). A conserved stable hairpin structure was identified in the variable 5'-NCR of members of the genus Foveavirus. AGCaV may be a variant or strain of ASPV with unique biological properties, but there is evidence that it may be a distinct putative foveavirus.
Affiliation(s)
- D James
- Centre for Plant Health-Sidney Laboratory, Canadian Food Inspection Agency, 8801 East Saanich Road, Sidney, BC, V8L 1H3, Canada.
9
Hsu MK, Chen FC. Selective constraint on the upstream open reading frames that overlap with coding sequences in animals. PLoS One 2012; 7:e48413. [PMID: 23133632 PMCID: PMC3486843 DOI: 10.1371/journal.pone.0048413] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 05/29/2012] [Accepted: 09/24/2012] [Indexed: 11/18/2022] Open
Abstract
Upstream open reading frames (uORFs) are translational regulatory elements located in 5′ untranslated regions. They can significantly repress the translation of the downstream coding sequences (CDS), and participate in the spatio-temporal regulations of protein translation. Notwithstanding this biological significance, the selective constraint on uORFs remains underexplored. Particularly, the uORFs that partially overlap with CDS with a different reading frame (overlapping uORFs, or “VuORFs”) may lead to strong translational inhibition or N-terminal truncation of the peptides encoded by the affected CDS. By analyzing VuORF-containing transcripts (designated as “VuORF transcripts”) in human, mouse, and fruit fly, we demonstrate that VuORFs are in general slightly deleterious - the proportion of genes that encode at least one VuORF transcript is significantly smaller than expected in all of the three examined species. In addition, this proportion is significantly smaller in fruit fly than in mammals, indicating a higher efficiency of removing VuORFs in the former species because of its larger effective population size. Furthermore, the deleterious effect of a VuORF depends on the sequence context of its start codon (VuAUG). VuORFs with an optimal VuAUG context are more strongly disfavored than those with a suboptimal context in all of the three examined species. And the propensity to remove optimal-context VuAUGs is stronger in fruit fly than in mammals. Intriguingly, however, the currently observable optimal-context VuAUGs (but not suboptimal-context VuAUGs) are more conserved than expected. These observations suggest that the regulatory functions of VuORFs may have been gained fortuitously in organisms with a small effective population size because the slightly deleterious effect of these elements can be better tolerated in these organisms, thus allowing opportunities for the development of novel biological functions. 
Nevertheless, once the functions of VuORFs were established, they became subject to negative selection.
Affiliation(s)
- Ming-Kung Hsu
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan
- Department of Life Sciences, National Chiao-Tung University, Hsinchu, Taiwan
- Department of Dentistry, China Medical University, Taichung, Taiwan
10
Chen CH, Lin HY, Pan CL, Chen FC. The genomic features that affect the lengths of 5' untranslated regions in multicellular eukaryotes. BMC Bioinformatics 2011; 12 Suppl 9:S3. [PMID: 22152105 PMCID: PMC3283318 DOI: 10.1186/1471-2105-12-s9-s3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022] Open
Abstract
Background The lengths of 5’UTRs of multicellular eukaryotes have been suggested to be subject to stochastic changes, with upstream start codons (uAUGs) as the major constraint to suppress 5’UTR elongation. However, this stochastic model cannot fully explain the variations in 5’UTR length. We hypothesize that the selection pressure on a combination of genomic features is also important for 5’UTR evolution. The ignorance of these features may have limited the explanatory power of the stochastic model. Furthermore, different selective constraints between vertebrates and invertebrates may lead to differences in the determinants of 5’UTR length, which have not been systematically analyzed. Methods Here we use a multiple linear regression model to delineate the correlation between 5’UTR length and the combination of a series of genomic features (G+C content, observed-to-expected (OE) ratios of uAUGs, upstream stop codons (uSTOPs), methylation-related CG/UG dinucleotides, and mRNA-destabilizing UU/UA dinucleotides) in six vertebrates (human, mouse, rat, chicken, African clawed frog, and zebrafish) and four invertebrates (fruit fly, mosquito, sea squirt, and nematode). The relative contributions of each feature to the variation of 5’UTR length were also evaluated. Results We found that 14%~33% of the 5’UTR length variations can be explained by a linear combination of the analyzed genomic features. The most important genomic features are the OE ratios of uSTOPs and G+C content. The surprisingly large weightings of uSTOPs highlight the importance of selection on upstream open reading frames (which include both uAUGs and uSTOPs), rather than on uAUGs per se. Furthermore, G+C content is the most important determinants for most invertebrates, but for vertebrates its effect is second to uSTOPs. We also found that shorter 5’UTRs are affected more by the stochastic process, whereas longer 5’UTRs are affected more by selection pressure on genomic features. 
Conclusions Our results suggest that upstream open reading frames may be the real target of selection, rather than uAUGs. We also show that the selective constraints on genomic features of 5’UTRs differ between vertebrates and invertebrates, and between longer and shorter 5’UTRs. A more comprehensive model that takes these findings into consideration is needed to better explain 5’UTR length evolution.
Affiliation(s)
- Chun-Hsi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, 350 Taiwan, Republic of China