51
|
Baruah C, Nath P, Barah P. LncRNAs in neuropsychiatric disorders and computational insights for their prediction. Mol Biol Rep 2022; 49:11515-11534. [PMID: 36097122 DOI: 10.1007/s11033-022-07819-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/03/2022] [Revised: 07/20/2022] [Accepted: 07/24/2022] [Indexed: 12/06/2022]
Abstract
Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that do not encode proteins or have only limited coding ability. LncRNAs epigenetically control several biological functions such as gene regulation, transcription, mRNA splicing, protein interaction, and genomic imprinting. Over the years, substantial progress has been made in understanding the role of lncRNAs in diverse biological processes. LncRNAs are reported to show tissue-specific expression patterns, suggesting their potential as novel candidate biomarkers for diseases. Among non-coding RNAs, lncRNAs are highly expressed in brain-enriched or brain-specific regions of neural tissue. They are abundantly expressed in the neocortex and pre-mature frontal regions of the brain. LncRNAs are co-expressed with protein-coding genes and have a significant role in the evolution of brain functions. Deregulation of lncRNAs contributes to disruptions in normal brain function, resulting in multiple neurological disorders. Neuropsychiatric disorders such as schizophrenia, bipolar disorder, autism spectrum disorders, and anxiety are associated with the abnormal expression and regulation of lncRNAs. This review aims to highlight the understanding of lncRNAs in normal brain function and their deregulation in neuropsychiatric disorders. We also survey the available computational tools for predicting lncRNAs, their protein-coding potential, and their sub-cellular locations, along with a section on existing online databases of known lncRNAs and their interactions with other molecules.
Affiliation(s)
- Cinmoyee Baruah
  - Department of Molecular Biology and Biotechnology, Tezpur University, 784028, Napaam, Sonitpur, Assam, India
- Prangan Nath
  - Department of Molecular Biology and Biotechnology, Tezpur University, 784028, Napaam, Sonitpur, Assam, India
- Pankaj Barah
  - Department of Molecular Biology and Biotechnology, Tezpur University, 784028, Napaam, Sonitpur, Assam, India
|
52
|
He F, Guo Q, Jiang GX, Zhou Y. Comprehensive analysis of m6A circRNAs identified in colorectal cancer by MeRIP sequencing. Front Oncol 2022; 12:927810. [PMID: 36059637 PMCID: PMC9437624 DOI: 10.3389/fonc.2022.927810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 08/01/2022] [Indexed: 12/24/2022] Open
Abstract
Purpose: To characterize the entire profile of m6A modifications and differential expression patterns for circRNAs in colorectal cancer (CRC). Methods: First, high-throughput MeRIP-sequencing and RNA-sequencing were used to determine the difference in m6A methylome and expression of circRNA between CRC tissues and tumor-adjacent normal control (NC) tissues. Then, GO and KEGG analysis detected pathways involved in differentially methylated and differentially expressed circRNAs (DEGs). The correlations between m6A status and expression level were calculated using a Pearson correlation analysis. Next, the networks of circRNA-miRNA-mRNA were visualized using the TargetScan and miRanda software. Finally, we describe the relationship between the distance from the m6A peak to the internal ribosome entry site (IRES) and the protein coding potential of circRNAs. Results: A total of 4340 m6A peaks of circRNAs in CRC tissue and 3216 m6A peaks of circRNAs in NC tissues were detected. A total of 2561 m6A circRNAs in CRC tissues and 2129 m6A circRNAs in NC tissues were detected. Pathway analysis detected that differentially methylated and expressed circRNAs were closely related to cancer. The conjoint analysis of MeRIP-seq and RNA-seq data discovered 30 circRNAs with both differential m6A methylation and differential expression. RT-qPCR showed that circRNAs (hsa_circ_0032821, hsa_circ_0019079, hsa_circ_0093688) were upregulated and circRNAs (hsa_circ_0026782, hsa_circ_0108457) were downregulated in CRC. In the ceRNA network, the 10 hyper-up circRNAs were shown to be associated with 19 miRNAs and regulate 16 mRNAs, and 14 hypo-down circRNAs were associated with 30 miRNAs and regulated 27 mRNAs. There was no significant correlation between the level of m6A and the expression of circRNAs. The distance between the m6A peak and IRES was not significantly related to the protein coding potential of circRNAs. Conclusion: Our study found that there were significant differences in the m6A methylation patterns of circRNAs between CRC and NC tissues. M6A methylation may affect circRNA-miRNA-mRNA co-expression in CRC and further affect the regulation of cancer-related target genes.
Affiliation(s)
- Feng He
  - The First Affiliated Hospital of Chengdu Medical College, School of Clinical Medicine, Chengdu Medical College, Chengdu, China
- Qin Guo
  - The First Affiliated Hospital of Chengdu Medical College, School of Clinical Medicine, Chengdu Medical College, Chengdu, China
- Guo-xiu Jiang
  - The First Affiliated Hospital of Chengdu Medical College, School of Clinical Medicine, Chengdu Medical College, Chengdu, China
- Yan Zhou
  - National Health Commission (NHC), Key Laboratory of Nuclear Technology Medical Transformation, Mianyang Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Mianyang, China
  - *Correspondence: Yan Zhou
|
53
|
Xiao C, Sun T, Yang Z, Zou L, Deng J, Yang X. Whole transcriptome RNA Sequencing Reveals the Global Molecular Responses and circRNA/lncRNA-miRNA-mRNA ceRNA Regulatory Network in Chicken Fat Deposition. Poult Sci 2022; 101:102121. [PMID: 36116349 PMCID: PMC9485216 DOI: 10.1016/j.psj.2022.102121] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/01/2021] [Revised: 03/21/2022] [Accepted: 08/03/2022] [Indexed: 11/29/2022] Open
Abstract
Fat deposition is a vital factor affecting the economics of poultry production. Numerous studies on fat deposition have been done. However, the molecular regulatory mechanism is still unclear. In the present study, whole-transcriptome RNA sequencing of abdominal fat, back skin, and liver from both high- and low-abdominal-fat groups was used to uncover the competitive endogenous RNA (ceRNA) regulation network related to chicken fat deposition. The results showed that the differentially expressed (DE) genes in abdominal fat, back skin, and liver numbered 1207 (784 mRNAs, 330 lncRNAs, 41 circRNAs, 52 miRNAs), 860 (607 mRNAs, 166 lncRNAs, 26 circRNAs, 61 miRNAs), and 923 (501 mRNAs, 262 lncRNAs, 15 circRNAs, 145 miRNAs), respectively. The ceRNA regulatory network analysis indicated that the fatty acid metabolic process, monocarboxylic acid metabolic process, carboxylic acid metabolic process, glycerolipid metabolism, fatty acid metabolism, and peroxisome proliferator-activated receptor (PPAR) signaling pathway took part in chicken fat deposition. Meanwhile, we screened the important genes FADS2, HSD17B12, ELOVL5, AKR1E2, DGKQ, GPAM, and PLIN2, which were regulated by gga-miR-460b-5p, gga-miR-199-5p, gga-miR-7470-3p, gga-miR-6595-5p, and gga-miR-101-2-5p. These miRNAs were in turn competitively bound by lncRNAs including MSTRG.18043, MSTRG.7738, MSTRG.21310, and MSTRG.19577, and by circRNAs including novel_circ_PTPN2, novel_circ_CTNNA1, and novel_circ_PTPRD. This finding provides new insights into the regulatory mechanism of mRNA, miRNA, lncRNA, and circRNA in chicken fat deposition.
Affiliation(s)
- Cong Xiao
  - College of Animal Science and Technology, Guangxi University, Nanning 530004, China
- Tiantian Sun
  - College of Animal Science and Technology, Guangxi University, Nanning 530004, China
- Zhuliang Yang
  - College of Animal Science and Technology, Guangxi University, Nanning 530004, China
- Leqin Zou
  - College of Animal Science and Technology, Guangxi University, Nanning 530004, China
- Jixian Deng
  - College of Animal Science and Technology, Guangxi University, Nanning 530004, China
- Xiurong Yang
  - College of Animal Science and Technology, Guangxi University, Nanning 530004, China
|
54
|
A survey of transcriptome complexity using full-length isoform sequencing in the tea plant Camellia sinensis. Mol Genet Genomics 2022; 297:1243-1255. [PMID: 35763065 DOI: 10.1007/s00438-022-01913-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Accepted: 05/29/2022] [Indexed: 10/17/2022]
Abstract
Tea is one of the most popular beverages, and its leaves are rich in catechins, which contribute to its diverse flavor and are beneficial for human health. However, the post-transcriptional regulatory mechanisms affecting the synthesis of catechins remain insufficiently studied. Here, we sequenced the transcriptome using PacBio sequencing technology and obtained 63,111 full-length high-quality isoforms, including 1302 potential novel genes and 583 highly reliable fusion transcripts. We also identified 1204 high-quality lncRNAs, comprising 188 known and 1016 novel lncRNAs. In addition, 311 mis-annotated genes were corrected based on the high-quality Isoseq reads. A large number of alternative splicing (AS) events (3784) and alternative polyadenylation (APA) genes (18,714) were analyzed, accounting for 8.84% and 43.7% of the total annotated genes, respectively. We also found that 2884 genes containing AS and APA features exhibited higher expression levels than other genes. These genes are mainly involved in amino acid biosynthesis, carbon fixation in photosynthetic organisms, phenylalanine, tyrosine, tryptophan biosynthesis, and pyruvate metabolism, suggesting that they play an essential role in the catechins content of tea polyphenols. Our results further improved the level of genome annotation and indicated that post-transcriptional regulation plays a crucial part in synthesizing catechins.
|
55
|
Integrated SMRT and Illumina Sequencing Provide New Insights into Crocin Biosynthesis of Gardenia jasminoides. Int J Mol Sci 2022; 23:6321. [PMID: 35683000 PMCID: PMC9181021 DOI: 10.3390/ijms23116321] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/06/2022] [Revised: 06/04/2022] [Accepted: 06/04/2022] [Indexed: 02/04/2023] Open
Abstract
Crocins are valuable bioactive components of gardenia fruit, and their biosynthesis and accumulation have attracted widespread interest. Studies have investigated the biosynthesis and accumulation of crocin based on Illumina sequencing, but there is a lack of reports based on full-length transcriptome sequencing. Utilising SMRT sequencing and high-performance liquid chromatography (HPLC), we explored crocin biosynthesis and accumulation in the fruit of Gardenia jasminoides. HPLC analysis showed that crocins specifically exist in fruit and that the content of crocins increases gradually during fruit development. SMRT sequencing generated 46,715 high-quality full-length isoforms, including 5230 novel isoforms that are not present in the G. jasminoides genome. Furthermore, a total of 46 genes and 91 lncRNAs were involved in the biosynthesis and accumulation of crocin. The qRT-PCR indicated that genes involved in crocin biosynthesis reached a peak in the NOV stage. These findings contributed to our understanding of crocin biosynthesis and accumulation.
|
56
|
Žarković M, Hufsky F, Markert UR, Marz M. The Role of Non-Coding RNAs in the Human Placenta. Cells 2022; 11:1588. [PMID: 35563893 PMCID: PMC9104507 DOI: 10.3390/cells11091588] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/08/2022] [Revised: 05/01/2022] [Accepted: 05/03/2022] [Indexed: 12/11/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play a central and regulatory role in almost all cells, organs, and species, which has been broadly recognized since the human ENCODE project and several other genome projects. Nevertheless, only a small fraction of ncRNAs have been identified, and in the placenta they have been investigated very marginally. To date, most examples of ncRNAs which have been identified to be specific for fetal tissues, including placenta, are members of the group of microRNAs (miRNAs). Due to their quantity, it can be expected that the considerably larger group of other ncRNAs exerts far stronger effects than miRNAs. The syncytiotrophoblast of fetal origin forms the interface between fetus and mother, and permanently releases extracellular vesicles (EVs) into the maternal circulation which contain fetal proteins and RNA, including ncRNA, for communication with neighboring and distant maternal cells. Disorders of ncRNA in placental tissue, especially in trophoblast cells, and in EVs seem to be involved in pregnancy disorders, potentially as a cause or consequence. This review summarizes the current knowledge on placental ncRNA, their transport in EVs, and their involvement in pregnancy pathologies, as well as their potential for novel diagnostic tools.
Affiliation(s)
- Milena Žarković
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
  - Placenta Lab, Department of Obstetrics, University Hospital Jena, Am Klinikum 1, 07747 Jena, Germany
- Franziska Hufsky
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Udo R. Markert
  - Placenta Lab, Department of Obstetrics, University Hospital Jena, Am Klinikum 1, 07747 Jena, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
  - FLI Leibniz Institute for Age Research, Beutenbergstraße 11, 07745 Jena, Germany
  - Aging Research Center (ARC), 07745 Jena, Germany
|
57
|
Characterization and Distribution of Kisspeptins, Kisspeptin Receptors, GnIH, and GnRH1 in the Brain of the Protogynous Bluehead Wrasse (Thalassoma bifasciatum). J Chem Neuroanat 2022; 121:102087. [DOI: 10.1016/j.jchemneu.2022.102087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2021] [Revised: 02/14/2022] [Accepted: 03/08/2022] [Indexed: 11/18/2022]
|
58
|
CNCB-NGDC Members and Partners, Xue Y, Bao Y, Zhang Z, Zhao W, Xiao J, He S, Zhang G, Li Y, Zhao G, Chen R, Zeng J, Zhang Y, Shang Y, Mai J, Shi S, Lu M, Bu C, Zhang Z, Du Z, Xiao J, Wang Y, Kang H, Xu T, Hao L, Bao Y, Jia P, Jiang S, Qian Q, Zhu T, Shang Y, Zong W, Jin T, Zhang Y, Zou D, Bao Y, Xiao J, Zhang Z, Jiang S, Du Q, Feng C, Ma L, Zhang S, Wang A, Dong L, Wang Y, Zou D, Zhang Z, Liu W, Yan X, Ling Y, Zhao G, Zhou Z, Zhang G, Kang W, Jin T, Zhang T, Ma S, Yan H, Liu Z, Ji Z, Cai Y, Wang S, Song M, Ren J, Zhou Q, Qu J, Zhang W, Bao Y, Liu G, Chen X, Chen T, Zhang S, Sun Y, Yu C, Tang B, Zhu J, Dong L, Zhai S, Sun Y, Chen Q, Yang X, Zhang X, Sang Z, Wang Y, Zhao Y, Chen H, Lan L, Wang Y, Zhao W, Ma Y, Jia Y, Zheng X, Chen M, Zhang Y, Zou D, Zhu T, Xu T, Chen M, Niu G, Zong W, Pan R, Jing W, Sang J, Liu C, Xiong Y, Sun Y, Zhai S, Chen H, Zhao W, Xiao J, Bao Y, Hao L, Zhang M, Wang G, Zou D, Yi L, Zhao W, Zong W, Wu S, Xiong Z, Li R, Zong W, Kang H, Xiong Z, Ma Y, Jin T, Gong Z, Yi L, Zhang M, Wu S, Wang G, Li R, Liu L, Li Z, Liu C, Zou D, Li Q, Feng C, Jing W, Luo S, Ma L, Wang J, Shi Y, Zhou H, Zhang P, Song T, Li Y, He S, Xiong Z, Yang F, Li M, Zhao W, Wang G, Li Z, Ma Y, Zou D,
Zong W, Kang H, Jia Y, Zheng X, Li R, Tian D, Liu X, Li C, Teng X, Song S, Liu L, Zhang Y, Niu G, Li Q, Li Z, Zhu T, Feng C, Liu X, Zhang Y, Xu T, Chen R, Teng X, Zhang R, Zou D, Ma L, Xu F, Wang Y, Ling Y, Zhou C, Wang H, Teschendorff AE, He Y, Zhang G, Yang Z, Song S, Ma L, Zou D, Tian D, Li C, Zhu J, Li L, Li N, Gong Z, Chen M, Wang A, Ma Y, Teng X, Cui Y, Duan G, Zhang M, Jin T, Wu G, Huang T, Jin E, Zhao W, Kang H, Wang Z, Du Z, Zhang Y, Li R, Zeng J, Hao L, Jiang S, Chen H, Li M, Xiao J, Zhang Z, Zhao W, Xue Y, Bao Y, Ning W, Xue Y, Tang B, Liu Y, Sun Y, Duan G, Cui Y, Zhou Q, Dong L, Jin E, Liu X, Zhang L, Mao B, Zhang S, Zhang Y, Wang G, Zhao W, Wang Z, Zhu Q, Li X, Zhu J, Tian D, Kang H, Li C, Zhang S, Song S, Li M, Zhao W, Liu Y, Wang Z, Luo H, Zhu J, Wu X, Tian D, Li C, Zhao W, Jing H, Zhu J, Tang B, Zou D, Liu L, Pan Y, Liu C, Chen M, Liu X, Zhang Y, Li Z, Feng C, Du Q, Chen R, Zhu T, Ma L, Zou D, Jiang S, Zhang Z, Gong Z, Zhu J, Li C, Jiang S, Ma L, Tang B, Zou D, Chen M, Sun Y, Shi L, Song S, Zhang Z, Li M, Xiao J, Xue Y, Bao Y, Du Z, Zhao W, Li Z, Du Q, Jiang S, Ma L, Zhang Z, Xiong Z, Li M, Zou D, Zong W, Li R, Chen M, Du Z, Zhao W, Bao Y, Ma Y, Zhang X, Lan L, Xue Y, Bao Y, Jiang S, Feng C, Zhao W, Xiao J, Bao Y, Zhang Z, Zuo Z, Ren J, Zhang X, Xiao Y, Li X, Zhang X, Xiao Y, Li X, Liu D, Zhang C, Xue Y, Zhao Z, Jiang T, Wu W, Zhao F, Meng X, Chen M, Peng D, Xue Y, Luo H, Gao F, Ning W, Xue Y, Lin S, Xue Y, Liu C, Guo A, Yuan H, Su T, Zhang YE, Zhou Y, Chen M, Guo G, Fu S, Tan X, Xue Y, Zhang W, Xue Y, Luo M, Guo A, Xie Y, Ren J, Zhou Y, Chen M, Guo G, Wang C, Xue Y, Liao X, Gao X, Wang J, Xie G, Guo A, Yuan C, Chen M, Tian F, Yang D, Gao G, Tang D, Xue Y, Wu W, Chen M, Gou Y, Han C, Xue Y, Cui Q, Li X, Li CY, Luo X, Ren J, Zhang X, Xiao Y, Li X. Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res 2022; 50:D27-D38. 
[PMID: 34718731 PMCID: PMC8728233 DOI: 10.1093/nar/gkab951] [Citation(s) in RCA: 537] [Impact Index Per Article: 179.0] [Received: 09/15/2021] [Revised: 09/29/2021] [Accepted: 10/08/2021] [Indexed: 12/21/2022] Open
Abstract
The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a family of database resources to support global research in both academia and industry. With multi-omics data accumulating explosively at ever-faster rates, CNCB-NGDC is constantly scaling up and updating its core database resources through big data archive, curation, integration and analysis. In the past year, efforts have been made to synthesize the growing data and knowledge, particularly in single-cell omics and precision medicine research, and a series of resources have been newly developed, updated and enhanced. Moreover, CNCB-NGDC has continued to update SARS-CoV-2 genome sequences, variants, haplotypes and literature daily. In particular, OpenLB, an open library of bioscience, has been established to provide easy and open access to a substantial number of abstract texts from PubMed, bioRxiv and medRxiv. In addition, Database Commons has been significantly updated by cataloguing a full list of global databases, and BLAST tools have been newly deployed to provide online sequence search services. All these resources along with their services are publicly accessible at https://ngdc.cncb.ac.cn.
|
59
|
Jiang S, Du Q, Feng C, Ma L, Zhang Z. CompoDynamics: a comprehensive database for characterizing sequence composition dynamics. Nucleic Acids Res 2022; 50:D962-D969. [PMID: 34718745 PMCID: PMC8728180 DOI: 10.1093/nar/gkab979] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/15/2021] [Revised: 10/02/2021] [Accepted: 10/06/2021] [Indexed: 11/15/2022] Open
Abstract
Sequence compositions of nucleic acids and proteins have significant impact on gene expression, RNA stability, translation efficiency, RNA/protein structure and molecular function, and are associated with genome evolution and adaptation across all kingdoms of life. Therefore, a devoted resource of sequence compositions and associated features is fundamentally crucial for a wide range of biological research. Here, we present CompoDynamics (https://ngdc.cncb.ac.cn/compodynamics/), a comprehensive database of sequence compositions of coding sequences (CDSs) and genomes for all kinds of species. Taking advantage of the exponential growth of RefSeq data, CompoDynamics presents a wealth of sequence compositions (nucleotide content, codon usage, amino acid usage) and derived features (coding potential, physicochemical property and phase separation) for 118 689 747 high-quality CDSs and 34 562 genomes across 24 995 species. Additionally, interactive analytical tools are provided to enable comparative analyses of sequence compositions and molecular features across different species and gene groups. Collectively, CompoDynamics bears the great potential to better understand the underlying roles of sequence composition dynamics across genes and genomes, providing a fundamental resource in support of a broad spectrum of biological studies.
Affiliation(s)
- Shuai Jiang
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
- Qiang Du
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Changrui Feng
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lina Ma
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
  - National Genomics Data Center & CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
  - China National Center for Bioinformation, Beijing 100101, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
|
60
|
Zhang Z, Guo J, Cai X, Li Y, Xi X, Lin R, Liang J, Wang X, Wu J. Improved Reference Genome Annotation of Brassica rapa by Pacific Biosciences RNA Sequencing. Front Plant Sci 2022; 13:841618. [PMID: 35371168 PMCID: PMC8968949 DOI: 10.3389/fpls.2022.841618] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/22/2021] [Accepted: 02/17/2022] [Indexed: 05/05/2023]
Abstract
The species Brassica rapa includes several important vegetable crops. The draft reference genome of B. rapa ssp. pekinensis was completed in 2011, and it has since been updated twice. The pangenome with structural variations of 18 B. rapa accessions was published in 2021. Although extensive genomic analysis has been conducted on B. rapa, a comprehensive genome annotation including gene structure, alternative splicing (AS) events, and non-coding genes is still lacking. Therefore, we used the Pacific Biosciences (PacBio) single-molecule long-read technology to improve gene models and produced the annotated genome version 3.5. In total, we obtained 753,041 full-length non-chimeric (FLNC) reads and collapsed these into 92,810 non-redundant consensus isoforms, capturing 48% of the genes annotated in the B. rapa reference genome annotation v3.1. Based on the isoform data, we identified 830 novel protein-coding genes that were missed in previous genome annotations, defined the untranslated regions (UTRs) of 20,340 annotated genes and corrected 886 wrongly spliced genes. We also identified 28,564 AS events and 1,480 long non-coding RNAs (lncRNAs). We produced a relatively complete and high-quality reference transcriptome for B. rapa that can facilitate further functional genomic research.
|
61
|
Klapproth C, Sen R, Stadler PF, Findeiß S, Fallmann J. Common Features in lncRNA Annotation and Classification: A Survey. Noncoding RNA 2021; 7:77. [PMID: 34940758 PMCID: PMC8708962 DOI: 10.3390/ncrna7040077] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/12/2021] [Revised: 12/03/2021] [Accepted: 12/06/2021] [Indexed: 12/29/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
Affiliation(s)
- Christopher Klapproth
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Rituparno Sen
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz-Center for Infection Research (HZI), D-97080 Würzburg, Germany
- Peter F. Stadler
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, D-04103 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá CO-111321, Colombia
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Sven Findeiß
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Jörg Fallmann
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
|
62
|
Ji Z, Tang T, Chen M, Dong B, Sun W, Wu N, Chen H, Feng Q, Yang X, Jin R, Jiang L. C-Myc-activated long non-coding RNA LINC01050 promotes gastric cancer growth and metastasis by sponging miR-7161-3p to regulate SPZ1 expression. J Exp Clin Cancer Res 2021; 40:351. [PMID: 34749766 PMCID: PMC8573944 DOI: 10.1186/s13046-021-02155-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/12/2021] [Accepted: 10/25/2021] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Growing evidence shows that long non-coding RNAs (lncRNAs) play significant roles in cancer development. However, the functions of most lncRNAs in human gastric cancer are still not fully understood. Here, we explored the role of a novel c-Myc-activated lncRNA, LINC01050, in gastric cancer progression. METHODS The expression of LINC01050 in the context of gastric cancer was assessed using The Cancer Genome Atlas datasets. Its functions in gastric cancer were investigated through gain- and loss-of-function experiments combined with the Cell Counting Kit-8 assays, colony-forming assays, Transwell assays, flow cytometry, Western blot analyses, and xenograft tumor and mouse metastasis models. Potential LINC01050 transcription activators were screened via bioinformatics and validated by chromatin immunoprecipitation and luciferase assays. The interaction between LINC01050 and miR-7161-3p and the targets of miR-7161-3p were predicted by bioinformatics analysis and confirmed by a luciferase assay, RNA immunoprecipitation, RNA pull-down, and rescue experiments. RESULTS LINC01050 was significantly up-regulated in gastric cancer, and its high expression was positively correlated with a poor prognosis. The transcription factor c-Myc was found to directly bind to the LINC01050 promoter region and activate its transcription. Furthermore, overexpression of LINC01050 was confirmed to promote gastric cancer cell proliferation, migration, invasion, and epithelial-mesenchymal transition in vitro and tumor growth in vivo. At the same time, its knockdown inhibited gastric cancer cell proliferation, migration, invasion, and epithelial-mesenchymal transition in vitro along with tumor growth and metastasis in vivo. Moreover, mechanistic investigations revealed that LINC01050 functions as a molecular sponge to absorb cytosolic miR-7161-3p, which reduces the miR-7161-3p-mediated translational repression of SPZ1, thus contributing to gastric cancer progression. 
CONCLUSIONS Taken together, our results identified a novel gastric cancer-associated lncRNA, LINC01050, which is activated by c-Myc. LINC01050 may be considered a potential therapeutic target for gastric cancer.
Affiliation(s)
- Ziwei Ji
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Tianbin Tang
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Mengxia Chen
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Buyuan Dong
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Wenjing Sun
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Nan Wu
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Hao Chen
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Qian Feng
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Xingyi Yang
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China
- Rong Jin
- Department of Gastroenterology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
- Lei Jiang
- Central Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.
63
Asim MN, Ibrahim MA, Imran Malik M, Dengel A, Ahmed S. Advances in Computational Methodologies for Classification and Sub-Cellular Locality Prediction of Non-Coding RNAs. Int J Mol Sci 2021; 22:8719. [PMID: 34445436 PMCID: PMC8395733 DOI: 10.3390/ijms22168719] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/17/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 02/06/2023]
Abstract
Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have largely promoted the exploration of ncRNAs, which has revealed their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complicated diseases like cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments. Sub-cellular localization information of ncRNAs demystifies diverse functionalities of ncRNAs. To date, several computational methodologies have been proposed to precisely identify the class as well as sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs, reviews computational approaches proposed in the last 10 years to distinguish coding-RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization determination datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We consider that our expert analysis will assist Artificial Intelligence researchers with knowing state-of-the-art performance, model selection for various tasks on one platform, dominantly used sequence descriptors, neural architectures, and interpreting inter-species and intra-species performance deviation.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Ali Ibrahim
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Muhammad Imran Malik
- National Center for Artificial Intelligence (NCAI), National University of Sciences and Technology, Islamabad 44000, Pakistan;
- School of Electrical Engineering & Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan
- Andreas Dengel
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
- Sheraz Ahmed
- German Research Center for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany; (M.A.I.); (A.D.); (S.A.)
- DeepReader GmbH, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
64
Zheng H, Talukder A, Li X, Hu H. A systematic evaluation of the computational tools for lncRNA identification. Brief Bioinform 2021; 22:6343529. [PMID: 34368833 DOI: 10.1093/bib/bbab285] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/12/2021] [Revised: 06/21/2021] [Accepted: 07/03/2021] [Indexed: 12/28/2022]
Abstract
The computational identification of long non-coding RNAs (lncRNAs) is important to study lncRNAs and their functions. Despite the existence of many computational tools for lncRNA identification, to our knowledge, there is no systematic evaluation of these tools on common datasets and no consensus regarding their performance and the importance of the features used. To fill this gap, in this study, we assessed the performance of 17 tools on several common datasets. We also investigated the importance of the features used by the tools. We found that the deep learning-based tools have the best performance in terms of identifying lncRNAs, and that the peptide features do not contribute much to tool accuracy. Moreover, when the transcripts in a cell type were considered, the performance of all tools dropped significantly, and the deep learning-based tools were no longer as good as other tools. Our study will serve as an excellent starting point for selecting tools and features for lncRNA identification.
Affiliation(s)
- Hansi Zheng
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Amlan Talukder
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
- Xiaoman Li
- Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, USA
- Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, FL, USA
65
Cao P, Fan W, Li P, Hu Y. Genome-wide profiling of long noncoding RNAs involved in wheat spike development. BMC Genomics 2021; 22:493. [PMID: 34210256 PMCID: PMC8252277 DOI: 10.1186/s12864-021-07851-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/25/2021] [Accepted: 06/23/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) have been shown to play important roles in the regulation of plant growth and development. Recent transcriptomic analyses have revealed the gene expression profiles in wheat spike development; however, the possible regulatory roles of lncRNAs in wheat spike morphogenesis remain largely unclear. RESULTS Here, we analyzed the genome-wide profiles of lncRNAs during wheat spike development at six stages, and identified a total of 8,889 expressed lncRNAs, among which 2,753 were differentially expressed lncRNAs (DE lncRNAs) at various developmental stages. Three hundred fifteen differentially expressed cis- and trans-regulatory lncRNA-mRNA pairs comprising 205 lncRNAs and 279 genes were predicted, which were found to be mainly involved in stress responses and in transcriptional and enzymatic regulation. Moreover, 145 DE lncRNAs were predicted as putative precursors or target mimics of miRNAs. Finally, we identified the important lncRNAs that participate in spike development by potentially targeting stress response genes, TF genes or miRNAs. CONCLUSIONS This study outlines an overall view of lncRNAs and their possible regulatory networks during wheat spike development, which also provides an alternative resource for genetic manipulation of wheat spike architecture and thus yield.
Affiliation(s)
- Pei Cao
- Key Laboratory of Plant Molecular Physiology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Botany, Chinese Academy of Sciences, 100093, Beijing, China
- Wenjuan Fan
- Key Laboratory of Plant Molecular Physiology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Botany, Chinese Academy of Sciences, 100093, Beijing, China
- University of Chinese Academy of Sciences, 100049, Beijing, China
- Pengjia Li
- Key Laboratory of Plant Molecular Physiology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Botany, Chinese Academy of Sciences, 100093, Beijing, China
- University of Chinese Academy of Sciences, 100049, Beijing, China
- Yuxin Hu
- Key Laboratory of Plant Molecular Physiology, CAS Center for Excellence in Molecular Plant Sciences, Institute of Botany, Chinese Academy of Sciences, 100093, Beijing, China.
- National Center for Plant Gene Research, 100093, Beijing, China.
66
Ren A, Zhang D, Tian Y, Cai P, Zhang T, Hu QN. Transcriptor: a comprehensive platform for annotation of the enzymatic functions of transcripts. Bioinformatics 2021; 37:434-435. [PMID: 32717064 DOI: 10.1093/bioinformatics/btaa685] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/21/2020] [Revised: 05/29/2020] [Accepted: 07/22/2020] [Indexed: 11/14/2022]
Abstract
MOTIVATION Rapid advances in sequencing technology have resulted in huge increases in the accessibility of sequencing data. Moreover, researchers are focusing more on organisms that lack a reference genome. However, few easy-to-use web servers focusing on annotations of enzymatic functions are available. Accordingly, in this study, we describe Transcriptor, a novel platform for annotating transcripts encoding enzymes. RESULTS The transcripts were evaluated using more than 300 000 in-house enzymatic reactions through bridges of Enzyme Commission numbers. Transcriptor also enabled identification of ontology terms along with their associated enzymes, visualization and prediction of domains, and annotation of regulatory structures such as long noncoding RNAs, which could facilitate the discovery of new functions in model or nonmodel species. Transcriptor may have applications in elucidating the roles of organ transcriptomes and secondary metabolite biosynthesis in organisms lacking a reference genome. AVAILABILITY AND IMPLEMENTATION Transcriptor is available at http://design.rxnfinder.org/transcriptor/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ailin Ren
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China.,University of Chinese Academy of Sciences, Beijing 100864, P. R. China
- Dachuan Zhang
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200333, P. R. China.,University of Chinese Academy of Sciences, Beijing 100864, P. R. China
- Yu Tian
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China.,University of Chinese Academy of Sciences, Beijing 100864, P. R. China
- Pengli Cai
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China.,CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200333, P. R. China.,University of Chinese Academy of Sciences, Beijing 100864, P. R. China
- Tong Zhang
- Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, P. R. China.,College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, P. R. China
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200333, P. R. China.,University of Chinese Academy of Sciences, Beijing 100864, P. R. China
67
Mishra SK, Wang H. Computational Analysis Predicts Hundreds of Coding lncRNAs in Zebrafish. BIOLOGY 2021; 10:biology10050371. [PMID: 33925925 PMCID: PMC8145020 DOI: 10.3390/biology10050371] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/02/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 12/30/2022]
Abstract
Simple Summary Noncoding RNAs (ncRNAs) regulate a variety of fundamental life processes such as development, physiology, metabolism and circadian rhythmicity. RNA-sequencing (RNA-seq) technology has facilitated the sequencing of the whole transcriptome, thereby capturing and quantifying the dynamism of transcriptome-wide RNA expression profiles. However, much remains unrevealed in the huge noncoding RNA datasets that require further bioinformatic analysis. In this study, we applied six bioinformatic tools to investigate the coding potentials of approximately 21,000 lncRNAs. A total of 313 lncRNAs are predicted to be coding by all six tools. Our findings provide insights into the regulatory roles of lncRNAs and set the stage for the functional investigation of these lncRNAs and their encoded micropeptides.
Abstract Recent studies have demonstrated that numerous long noncoding RNAs (lncRNAs; ncRNAs longer than 200 nucleotides) actually encode functional micropeptides, which likely represents the next regulatory biology frontier. Thus, identification of coding lncRNAs from ever-increasing lncRNA databases would be a bioinformatic challenge. Here we employed the Coding Potential Alignment Tool (CPAT), Coding Potential Calculator 2 (CPC2), LGC web server, Coding-Non-Coding Identifying Tool (CNIT), RNAsamba, and MicroPeptide identification tool (MiPepid) to analyze approximately 21,000 zebrafish lncRNAs and computationally identified 2730–6676 zebrafish lncRNAs with high coding potentials, including 313 coding lncRNAs predicted by all six bioinformatic tools. We also compared the sensitivity and specificity of these six bioinformatic tools for identifying lncRNAs with coding potentials and summarized their strengths and weaknesses. These predicted zebrafish coding lncRNAs set the stage for further experimental studies.
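The consensus step described in this abstract (keeping only lncRNAs that every tool calls coding) can be illustrated with a small set-based sketch. This is not the authors' code: the function name `consensus_calls`, the `min_tools` parameter, and the transcript IDs are hypothetical, and the example assumes each tool's output has already been reduced to a set of transcript IDs predicted as coding.

```python
from collections import Counter


def consensus_calls(calls_by_tool, min_tools=None):
    """Return transcript IDs called coding by at least `min_tools` tools.

    `calls_by_tool` maps a tool name to the set of transcript IDs that tool
    predicts as coding. By default every tool must agree (strict consensus).
    """
    if min_tools is None:
        min_tools = len(calls_by_tool)
    # Count, for each transcript, how many tools called it coding.
    counts = Counter(tid for ids in calls_by_tool.values() for tid in set(ids))
    return {tid for tid, n in counts.items() if n >= min_tools}


# Toy input with made-up transcript IDs (hypothetical, for illustration only).
calls = {
    "CPAT": {"lnc1", "lnc2", "lnc3"},
    "CPC2": {"lnc1", "lnc2"},
    "RNAsamba": {"lnc1", "lnc3", "lnc4"},
}

unanimous = consensus_calls(calls)             # agreed on by all three tools
majority = consensus_calls(calls, min_tools=2) # agreed on by at least two
```

Requiring unanimity trades sensitivity for specificity, which is one plausible reading of why the all-tools set (313) is much smaller than any single tool's range (2730–6676).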
Affiliation(s)
- Shital Kumar Mishra
- Center for Circadian Clocks, Soochow University, Suzhou 215123, China;
- School of Biology & Basic Medical Sciences, Medical College, Soochow University, Suzhou 215123, China
- Han Wang
- Center for Circadian Clocks, Soochow University, Suzhou 215123, China;
- School of Biology & Basic Medical Sciences, Medical College, Soochow University, Suzhou 215123, China
- Correspondence: Tel.: +86-512-6588-2115
68
Xu X, Liu S, Yang Z, Zhao X, Deng Y, Zhang G, Pang J, Zhao C, Zhang W. A systematic review of computational methods for predicting long noncoding RNAs. Brief Funct Genomics 2021; 20:162-173. [PMID: 33754153 DOI: 10.1093/bfgp/elab016] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/15/2020] [Revised: 02/20/2021] [Accepted: 02/22/2021] [Indexed: 12/20/2022]
Abstract
Accurately and rapidly distinguishing long noncoding RNAs (lncRNAs) from other transcripts is a prerequisite for exploring their biological functions. In recent years, many computational methods have been developed to predict lncRNAs from transcripts, but there is no systematic review of these computational methods. In this review, we introduce databases and features involved in the development of computational prediction models, and subsequently summarize existing state-of-the-art computational methods, including methods based on binary classifiers, deep learning and ensemble learning. However, a user-friendly way of employing existing state-of-the-art computational methods is in demand. Therefore, we develop a Python package, ezLncPred, which provides a pragmatic command line implementation to utilize nine state-of-the-art lncRNA prediction methods. Finally, we discuss challenges of lncRNA prediction and future directions.
69
Shukla B, Gupta S, Srivastava G, Sharma A, Shukla AK, Shasany AK. lncRNADetector: a bioinformatics pipeline for long non-coding RNA identification and MAPslnc: a repository of medicinal and aromatic plant lncRNAs. RNA Biol 2021; 18:2290-2295. [PMID: 33685383 DOI: 10.1080/15476286.2021.1899673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
Long non-coding RNAs (lncRNAs) are an emerging class of non-coding RNAs and potent regulatory elements in living cells. High-throughput RNA sequencing analyses have generated a tremendous amount of transcript sequence data. A large proportion of these transcript sequences do not code for proteins and are known as non-coding RNAs. Among them, lncRNAs are a unique class of transcripts longer than 200 nucleotides with diverse biological functions and regulatory mechanisms. Recent studies and next-generation sequencing technologies show a substantial number of lncRNAs within plant genomes that are yet to be identified. The computational identification of lncRNAs from these transcripts is a challenging task because it involves a series of filtering steps. We have developed lncRNADetector, a bioinformatics pipeline for the identification of novel lncRNAs, especially from medicinal and aromatic plant (MAP) species. The lncRNADetector has been utilized to analyse and identify more than 88,459 lncRNAs from 21 species of MAPs. To provide a knowledge resource for the plant research community towards elucidating the diversity of biological roles of lncRNAs, the information generated about MAP lncRNAs (post-filtering steps) through lncRNADetector has been stored and organized in the MAPslnc database (MAPslnc, https://lncrnapipe.cimap.res.in). The lncRNADetector web server and MAPslnc database have been developed to help researchers accurately identify lncRNAs from the next-generation sequencing data of different organisms for downstream studies. To the best of our knowledge, no comparable database of MAP lncRNAs has been available to date.
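The abstract mentions "a series of filtering steps" without listing them. As a rough illustration of what such filters commonly look like, the sketch below keeps transcripts longer than 200 nucleotides (the threshold stated in the abstract) whose longest forward-strand open reading frame stays under a cutoff. The ORF cutoff of 300 nt, the function names, and the toy sequences are assumptions for illustration; this is not lncRNADetector's actual pipeline.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq):
    """Length in nt of the longest ATG..stop ORF on the forward strand.

    The stop codon is included in the length. Only the three forward
    reading frames are scanned (a simplifying assumption).
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def candidate_lncrnas(records, min_len=200, max_orf_nt=300):
    """Keep names of transcripts longer than `min_len` with no long ORF.

    `max_orf_nt` is an assumed cutoff, not a value taken from the source.
    """
    return [
        name
        for name, seq in records
        if len(seq) > min_len and longest_orf_nt(seq) <= max_orf_nt
    ]


# Toy transcripts (hypothetical sequences, for illustration only).
coding = "ATG" + "GCT" * 110 + "TAA"   # 336 nt with one long ORF
records = [
    ("short", "ACGT" * 10),            # 40 nt: fails the length filter
    ("noncoding", "ACGT" * 60),        # 240 nt, no ATG in any frame
    ("coding", coding),                # fails the ORF filter
]
kept = candidate_lncrnas(records)
```

Real pipelines typically add further steps (for example, homology searches against protein databases and removal of housekeeping RNAs), which are omitted here.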
Affiliation(s)
- Bhaskar Shukla
- Information and Communication Technology Department, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, India.,Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Sanchita Gupta
- CSIR-National Botanical Research Institute, Lucknow, India
- Gaurava Srivastava
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.,Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, India
- Ashok Sharma
- Information and Communication Technology Department, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, India.,Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, India
- Ashutosh K Shukla
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.,Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, India
- Ajit K Shasany
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India.,Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, India
70
Bonidia RP, Sampaio LDH, Domingues DS, Paschoal AR, Lopes FM, de Carvalho ACPLF, Sanches DS. Feature extraction approaches for biological sequences: a comparative study of mathematical features. Brief Bioinform 2021; 22:6135010. [PMID: 33585910 DOI: 10.1093/bib/bbab011] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 10/20/2020] [Revised: 12/13/2020] [Accepted: 01/07/2021] [Indexed: 11/14/2022]
Abstract
As a consequence of the various genomic sequencing projects, an increasing volume of biological sequence data is being produced. Although machine learning algorithms have been successfully applied to a large number of genomic sequence-related problems, the results are largely affected by the type and number of features extracted. This effect has motivated new algorithms and pipeline proposals, mainly involving feature extraction problems, in which extracting significant discriminatory information from a biological set is challenging. Considering this, our work proposes a new study of feature extraction approaches based on mathematical features (numerical mapping with Fourier, entropy and complex networks). As a case study, we analyze long non-coding RNA sequences. Moreover, we separated this work into three studies. First, we assessed our proposal with the most addressed problem in our review, namely lncRNA versus mRNA; second, we also validated the mathematical features in different classification problems, to predict the class of lncRNA, e.g. circular RNA sequences; third, we analyzed its robustness in scenarios with imbalanced data. The experimental results demonstrated three main contributions: first, an in-depth study of several mathematical features; second, a new feature extraction pipeline; and third, its high performance and robustness for distinct RNA sequence classification. Availability: https://github.com/Bonidia/FeatureExtraction_BiologicalSequences.
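Two of the feature families named in this abstract, entropy and Fourier analysis of a numerical mapping, can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the authors' pipeline: the function names are hypothetical, the binary-indicator mapping (one 0/1 sequence per base) is an assumed choice among several possible numerical mappings, and only the single frequency 1/3 is evaluated instead of a full spectrum.

```python
import cmath
import math
from collections import Counter


def kmer_entropy(seq, k=2):
    """Shannon entropy (in bits) of the k-mer frequency distribution."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def period3_power(seq):
    """DFT power at frequency 1/3, summed over four binary indicator
    sequences (one per base) and normalised by sequence length.

    Protein-coding regions tend to show a peak here because of codon
    periodicity, which is why the component is a common sequence feature.
    """
    n = len(seq)
    power = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, b in enumerate(seq)
            if b == base
        )
        power += abs(coeff) ** 2
    return power / n


# Toy sequences (hypothetical, for illustration only).
periodic = "ATG" * 10       # strict period-3 repeat
non_periodic = "ACGT" * 9   # period-4 repeat, no period-3 signal

features = {
    "entropy_periodic": kmer_entropy(periodic, k=1),
    "p3_periodic": period3_power(periodic),
    "p3_non_periodic": period3_power(non_periodic),
}
```

In practice such scalar features would be computed per transcript and passed to a classifier; the choice of k and of the numerical mapping are tunable assumptions.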
Affiliation(s)
- Robson P Bonidia
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil.,Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Lucas D H Sampaio
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Douglas S Domingues
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil.,Department of Botany, Institute of Biosciences, São Paulo State University (UNESP), Rio Claro 13506-900, Brazil
- Alexandre R Paschoal
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- Fabrício M Lopes
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
- André C P L F de Carvalho
- Institute of Mathematics and Computer Sciences, University of São Paulo - USP, São Carlos, 13566-590, Brazil
- Danilo S Sanches
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, 86300-000, Brazil
71
Li Z, Liu L, Jiang S, Li Q, Feng C, Du Q, Zou D, Xiao J, Zhang Z, Ma L. LncExpDB: an expression database of human long non-coding RNAs. Nucleic Acids Res 2021; 49:D962-D968. [PMID: 33045751 PMCID: PMC7778919 DOI: 10.1093/nar/gkaa850] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 08/15/2020] [Revised: 09/12/2020] [Accepted: 09/22/2020] [Indexed: 12/14/2022]
Abstract
Expression profiles of long non-coding RNAs (lncRNAs) across diverse biological conditions provide significant insights into their biological functions, interacting targets as well as transcriptional reliability. However, there is no comprehensive resource that systematically characterizes the expression landscape of human lncRNAs by integrating their expression profiles across a wide range of biological conditions. Here, we present LncExpDB (https://bigd.big.ac.cn/lncexpdb), an expression database of human lncRNAs that is devoted to providing comprehensive expression profiles of lncRNA genes, exploring their expression features and capacities, identifying featured genes with potentially important functions, and building interactions with protein-coding genes across various biological contexts/conditions. Based on comprehensive integration and stringent curation, LncExpDB currently houses expression profiles of 101 293 high-quality human lncRNA genes derived from 1977 samples of 337 biological conditions across nine biological contexts. Consequently, LncExpDB estimates lncRNA genes' expression reliability and capacities, identifies 25 191 featured genes, and further obtains 28 443 865 lncRNA-mRNA interactions. Moreover, user-friendly web interfaces enable interactive visualization of expression profiles across various conditions and easy exploration of featured lncRNAs and their interacting partners in specific contexts. Collectively, LncExpDB features comprehensive integration and curation of lncRNA expression profiles and thus will serve as a fundamental resource for functional studies on human lncRNAs.
Affiliation(s)
- Zhao Li
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100101, China
- Lin Liu
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100101, China
- Shuai Jiang
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Qianpeng Li
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100101, China
- Changrui Feng
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100101, China
- Qiang Du
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100101, China
- Dong Zou
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Jingfa Xiao
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100101, China
- Zhang Zhang
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100101, China
- Lina Ma
- China National Center for Bioinformation, Beijing 100101, China.,National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
72
Xing L, Xi Y, Qiao X, Huang C, Wu Q, Yang N, Guo J, Liu W, Fan W, Wan F, Qian W. The landscape of lncRNAs in Cydia pomonella provides insights into their signatures and potential roles in transcriptional regulation. BMC Genomics 2021; 22:4. [PMID: 33402093 PMCID: PMC7786964 DOI: 10.1186/s12864-020-07313-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/26/2020] [Accepted: 12/07/2020] [Indexed: 12/13/2022]
Abstract
Background Long noncoding RNAs (lncRNAs) have emerged as an important class of transcriptional regulators in cellular processes. The past decades have witnessed great progress in lncRNA studies in a variety of organisms. The codling moth (Cydia pomonella L.) is an important invasive insect in China. However, the functional impact of lncRNAs in this insect remains unclear. In this study, an atlas of codling moth lncRNAs was constructed based on publicly available RNA-seq datasets. Results In total, 9875 lncRNA transcripts encoded by 9161 loci were identified in the codling moth. As expected, the lncRNAs exhibited shorter transcript lengths, lower GC contents, and lower expression levels than protein-coding genes (PCGs). Additionally, the lncRNAs were more likely to show tissue-specific expression patterns than PCGs. Interestingly, a substantial fraction of the lncRNAs showed a testis-biased expression pattern. Additionally, conservation analysis indicated that lncRNA sequences were weakly conserved across insect species, though additional lncRNAs with homologous relationships could be identified based on synteny, suggesting that synteny could be a more reliable approach for the cross-species comparison of lncRNAs. Furthermore, the correlation analysis of lncRNAs with neighbouring PCGs indicated a stronger correlation between them, suggesting potential cis-acting roles of these lncRNAs in the regulation of gene expression. Conclusions Taken together, our work provides a valuable resource for the comparative and functional study of lncRNAs, which will facilitate the understanding of their mechanistic roles in transcriptional regulation.
Affiliation(s)
- Longsheng Xing
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yu Xi
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xi Qiao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Cong Huang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qiang Wu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Nianwan Yang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Jianyang Guo
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Wanxue Liu
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China
- Wei Fan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China.
- Fanghao Wan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China; State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
- Wanqiang Qian
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China.
|
73
|
Li Y, Xu Q, Wu D, Chen G. Exploring Additional Valuable Information From Single-Cell RNA-Seq Data. Front Cell Dev Biol 2020; 8:593007. [PMID: 33335900 PMCID: PMC7736616 DOI: 10.3389/fcell.2020.593007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/04/2020] [Accepted: 10/26/2020] [Indexed: 12/28/2022] Open
Abstract
Single-cell RNA-seq (scRNA-seq) technologies are broadly applied to dissect the cellular heterogeneity and expression dynamics, providing unprecedented insights into single-cell biology. Most of the scRNA-seq studies mainly focused on the dissection of cell types/states, developmental trajectory, gene regulatory network, and alternative splicing. However, besides these routine analyses, many other valuable scRNA-seq investigations can be conducted. Here, we first review cell-to-cell communication exploration, RNA velocity inference, identification of large-scale copy number variations and single nucleotide changes, and chromatin accessibility prediction based on single-cell transcriptomics data. Next, we discuss the identification of novel genes/transcripts through transcriptome reconstruction approaches, as well as the profiling of long non-coding RNAs and circular RNAs. Additionally, we survey the integration of single-cell and bulk RNA-seq datasets for deconvoluting the cell composition of large-scale bulk samples and linking single-cell signatures to patient outcomes. These additional analyses could largely facilitate corresponding basic science and clinical applications.
Affiliation(s)
- Yunjin Li
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Qiyue Xu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
- Duojiao Wu
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Geng Chen
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, China
|
74
|
Li J, Zhang X, Liu C. The computational approaches of lncRNA identification based on coding potential: Status quo and challenges. Comput Struct Biotechnol J 2020; 18:3666-3677. [PMID: 33304463 PMCID: PMC7710504 DOI: 10.1016/j.csbj.2020.11.030] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 04/30/2020] [Revised: 11/15/2020] [Accepted: 11/16/2020] [Indexed: 12/13/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) make up a large proportion of transcriptome in eukaryotes, and have been revealed with many regulatory functions in various biological processes. When studying lncRNAs, the first step is to accurately and specifically distinguish them from the colossal transcriptome data with complicated composition, which contains mRNAs, lncRNAs, small RNAs and their primary transcripts. In the face of such a huge and progressively expanding transcriptome data, the in-silico approaches provide a practicable scheme for effectively and rapidly filtering out lncRNA targets, using machine learning and probability statistics. In this review, we mainly discussed the characteristics of algorithms and features on currently developed approaches. We also outlined the traits of some state-of-the-art tools for ease of operation. Finally, we pointed out the underlying challenges in lncRNA identification with the advent of new experimental data.
Affiliation(s)
- Jing Li
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Xuan Zhang
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Changning Liu
- CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
- The Innovative Academy of Seed Design, Chinese Academy of Sciences, Menglun, Mengla, Yunnan 666303, China
|
75
|
Sang J, Zou D, Wang Z, Wang F, Zhang Y, Xia L, Li Z, Ma L, Li M, Xu B, Liu X, Wu S, Liu L, Niu G, Li M, Luo Y, Hu S, Hao L, Zhang Z. IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data. Genomics Proteomics Bioinformatics 2020; 18:161-172. [PMID: 32683045 PMCID: PMC7646092 DOI: 10.1016/j.gpb.2018.12.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 08/08/2018] [Revised: 11/28/2018] [Accepted: 12/29/2018] [Indexed: 12/19/2022]
Abstract
Genome reannotation aims for complete and accurate characterization of gene models and thus is of critical significance for in-depth exploration of gene function. Although the availability of massive RNA-seq data provides great opportunities for gene model refinement, few efforts have been made to adopt these precious data in rice genome reannotation. Here we reannotate the rice (Oryza sativa L. ssp. japonica) genome based on integration of large-scale RNA-seq data and release a new annotation system IC4R-2.0. In general, IC4R-2.0 significantly improves the completeness of gene structure, identifies a number of novel genes, and integrates a variety of functional annotations. Furthermore, long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) are systematically characterized in the rice genome. Performance evaluation shows that compared to previous annotation systems, IC4R-2.0 achieves higher integrity and quality, primarily attributable to massive RNA-seq data applied in genome annotation. Consequently, we incorporate the improved annotations into the Information Commons for Rice (IC4R), a database integrating multiple omics data of rice, and accordingly update IC4R by providing more user-friendly web interfaces and implementing a series of practical online tools. Together, the updated IC4R, which is equipped with the improved annotations, bears great promise for comparative and functional genomic studies in rice and other monocotyledonous species. The IC4R-2.0 annotation system and related resources are freely accessible at http://ic4r.org/.
Affiliation(s)
- Jian Sang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Dong Zou
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Zhennan Wang
- University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Fan Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Yuansheng Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lin Xia
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhaohua Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lina Ma
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Mengwei Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Bingxiang Xu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xiaonan Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Shuangyang Wu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lin Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Guangyi Niu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Man Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yingfeng Luo
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Songnian Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
- Lili Hao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
|
76
|
Ma L, Cao J, Liu L, Li Z, Shireen H, Pervaiz N, Batool F, Raza RZ, Zou D, Bao Y, Abbasi AA, Zhang Z. Community Curation and Expert Curation of Human Long Noncoding RNAs with LncRNAWiki and LncBook. Curr Protoc Bioinformatics 2020; 67:e82. [PMID: 31524988 DOI: 10.1002/cpbi.82] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 02/06/2023]
Abstract
In recent years, the number of human long noncoding RNAs (lncRNAs) that have been identified has increased exponentially. However, these lncRNAs are poorly annotated compared to protein-coding genes, posing great challenges for a better understanding of their functional significance and elucidating their complex functioning molecular mechanisms. Here we employ both community and expert curation to yield a comprehensive collection of human lncRNAs and their annotations. Specifically, LncRNAWiki (http://lncrna.big.ac.cn/index.php/Main_Page) uses a wiki-based community curation model, thus showing great promise in dealing with the flood of biological knowledge, while LncBook (http://bigd.big.ac.cn/lncbook) is an expert curation-based database that provides a complement to LncRNAWiki. LncBook features a comprehensive collection of human lncRNAs and a systematic curation of lncRNAs by multi-omics data integration, functional annotation, and disease association. These protocols provide step-by-step instructions on how to browse and search a specific lncRNA and how to obtain a range of related information including expression, methylation, variation, function, and disease association. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Lina Ma
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Jiabao Cao
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
- Lin Liu
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
- Zhao Li
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
- Huma Shireen
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Nashaiman Pervaiz
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Fatima Batool
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Rabail Z Raza
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Dong Zou
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- Yiming Bao
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
- Amir A Abbasi
- National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad, Pakistan
- Zhang Zhang
- BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
|
77
|
Liang B, Ding H, Huang L, Luo H, Zhu X. GWAS in cancer: progress and challenges. Mol Genet Genomics 2020; 295:537-561. [PMID: 32048005 DOI: 10.1007/s00438-020-01647-z] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 10/14/2019] [Accepted: 01/11/2020] [Indexed: 12/31/2022]
Abstract
The genome-wide association study (GWAS) is an effective method to detect single-nucleotide polymorphisms (SNPs) of multiple individual genes based on linkage disequilibrium (LD). GWAS examines genotypes and distinguishing gene characteristics that are exhibited in diseases. In the past few decades, more and more literature has reported the results of applying GWAS to study tumors. Although many pleiotropic loci associated with complex phenotypes have been identified by GWAS, the biological functions of many genetic variation loci remain unclear, and the genetic mechanisms of most complex phenotypes cannot be systematically explained. In this article, we will review the new findings of several tumor types, and categorize the new sites and mechanisms that have recently been discovered. We linked the mechanisms of action of various tumors and searched for links to related gene expression pathways. We found that susceptible sites can be divided into hub genes and peripheral genes; the two interact to link gene expression in a variety of diseases.
Affiliation(s)
- Baiqiang Liang
- Guangdong Key Laboratory for Research and Development of Natural Drugs, Guangdong Medical University, Zhanjiang, 524023, China.,The Marine Biomedical Research Institute, Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang, Guangdong Medical University, Zhanjiang, 524023, China.,Cancer Center, The Affiliated Hospital, Guangdong Medical University, Zhanjiang, 524023, China
- Hongrong Ding
- The Marine Biomedical Research Institute, Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang, Guangdong Medical University, Zhanjiang, 524023, China.,Key Laboratory of Guangdong Provincial Medical Molecular Diagnosis, Dongguan, 523808, China
- Lianfang Huang
- Guangdong Key Laboratory for Research and Development of Natural Drugs, Guangdong Medical University, Zhanjiang, 524023, China.,The Marine Biomedical Research Institute, Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang, Guangdong Medical University, Zhanjiang, 524023, China
- Haiqing Luo
- Cancer Center, The Affiliated Hospital, Guangdong Medical University, Zhanjiang, 524023, China.
- Xiao Zhu
- Guangdong Key Laboratory for Research and Development of Natural Drugs, Guangdong Medical University, Zhanjiang, 524023, China; The Marine Biomedical Research Institute, Southern Marine Science and Engineering Guangdong Laboratory Zhanjiang, Guangdong Medical University, Zhanjiang, 524023, China; Key Laboratory of Guangdong Provincial Medical Molecular Diagnosis, Dongguan, 523808, China.
|
78
|
National Genomics Data Center Members and Partners, Zhang Z, Zhao W, Xiao J, Bao Y, He S, Zhang G, Li Y, Zhao G, Chen R, Gao Y, Zhang C, Yuan L, Zhang G, Xu S, Zhang C, Gao Y, Ning Z, Lu Y, Xu S, Zeng J, Yuan N, Zhu J, Pan M, Zhang H, Wang Q, Shi S, Jiang M, Lu M, Qian Q, Gao Q, Shang Y, Wang J, Du Z, Xiao J, Tian D, Wang P, Tang B, Li C, Teng X, Liu X, Zou D, Song S, Xiong Z, Li M, Yang F, Ma Y, Sang J, Li Z, Li R, Wang Z, Zhu Q, Zhu J, Li X, Zhang S, Tian D, Kang H, Li C, Dong L, Ying C, Duan G, Song S, Li M, Zhao W, Zhi X, Ling Y, Cao R, Jiang Z, Zhou H, Lv D, Liu W, Klenk HP, Zhao G, Zhang G, Zhang Y, Zhang Z, Zhang H, Xiao J, Chen T, Zhang S, Chen X, Zhu J, Wang Z, Kang H, Dong L, Wang Y, Ma Y, Wu S, Li Z, Gong Z, Chen M, Li C, Tian D, Teng X, Wang P, Tang B, Liu X, Zou D, Song S, Fang S, Zhang L, Guo J, Niu Y, Wu Y, Li H, Zhao L, Li X, Teng X, Sun X, Sun L, Chen R, Zhao Y, Wang J, Zhang P, Li Y, Zheng Y, Chen R, He S, Teng X, Chen X, Xue H, Teng Y, Zhang P, Kang Q, Hao Y, Zhao Y, Chen R, He S, Cao J, Liu L, Li Z, Li Q, Zou D, Du Q, Abbasi AA, Shireen H, Pervaiz N, Batool F, Raza RZ, Ma L, Niu G, Zhang Y, Zou D, Zhu T, Sang J, Li M, Hao L, Zou D, Wang G, Li M, Li R, Li M, Li R, Bao Y, Yan J, Sang J, Zou D, Li C, Wang Z, Zhang Y, Zhu T, Song S, Wang X, Hao L, Li Z, Zhang Y, Zou D, Zhao Y, Wang H, Zhang Y, Xia X, Guo H, Zhang Z, Zou D, Ma L, Dong L, Tang B, Zhu J, Zhou Q, Wang Z, Kang H, Chen X, Lan L, Bao Y, Zhao W, Zou D, Zhu J, Tang B, Bao Y, Lan L, Zhang X, Ma Y, Xue Y, Sun Y, Zhai S, Yu L, Sun M, Chen H, Zhang Z, Zhao W, Xiao J, Bao Y, Hao L, Hu H, Guo AY, Lin S, Xue Y, Wang C, Xue Y, Ning W, Xue Y, Zhang X, Xiao Y, Li X, Tu Y, Xue Y, Wu W, Ji P, Zhao F, Luo H, Gao F, Guo Y, Xue Y, Yuan H, Zhang YE, Zhang Q, Guo AY, Zhou J, Xue Y, Huang Z, Cui Q, Miao YR, Guo AY, Ruan C, Xue Y, Yuan C, Chen M, Jin JP, Tian F, Gao G, Shi Y, Xue Y, Yao L, Xue Y, Cui Q, Li X, Li CY, Tang Q, Guo AY, Peng D, Xue Y. Database Resources of the National Genomics Data Center in 2020. Nucleic Acids Res 2020; 48:D24-D33. [PMID: 31702008 PMCID: PMC7145560 DOI: 10.1093/nar/gkz913] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 09/15/2019] [Revised: 09/30/2019] [Accepted: 10/02/2019] [Indexed: 11/23/2022] Open
Abstract
The National Genomics Data Center (NGDC) provides a suite of database resources to support worldwide research activities in both academia and industry. With the rapid advancements in higher-throughput and lower-cost sequencing technologies and accordingly the huge volume of multi-omics data generated at exponential scales and rates, NGDC is continually expanding, updating and enriching its core database resources through big data integration and value-added curation. In the past year, efforts for update have been mainly devoted to BioProject, BioSample, GSA, GWH, GVM, NONCODE, LncBook, EWAS Atlas and IC4R. Newly released resources include three human genome databases (PGG.SNV, PGG.Han and CGVD), eLMSG, EWAS Data Hub, GWAS Atlas, iSheep and PADS Arsenal. In addition, four web services, namely, eGPS Cloud, BIG Search, BIG Submission and BIG SSO, have been significantly improved and enhanced. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.
|
79
|
Bu C, Zhang Q, Zeng J, Cao X, Hao Z, Qiao D, Cao Y, Xu H. Identification of a novel anthocyanin synthesis pathway in the fungus Aspergillus sydowii H-1. BMC Genomics 2020; 21:29. [PMID: 31914922 PMCID: PMC6950803 DOI: 10.1186/s12864-019-6442-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 09/03/2019] [Accepted: 12/29/2019] [Indexed: 12/27/2022] Open
Abstract
Background: Anthocyanins are common substances with many agro-food industrial applications. However, anthocyanins are generally considered to be found only in natural plants. Our previous study isolated and purified the fungus Aspergillus sydowii H-1, which can produce purple pigments during fermentation. To understand the characteristics of this strain, a transcriptomic and metabolomic comparative analysis was performed with A. sydowii H-1 from the second and eighth days of fermentation, which confer different pigment production. Results: We found five anthocyanins with remarkably different production in A. sydowii H-1 on the eighth day of fermentation compared to the second day of fermentation. LC-MS/MS combined with other characteristics of anthocyanins suggested that the purple pigment contained anthocyanins. A total of 28 transcripts related to the anthocyanin biosynthesis pathway were identified in A. sydowii H-1, and almost all of the identified genes displayed high correlations with the metabolome. Among them, the chalcone synthase gene (CHS) and cinnamate-4-hydroxylase gene (C4H) were only found using the de novo assembly method. Interestingly, the best hits of these two genes belonged to plant species. Finally, we also identified 530 lncRNAs in our datasets, and among them, three lncRNAs targeted the genes related to anthocyanin biosynthesis via cis-regulation, which provided clues for understanding the underlying mechanism of anthocyanin production in fungi. Conclusion: We are the first to report that anthocyanin can be produced in a fungus, A. sydowii H-1. In total, 31 candidate transcripts involved in anthocyanin biosynthesis were identified, among which CHS and C4H, known as the key genes in anthocyanin biosynthesis, were only found in strain H1, which indicated that these two genes may contribute to anthocyanin production in H-1. This discovery expanded our knowledge of the biosynthesis of anthocyanins and provided a direction for the production of anthocyanin.
Affiliation(s)
- Congfan Bu
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
- Qian Zhang
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
- Jie Zeng
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
- Xiyue Cao
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
- Zhaonan Hao
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
- Dairong Qiao
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
- Yi Cao
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
- Hui Xu
- Microbiology and Metabolic Engineering Key Laboratory of Sichuan Province, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, 610065 Sichuan People’s Republic of China
|