1
Salojärvi J, Rambani A, Yu Z, Guyot R, Strickler S, Lepelley M, Wang C, Rajaraman S, Rastas P, Zheng C, Muñoz DS, Meidanis J, Paschoal AR, Bawin Y, Krabbenhoft TJ, Wang ZQ, Fleck SJ, Aussel R, Bellanger L, Charpagne A, Fournier C, Kassam M, Lefebvre G, Métairon S, Moine D, Rigoreau M, Stolte J, Hamon P, Couturon E, Tranchant-Dubreuil C, Mukherjee M, Lan T, Engelhardt J, Stadler P, Correia De Lemos SM, Suzuki SI, Sumirat U, Wai CM, Dauchot N, Orozco-Arias S, Garavito A, Kiwuka C, Musoli P, Nalukenge A, Guichoux E, Reinout H, Smit M, Carretero-Paulet L, Filho OG, Braghini MT, Padilha L, Sera GH, Ruttink T, Henry R, Marraccini P, Van de Peer Y, Andrade A, Domingues D, Giuliano G, Mueller L, Pereira LF, Plaisance S, Poncet V, Rombauts S, Sankoff D, Albert VA, Crouzillat D, de Kochko A, Descombes P. The genome and population genomics of allopolyploid Coffea arabica reveal the diversification history of modern coffee cultivars. Nat Genet 2024; 56:721-731. [PMID: 38622339 PMCID: PMC11018527 DOI: 10.1038/s41588-024-01695-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Accepted: 02/23/2024] [Indexed: 04/17/2024]
Abstract
Coffea arabica, an allotetraploid hybrid of Coffea eugenioides and Coffea canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ~30.5 thousand years ago, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.
Affiliation(s)
- Jarkko Salojärvi
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland.
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore.
| | - Aditi Rambani
- Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
| | - Zhe Yu
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - Romain Guyot
- Institut de Recherche pour le Développement (IRD), Université de Montpellier, Montpellier, France
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
| | - Susan Strickler
- Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
| | - Maud Lepelley
- Société des Produits Nestlé SA, Nestlé Research, Tours, France
| | - Cui Wang
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
| | - Sitaram Rajaraman
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland
| | - Pasi Rastas
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Chunfang Zheng
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - Daniella Santos Muñoz
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - João Meidanis
- Institute of Computing, University of Campinas, Campinas, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science, The Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Brazil
| | - Yves Bawin
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
| | | | - Zhen Qin Wang
- Department of Biological Sciences, University at Buffalo, Buffalo, NY, USA
| | - Steven J Fleck
- Department of Biological Sciences, University at Buffalo, Buffalo, NY, USA
| | - Rudy Aussel
- Société des Produits Nestlé SA, Nestlé Research, Tours, France
- Centre d'Immunologie de Marseille-Luminy, Aix Marseille Université, Marseille, France
| | | | - Aline Charpagne
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland
| | - Coralie Fournier
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland
| | - Mohamed Kassam
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland
| | - Gregory Lefebvre
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland
| | - Sylviane Métairon
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland
| | - Déborah Moine
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland
| | - Michel Rigoreau
- Société des Produits Nestlé SA, Nestlé Research, Tours, France
| | - Jens Stolte
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland
| | - Perla Hamon
- Institut de Recherche pour le Développement (IRD), Université de Montpellier, Montpellier, France
| | - Emmanuel Couturon
- Institut de Recherche pour le Développement (IRD), Université de Montpellier, Montpellier, France
| | | | - Minakshi Mukherjee
- Department of Biological Sciences, University at Buffalo, Buffalo, NY, USA
| | - Tianying Lan
- Department of Biological Sciences, University at Buffalo, Buffalo, NY, USA
| | - Jan Engelhardt
- Department of Computer Science, University of Leipzig, Leipzig, Germany
| | - Peter Stadler
- Department of Computer Science, University of Leipzig, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
| | | | | | - Ucu Sumirat
- Indonesian Coffee and Cocoa Research Institute (ICCRI), Jember, Indonesia
| | - Ching Man Wai
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Nicolas Dauchot
- Research Unit in Plant Cellular and Molecular Biology, University of Namur, Namur, Belgium
| | - Simon Orozco-Arias
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
| | - Andrea Garavito
- Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas y Naturales, Universidad de Caldas, Manizales, Colombia
| | - Catherine Kiwuka
- National Agricultural Research Organization (NARO), Entebbe, Uganda
| | - Pascal Musoli
- National Agricultural Research Organization (NARO), Entebbe, Uganda
| | - Anne Nalukenge
- National Agricultural Research Organization (NARO), Entebbe, Uganda
| | - Erwan Guichoux
- Biodiversité Gènes & Communautés, INRA, Bordeaux, France
| | | | - Martin Smit
- Hortus Botanicus Amsterdam, Amsterdam, the Netherlands
| | | | - Oliveiro Guerreiro Filho
- Instituto Agronômico (IAC) Centro de Café 'Alcides Carvalho', Fazenda Santa Elisa, Campinas, Brazil
| | - Masako Toma Braghini
- Instituto Agronômico (IAC) Centro de Café 'Alcides Carvalho', Fazenda Santa Elisa, Campinas, Brazil
| | - Lilian Padilha
- Embrapa Café/Instituto Agronômico (IAC) Centro de Café 'Alcides Carvalho', Fazenda Santa Elisa, Campinas, Brazil
| | | | - Tom Ruttink
- Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Melle, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Robert Henry
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, Queensland, Australia
| | - Pierre Marraccini
- CIRAD - UMR DIADE (IRD-CIRAD-Université de Montpellier) BP 64501, Montpellier, France
| | - Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
- College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, China
- Center for Plant Systems Biology, VIB, Ghent, Belgium
| | - Alan Andrade
- Embrapa Café/Inovacafé Laboratory of Molecular Genetics Campus da UFLA-MG, Lavras, Brazil
| | - Douglas Domingues
- Group of Genomics and Transcriptomes in Plants, São Paulo State University, UNESP, Rio Claro, Brazil
| | - Giovanni Giuliano
- Italian National Agency for New Technologies, Energy and Sustainable Economic Development, ENEA Casaccia Research Center, Rome, Italy
| | - Lukas Mueller
- Boyce Thompson Institute, Cornell University, Ithaca, NY, USA
| | - Luiz Filipe Pereira
- Embrapa Café/Lab. Biotecnologia, Área de Melhoramento Genético, Londrina, Brazil
| | | | - Valerie Poncet
- Institut de Recherche pour le Développement (IRD), Université de Montpellier, Montpellier, France
| | - Stephane Rombauts
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
| | - David Sankoff
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada
| | - Victor A Albert
- Department of Biological Sciences, University at Buffalo, Buffalo, NY, USA.
| | | | - Alexandre de Kochko
- Institut de Recherche pour le Développement (IRD), Université de Montpellier, Montpellier, France.
| | - Patrick Descombes
- Société des Produits Nestlé SA, Nestlé Research, Lausanne, Switzerland.
2
Barbosa DF, Oliveira LS, Nachtigall PG, Valentini Junior R, de Souza N, Paschoal AR, Kashiwabara AY. cirCodAn: A GHMM-based tool for accurate prediction of coding regions in circRNA. Adv Protein Chem Struct Biol 2024; 139:289-334. [PMID: 38448139 DOI: 10.1016/bs.apcsb.2023.11.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
Studies characterizing circRNAs with the potential to be translated into peptides are advancing quickly. This work is helping to elucidate the roles played by circRNAs in several biological processes, especially in the emergence and development of diseases. While various tools are available for predicting coding regions within linear sequences, none has demonstrated accurate open reading frame detection in circular sequences, such as circRNAs. Here, we present cirCodAn, a novel tool designed to predict coding regions in circRNAs. We evaluated the performance of cirCodAn using datasets of circRNAs with strong translation evidence and showed that cirCodAn outperformed the other tools available for a similar task. Our findings demonstrate the applicability of cirCodAn to identifying coding regions in circRNAs, which suggests its potential for future research on the biological roles of circRNAs and their encoded proteins. cirCodAn is freely available at https://github.com/denilsonfbar/cirCodAn.
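To illustrate why open reading frame detection differs on circular sequences, the following is a minimal Python sketch of a naive circular ORF scan. It is not cirCodAn's GHMM-based method; the function names (`codon_at`, `find_circular_orfs`), the ATG-only start rule, and the `min_codons` threshold are assumptions chosen for illustration. The key point is that reading continues through the back-splice junction, so an ORF can span the junction or even run for more than one full turn of the circle.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def codon_at(seq: str, pos: int) -> str:
    """Return the codon starting at pos, wrapping around the circle."""
    n = len(seq)
    return "".join(seq[(pos + j) % n] for j in range(3))


def find_circular_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int, bool]]:
    """Naive ORF scan on a circular DNA-alphabet sequence (illustrative only).

    Returns (start, length_nt, has_stop) for every ATG-initiated frame.
    length_nt counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 3:
        return []
    # Codons repeat after n/3 codons when n is a multiple of 3,
    # otherwise after n codons (three turns of the circle).
    period = n // 3 if n % 3 == 0 else n
    orfs = []
    for start in range(n):
        if codon_at(seq, start) != START_CODON:
            continue
        codons_read = 0
        has_stop = False
        for k in range(period):
            if codon_at(seq, start + 3 * k) in STOP_CODONS:
                has_stop = True
                break
            codons_read += 1
        if codons_read >= min_codons:
            orfs.append((start, 3 * codons_read, has_stop))
    return orfs


# The ATG at index 9 only meets its stop codon (TAA) after crossing the junction.
print(find_circular_orfs("CCCTAAGGGATGAAA"))  # [(9, 9, True)]
```

A linear scan of the same sequence would reach the end of the string after `ATG AAA` without finding a stop codon, which is the kind of case a circular-aware predictor has to handle.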
Affiliation(s)
- Denilson Fagundes Barbosa
- Programa de Pós-Graduação Associado em Bioinformática (UFPR/UTFPR), Departamento Acadêmico de Computação (DACOM), Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Paraná, Brazil; Instituto Federal de Educação, Ciência e Tecnologia de Santa Catarina (IFSC), Canoinhas, Santa Catarina, Brazil
| | - Liliane Santana Oliveira
- Programa de Pós-Graduação Associado em Bioinformática (UFPR/UTFPR), Departamento Acadêmico de Computação (DACOM), Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Paraná, Brazil
| | - Pedro Gabriel Nachtigall
- Laboratório de Toxinologia Aplicada, CeTICS, Instituto Butantan, São Paulo, SP, Brazil; Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Rodolpho Valentini Junior
- Programa de Pós-Graduação Associado em Bioinformática (UFPR/UTFPR), Departamento Acadêmico de Computação (DACOM), Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Paraná, Brazil
| | - Nayane de Souza
- Programa de Pós-Graduação Associado em Bioinformática (UFPR/UTFPR), Departamento Acadêmico de Computação (DACOM), Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Paraná, Brazil
| | - Alexandre Rossi Paschoal
- Programa de Pós-Graduação Associado em Bioinformática (UFPR/UTFPR), Departamento Acadêmico de Computação (DACOM), Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Paraná, Brazil
| | - André Yoshiaki Kashiwabara
- Programa de Pós-Graduação Associado em Bioinformática (UFPR/UTFPR), Departamento Acadêmico de Computação (DACOM), Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Paraná, Brazil.
3
Sanita Lima M, Rossi Paschoal A, Silva Domingues D, Smith DR. Pervasive transcription of plant organelle genomes: functional noncoding transcriptomes? Trends Plant Sci 2024:S1360-1385(24)00019-0. [PMID: 38360479 DOI: 10.1016/j.tplants.2024.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/15/2024] [Accepted: 01/23/2024] [Indexed: 02/17/2024]
Abstract
Plant mitochondrial and plastid genomes typically show pervasive, genome-wide transcription. Little is known, however, about the utility of organelle noncoding RNAs, which often make up most of the transcriptome. Here, we suggest that long-read sequencing data combined with dedicated RNA databases could help identify putative functional organelle noncoding transcripts.
Affiliation(s)
- Matheus Sanita Lima
- Department of Biology, Western University, London, Ontario, N6A 5B7, Canada.
| | - Alexandre Rossi Paschoal
- Department of Computer Science, Bioinformatics and Pattern Recognition Group, Federal University of Technology - Paraná - UTFPR, Cornélio Procópio, PR, Brazil
| | - Douglas Silva Domingues
- Department of Genetics, "Luiz de Queiroz" College of Agriculture, University of São Paulo, Piracicaba, SP, Brazil
| | - David Roy Smith
- Department of Biology, Western University, London, Ontario, N6A 5B7, Canada.
4
Orozco-Arias S, Humberto Lopez-Murillo L, Candamil-Cortés MS, Arias M, Jaimes PA, Rossi Paschoal A, Tabares-Soto R, Isaza G, Guyot R. Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes. Brief Bioinform 2022; 24:6887110. [PMID: 36502372 PMCID: PMC9851300 DOI: 10.1093/bib/bbac511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 10/13/2022] [Accepted: 10/26/2022] [Indexed: 12/14/2022] Open
Abstract
LTR-retrotransposons are the most abundant repeat sequences in plant genomes and play an important role in evolution and biodiversity. Their characterization is of great importance to understand their dynamics. However, the identification and classification of these elements remains a challenge today. Moreover, current software can be relatively slow (from hours to days), sometimes involve a lot of manual work and do not reach satisfactory levels in terms of precision and sensitivity. Here we present Inpactor2, an accurate and fast application that creates LTR-retrotransposon reference libraries in a very short time. Inpactor2 takes an assembled genome as input and follows a hybrid approach (deep learning and structure-based) to detect elements, filter partial sequences and finally classify intact sequences into superfamilies and, as very few tools do, into lineages. This tool takes advantage of multi-core and GPU architectures to decrease execution times. Using the rice genome, Inpactor2 showed a run time of 5 minutes (faster than other tools) and has the best accuracy and F1-Score of the tools tested here, also having the second best accuracy and specificity only surpassed by EDTA, but achieving 28% higher sensitivity. For large genomes, Inpactor2 is up to seven times faster than other available bioinformatics tools.
Affiliation(s)
- Simon Orozco-Arias
- Corresponding author. Computer Science Department, Universidad Autónoma de Manizales, Antigua Estación del Ferrocarril, Manizales, Colombia. Tel.: +57(606)8727272 - 8727709 Ext 102
| | | | | | - Maradey Arias
- Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia
| | - Paula A Jaimes
- Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia
| | - Alexandre Rossi Paschoal
- Corresponding author. Department of Computer Science, Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Paraná, 86300-000, Brazil. Tel.: +433133-3790
| | - Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, 170001, Caldas, Colombia
| | - Gustavo Isaza
- Corresponding author. Systems and Informatics Department, Center for Technology Development - Bioprocess and Agro-industry Plant, Universidad de Caldas, St 65 #26-10, Manizales, Colombia. Tel.: +57(606)8781500 ext 13146
| | - Romain Guyot
- Corresponding author. IRD, 911 Av. Agropolis, 34394 Montpellier, France. Tel.: +334674160000
5
da Silva EMG, Rebello KM, Choi YJ, Gregorio V, Paschoal AR, Mitreva M, McKerrow JH, Neves-Ferreira AGDC, Passetti F. Identification of Novel Genes and Proteoforms in Angiostrongylus costaricensis through a Proteogenomic Approach. Pathogens 2022; 11:1273. [PMID: 36365024 PMCID: PMC9694666 DOI: 10.3390/pathogens11111273] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/21/2022] [Revised: 10/15/2022] [Accepted: 10/20/2022] [Indexed: 07/22/2023] Open
Abstract
RNA sequencing (RNA-Seq) and mass-spectrometry-based proteomics data are often integrated in proteogenomic studies to assist in the prediction of eukaryote genome features, such as genes, splicing, single-nucleotide variants (SNVs), and single-amino-acid variants (SAAVs). Most genomes of parasite nematodes are draft versions that lack transcript- and protein-level information and whose gene annotations rely only on computational predictions. Angiostrongylus costaricensis is a roundworm species that causes an intestinal inflammatory disease, known as abdominal angiostrongyliasis (AA). Currently, there is no drug available that acts directly on this parasite, mostly due to the sparse understanding of its molecular characteristics. The available genome of A. costaricensis, specific to the Costa Rica strain, is a draft version that is not supported by transcript- or protein-level evidence. This study used RNA-Seq and MS/MS data to perform an in-depth annotation of the A. costaricensis genome. Our prediction improved the reference annotation with (a) novel coding and non-coding genes; (b) evidence of alternative splicing generating new proteoforms; and (c) a list of SNVs between the Brazilian (Crissiumal) and the Costa Rica strain. To the best of our knowledge, this is the first time that a multi-omics approach has been used to improve the genome annotation of A. costaricensis. We hope this improved genome annotation can assist in the future development of drugs, kits, and vaccines to treat, diagnose, and prevent AA caused by either the Brazil strain (Crissiumal) or the Costa Rica strain.
Affiliation(s)
- Esdras Matheus Gomes da Silva
- Instituto Carlos Chagas, Fiocruz, Curitiba 81350-010, PR, Brazil
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-900, RJ, Brazil
| | - Karina Mastropasqua Rebello
- Laboratory of Toxinology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-900, RJ, Brazil
- Laboratory of Integrated Studies in Protozoology, Oswaldo Cruz Institute, Fiocruz, Rio de Janeiro 21040-360, RJ, Brazil
| | - Young-Jun Choi
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - Vitor Gregorio
- Bioinformatics and Pattern Recognition Group (Bioinfo-CP), Department of Computer Science (DACOM), Federal University of Technology-Parana (UTFPR), Cornélio Procópio 86300-000, PR, Brazil
| | - Alexandre Rossi Paschoal
- Bioinformatics and Pattern Recognition Group (Bioinfo-CP), Department of Computer Science (DACOM), Federal University of Technology-Parana (UTFPR), Cornélio Procópio 86300-000, PR, Brazil
| | - Makedonka Mitreva
- Department of Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
| | - James H. McKerrow
- Center for Discovery and Innovation in Parasitic Diseases, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA 92093, USA
| | | | - Fabio Passetti
- Instituto Carlos Chagas, Fiocruz, Curitiba 81350-010, PR, Brazil
6
Oliveira LS, Patera AC, Domingues DS, Sanches DS, Lopes FM, Bugatti PH, Saito PTM, Maracaja-Coutinho V, Durham AM, Paschoal AR. Computational Analysis of Transposable Elements and CircRNAs in Plants. Methods Mol Biol 2022; 2362:147-172. [PMID: 34195962 DOI: 10.1007/978-1-0716-1645-1_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
Abstract
This chapter provides two main contributions: (1) a description of computational tools and databases used to identify and analyze transposable elements (TEs) and circRNAs in plants; and (2) data analysis on public TE and circRNA data. Our goal is to highlight the primary information available in the literature on circular noncoding RNAs and transposable elements in plants. The exploratory analysis performed on publicly available circRNA and TE data helps us discuss four sequence features. Finally, we investigate the association between circRNAs and TEs in the model plant Arabidopsis thaliana.
Affiliation(s)
- Liliane Santana Oliveira
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil; Embrapa Soja, Londrina, Paraná, Brazil.
| | - Andressa Caroline Patera
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
| | - Douglas Silva Domingues
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil; Group of Genomics and Transcriptomes in Plants, Instituto de Biociências de Rio Claro, Universidade Estadual Paulista (UNESP), Rio Claro, SP, Brazil
| | - Danilo Sipoli Sanches
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
| | - Fabricio Martins Lopes
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
| | - Pedro Henrique Bugatti
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
| | - Priscila Tiemi Maeda Saito
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
| | - Vinicius Maracaja-Coutinho
- Centro de Modelamiento Molecular, Biofísica y Bioinformática-CM2B2, Facultad de Ciencias Quimicas y Farmaceuticas, Universidad de Chile, Santiago, Chile
| | - Alan Mitchell Durham
- Department of Computer Science, Instituto de Matemática e Estatística, Universidade de São Paulo (USP), Cidade Universitária, SP, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil.
7
Abstract
In this era of big data, sets of methodologies and strategies are designed to extract knowledge from huge volumes of data. However, knowing where and how to obtain this information accurately and quickly is extremely important, given the diversity of genomes and the different ways of representing that information. Among the huge set of information and relationships that the genome carries, there are sequences called miRNAs (microRNAs). These sequences were described in the 1990s and are mainly involved in mechanisms of regulation and gene expression. With this in mind, this chapter explores the available literature and provides useful and practical guidance on miRNA databases and tools. To that end, we organize and present this text in two parts: (a) up-to-date reviews and articles that best summarize and discuss the theme; and (b) our updated investigation of the miRNA literature and of portals on databases and tools. Finally, we present the main challenge and a possible solution to improve resources and tools.
Affiliation(s)
- Tharcísio Soares de Amorim
- Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil
| | - Daniel Longhi Fernandes Pedro
- Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science and Bioinformatics and Pattern Recognition Group, Universidade Tecnológica Federal do Paraná (UTFPR), Cornélio Procópio, Brazil.
8
Abstract
Advances in genomic sequencing have recently offered vast opportunities for biological exploration, unraveling the evolution and improving our understanding of Earth's biodiversity. Due to distinct plant species characteristics in terms of genome size, ploidy and heterozygosity, transposable elements (TEs) are common characteristics of many genomes. TEs are ubiquitous and dispersed repetitive DNA sequences that frequently impact the evolution and composition of the genome, mainly due to their redundancy and rearrangements. For this study, we provided an atlas of TE data by employing an easy-to-use portal (APTE website). To our knowledge, this is the most extensive and standardized analysis of TEs in plant genomes. We evaluated 67 plant genomes assembled at chromosome scale, recovering a total of 49,802,023 TE records, representing a total of 47,992,091,043 (~47.62%) base pairs (bp) of the total genomic space. We observed that new types of TEs were identified and annotated compared to other data repositories. By establishing a standardized catalog of TE annotation on 67 genomes, new hypotheses, exploration of TE data and their influences on the genomes may allow a better understanding of their function and processes. All original code and an example of how we developed the TE annotation strategy are available on GitHub (Extended data).
Affiliation(s)
- Daniel Longhi Fernandes Pedro
- Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
| | - Tharcisio Soares Amorim
- Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
| | - Alessandro Varani
- Department of Agricultural and Environmental Biotechnology, School of Agricultural and Veterinary Sciences, São Paulo State University (UNESP), Jaboticabal, São Paulo, 14884-900, Brazil
| | - Romain Guyot
- Institut de Recherche pour le Développement, IRD, University of Montpellier, Montpellier, France
- Department of Electronics and Automatization, Universidad Autónoma de Manizales, Manizales, Colombia
| | - Douglas Silva Domingues
- Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
- Group of Genomics and Transcriptomes in Plants, Institute of Biosciences, São Paulo State University (UNESP), Rio Claro, São Paulo, 13506-900, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
9
da Cruz MHP, Domingues DS, Saito PTM, Paschoal AR, Bugatti PH. TERL: classification of transposable elements by convolutional neural networks. Brief Bioinform 2020; 22:5900933. [PMID: 34020551 DOI: 10.1093/bib/bbaa185] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/19/2020] [Revised: 07/07/2020] [Accepted: 07/20/2020] [Indexed: 11/12/2022] Open
Abstract
Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods classify these sequences into deeper levels, such as the superfamily level, which could provide useful and detailed information about them. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and feeds them to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We have conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracies and F1-score of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for the order sequences from RepBase, respectively. We have also obtained macro mean accuracies and F1-score of 95.0% and 70.6% for sequences from seven databases into superfamily level and 89.3% and 73.9% for the order level, respectively. We surpassed the accuracy, recall and specificity obtained by other methods in the experiment classifying order-level sequences from seven databases, and TERL was by far faster than any other method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TEs classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. TERL is available at https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br.
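The transformation of a one-dimensional sequence into image-like two-dimensional data, as described in the abstract above, can be illustrated with a one-hot encoding. This is only a minimal sketch and not the TERL implementation: the function name `one_hot_encode`, the four-letter `ALPHABET`, and the zero-padding to a fixed `width` are assumptions made for illustration.

```python
from typing import List

# Assumed alphabet for illustration; the abstract does not specify the symbols used.
ALPHABET = "ACGT"


def one_hot_encode(seq: str, width: int) -> List[List[int]]:
    """Encode a DNA sequence as a len(ALPHABET) x width binary matrix.

    Each row corresponds to one symbol of ALPHABET and each column to one
    sequence position. Sequences longer than `width` are truncated, shorter
    ones are zero-padded, and symbols outside ALPHABET (e.g. N) yield an
    all-zero column.
    """
    seq = seq.upper()[:width]
    matrix = [[0] * width for _ in ALPHABET]
    for col, base in enumerate(seq):
        row = ALPHABET.find(base)
        if row >= 0:
            matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    # Print the matrix row by row (rows follow the order of ALPHABET).
    for row in one_hot_encode("ACGTN", 6):
        print(row)
```

A matrix of this shape can be passed to a convolutional network in the same way as a single-channel image; the fixed width is what makes batches of sequences with different lengths stackable.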
Affiliation(s)
- Murilo Horacio Pereira da Cruz
- Federal University of Technology - Parana (UTFPR), Brazil.,Bioinformatics Graduation Program (PPGBIOINFO), Department of Computer Science, Federal University of Technology - Parana (UTFPR), Brazil
| | - Douglas Silva Domingues
- São Paulo State University at Botucatu, Brazil.,University of São Paulo, Brazil.,Department of Biodiversity, São Paulo State University at Rio Claro, Brazil
| | - Priscila Tiemi Maeda Saito
- Euripides Soares da Rocha University of Marilia, Brazil.,University of São Paulo (ICMC-USP), Brazil.,University of Campinas (IC-UNICAMP), Brazil.,Department of Computing, Federal University of Technology - Parana (UTFPR), Brazil
| | | | - Pedro Henrique Bugatti
- Euripides Soares da Rocha University of Marilia, Brazil.,University of São Paulo (ICMC-USP), Brazil.,Department of Computing, Federal University of Technology - Parana (UTFPR), Brazil
| |
|
10
|
Da Fonseca BHR, Domingues DS, Paschoal AR. mirtronDB: a mirtron knowledge base. Bioinformatics 2020; 35:3873-3874. [PMID: 30874795 PMCID: PMC6761972 DOI: 10.1093/bioinformatics/btz153] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/15/2018] [Revised: 02/21/2019] [Accepted: 03/14/2019] [Indexed: 01/12/2023]
Abstract
Motivation: Mirtrons arise from short introns with atypical cleavage by using the splicing mechanism. In the current literature, there is no repository centralizing and organizing the data available to the public. To fill this gap, we developed mirtronDB, the first knowledge database dedicated to mirtrons, available at http://mirtrondb.cp.utfpr.edu.br/. MirtronDB currently contains a total of 1407 mirtron precursors and 2426 mirtron mature sequences in 18 species. Results: Through a user-friendly interface, users can now browse and search mirtrons by organism, organism group, type and name. MirtronDB is a specialized resource that provides free and user-friendly access to knowledge on mirtron data. Availability and implementation: MirtronDB is available at http://mirtrondb.cp.utfpr.edu.br/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Bruno Henrique Ribeiro Da Fonseca
- Bioinformatics Graduation Program (PPGBIOINFO), Department of Computer Science, Federal University of Technology - Paraná, Cornélio Procópio, Paraná, Brazil
| | - Douglas Silva Domingues
- Bioinformatics Graduation Program (PPGBIOINFO), Department of Computer Science, Federal University of Technology - Paraná, Cornélio Procópio, Paraná, Brazil.,Department of Botany, Institute of Biosciences, São Paulo State University, UNESP, Rio Claro, São Paulo, Brazil
| | - Alexandre Rossi Paschoal
- Bioinformatics Graduation Program (PPGBIOINFO), Department of Computer Science, Federal University of Technology - Paraná, Cornélio Procópio, Paraná, Brazil
| |
|
11
|
Paschoal AR, Lozada-Chávez I, Domingues DS, Stadler PF. ceRNAs in plants: computational approaches and associated challenges for target mimic research. Brief Bioinform 2019; 19:1273-1289. [PMID: 28575144 DOI: 10.1093/bib/bbx058] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/07/2017] [Accepted: 04/27/2017] [Indexed: 11/13/2022]
Abstract
The competing endogenous RNA hypothesis has gained increasing attention as a potential global regulatory mechanism of microRNAs (miRNAs), and as a powerful tool to predict the function of many noncoding RNAs, including miRNAs themselves. Most studies have focused on animals, although the discovery of target mimics (TMs) as well as important computational and experimental advances have been made in plants over the past decade. Thus, our contribution summarizes recent progress in computational approaches for research on miRNA:TM interactions. We divided this article into three main contributions. First, a general overview of research on TMs in plants is presented with practical descriptions of the available literature, tools, data, databases and computational reports. Second, we describe a common protocol for the computational and experimental analyses of TMs. Third, we provide a bioinformatics approach for the prediction of TM motifs potentially cross-targeting members within the same or from different miRNA families, based on the identification of consensus miRNA-binding sites from known TMs across sequenced genomes, transcriptomes and known miRNAs. This computational approach is promising because, in contrast to animals, miRNA families in plants are large with identical or similar members, several of which are also highly conserved. Of the three consensus TM motifs found with our approach (MIM166, MIM171 and MIM159/319), the last one has found strong support in the recent experimental work by Reichel and Millar [Specificity of plant microRNA TMs: cross-targeting of mir159 and mir319. J Plant Physiol 2015;180:45-8]. Finally, we discuss the major computational and associated experimental challenges that have to be faced in future ceRNA studies.
Affiliation(s)
| | - Irma Lozada-Chávez
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Germany
| | - Douglas Silva Domingues
- Department of Botany, Institute of Biosciences, São Paulo State University (UNESP) in Rio Claro, Brazil
| | | |
|
12
|
Bellini RG, Coronado MA, Paschoal AR, Gaudencio do Rêgo T, Hungria M, Ribeiro de Vasconcelos AT, Nicolás MF. Structural analysis of a novel N-carbamoyl-d-amino acid amidohydrolase from a Brazilian Bradyrhizobium japonicum strain: In silico insights by molecular modelling, docking and molecular dynamics. J Mol Graph Model 2018; 86:35-42. [PMID: 30336451 DOI: 10.1016/j.jmgm.2018.10.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/17/2018] [Revised: 10/06/2018] [Accepted: 10/08/2018] [Indexed: 10/28/2022]
Abstract
In this work we performed several in silico analyses to describe the relevant structural aspects of the enzyme N-Carbamoyl-d-amino acid amidohydrolase (d-NCAase) encoded in the genome of the Brazilian strain CPAC 15 (=SEMIA 5079) of Bradyrhizobium japonicum, a nonpathogenic species belonging to the order Rhizobiales. d-NCAase has wide applications, particularly in the pharmaceutical industry, since it catalyzes the production of d-amino acids such as D-p-hydroxyphenylglycine (D-HPG), an intermediate in the synthesis of β-lactam antibiotics. We applied a homology modelling approach and 50 ns of molecular dynamics simulations to predict the structure and the intersubunit interactions of this novel d-NCAase. Also, in order to evaluate the substrate binding site, the model was subjected to 50 ns of molecular dynamics simulations in the presence of N-Carbamoyl-d-p-hydroxyphenylglycine (Cp-HPG) (a d-NCAase canonical substrate), and water-protein/water-substrate interaction analyses were performed. Overall, the structural analysis and the molecular dynamics simulations suggest that d-NCAase of B. japonicum CPAC-15 has a homodimeric structure in solution. Here, we also examined the substrate specificity of the catalytic site of our model, and the interactions with water molecules in the active binding site are comprehensively discussed. Also, these simulations showed that the amino acids Lys123, His125, Pro127, Cys172, Asp174 and Arg176 are responsible for recognition of the ligand in the active binding site through several chemical associations, such as hydrogen bonds and hydrophobic interactions. Our results show a favourable environment for a hydrolysis reaction that transforms N-Carbamoyl-d-p-hydroxyphenylglycine (Cp-HPG) into the active compound D-p-hydroxyphenylglycine (D-HPG).
This work envisages the use of d-NCAase from the Brazilian Bradyrhizobium japonicum strain CPAC-15 (=SEMIA 5079) for the industrial production of D-HPG, an important intermediate for semi-synthesis of β-lactam antibiotics such as penicillins, cephalosporins and amoxicillin.
Affiliation(s)
- Reinaldo G Bellini
- Laboratório Nacional de Computação Científica, Petrópolis, Rio de Janeiro, Brazil
| | - Mônika Aparecida Coronado
- Centro Multiusuário de Inovação Biomolecular, Departamento de Física, Universidade Estadual Paulista (UNESP), São José do Rio Preto, 15054-000, SP, Brazil.
| | - Alexandre Rossi Paschoal
- Federal University of Technology - Paraná, Avenida Alberto Carazzai, 1640, 86300-000, Cornélio Procópio, PR, Brazil.
| | - Thaís Gaudencio do Rêgo
- Universidade Federal da Paraíba, Centro de Informática, Rua dos Escoteiros, S/N, João Pessoa, PB, 58055-000, Brazil.
| | | | | | | |
|
13
|
Wolf IR, Paschoal AR, Quiroga C, Domingues DS, de Souza RF, Pretto-Giordano LG, Vilas-Boas LA. Functional annotation and distribution overview of RNA families in 27 Streptococcus agalactiae genomes. BMC Genomics 2018; 19:556. [PMID: 30055586 PMCID: PMC6064168 DOI: 10.1186/s12864-018-4951-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/18/2018] [Accepted: 07/22/2018] [Indexed: 01/08/2023]
Abstract
Background: Streptococcus agalactiae, also known as Group B Streptococcus (GBS), is a Gram-positive bacterium that colonizes the gastrointestinal and genitourinary tract of humans. This bacterium has also been isolated from various animals, such as fish and cattle. Non-coding RNAs (ncRNAs) can act as regulators of gene expression in bacteria, such as Streptococcus pneumoniae and Streptococcus pyogenes. However, little is known about the genomic distribution of ncRNAs and RNA families in S. agalactiae. Results: Comparative genome analysis of 27 S. agalactiae strains showed more than 5 thousand genomic regions identified and classified as Core, Exclusive, and Shared genome sequences. We identified 27 to 89 RNA families per genome distributed over these regions; of these, 25 were in Core regions, while Shared and Exclusive regions showed variations amongst strains. We propose that the amount and type of ncRNA present in each genome can provide a pattern that contributes to the identification of the clonal types. Conclusions: The identification of RNA families provides insight into the function of ncRNAs, sRNAs and ribozymes, which can be further explored as targets for antibiotic development or studied in gene regulation of cellular processes. RNA families could be considered as markers to determine infection capabilities of different strains. Lastly, pan-genome analysis of GBS including the full range of functional transcripts provides a broader approach to the understanding of this pathogen. Electronic supplementary material: The online version of this article (10.1186/s12864-018-4951-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ivan Rodrigo Wolf
- Departamento de Biologia Geral, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, Brazil.
| | - Alexandre Rossi Paschoal
- Universidade Tecnológica Federal do Paraná, Campus Cornélio Procópio, Cornélio Procópio, Paraná, Brazil.
| | - Cecilia Quiroga
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Tecnológicas, Instituto de Investigaciones en Microbiología y Parasitología Médica (IMPAM), Facultad de Medicina, Buenos Aires, Argentina
| | - Douglas Silva Domingues
- Departamento de Botânica, Instituto de Biociências de Rio Claro, Universidade Estadual Paulista Júlio de Mesquita Filho, Rio Claro, São Paulo, Brazil
| | - Rogério Fernandes de Souza
- Departamento de Biologia Geral, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, Brazil
| | | | - Laurival Antonio Vilas-Boas
- Departamento de Biologia Geral, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, Brazil
| |
|
14
|
Fukutani E, Rodrigues M, Kasprzykowski JI, Araujo CFD, Paschoal AR, Ramos PIP, Fukutani KF, Queiroz ATLD. Follow up of a robust meta-signature to identify Zika virus infection in Aedes aegypti: another brick in the wall. Mem Inst Oswaldo Cruz 2018; 113:e180053. [PMID: 29846381 PMCID: PMC5965457 DOI: 10.1590/0074-02760180053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/29/2018] [Accepted: 04/17/2018] [Indexed: 11/22/2022]
Abstract
The mosquito Aedes aegypti is the main vector of several arthropod-borne diseases that have global impacts. In a previous meta-analysis, our group identified a vector gene set containing 110 genes strongly associated with infections of dengue, West Nile and yellow fever viruses. Of these 110 genes, four genes allowed a highly accurate classification of infected status. More recently, a new study of Ae. aegypti infected with Zika virus (ZIKV) was published, providing new data to investigate whether this “infection” gene set is also altered during a ZIKV infection. Our hypothesis is that the infection-associated signature may also serve as a proxy to classify the ZIKV infection in the vector. Raw data associated with the NCBI/BioProject were downloaded and re-analysed. A total of 18 paired-end replicates corresponding to three ZIKV-infected samples and three controls were included in this study. The nMDS technique with a logistic regression was used to obtain the probabilities of belonging to a given class. Thus, to compare both gene sets, we used the area under the curve and performed a comparison using the bootstrap method. Our meta-signature was able to separate the infected mosquitoes from the controls with good predictive power to classify the Zika-infected mosquitoes.
Affiliation(s)
- Eduardo Fukutani
- Instituto Gonçalo Moniz, Fundação Oswaldo Cruz-Fiocruz, Salvador, BA, Brasil
| | - Moreno Rodrigues
- Instituto Gonçalo Moniz, Fundação Oswaldo Cruz-Fiocruz, Salvador, BA, Brasil
| | | | | | | | | | | | | |
|
15
|
Negri TDC, Alves WAL, Bugatti PH, Saito PTM, Domingues DS, Paschoal AR. Pattern recognition analysis on long noncoding RNAs: a tool for prediction in plants. Brief Bioinform 2018; 20:682-689. [DOI: 10.1093/bib/bby034] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 12/27/2017] [Revised: 03/30/2018] [Indexed: 01/04/2023]
Affiliation(s)
- Tatianne da Costa Negri
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, Brazil and Informatics and Knowledge Management Graduate Program, Universidade Nove de Julho, São Paulo, Brazil
| | | | - Pedro Henrique Bugatti
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, Brazil
| | - Priscila Tiemi Maeda Saito
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, Brazil
| | - Douglas Silva Domingues
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, Brazil and Department of Botany, Institute of Biosciences, São Paulo State University, UNESP, Rio Claro, SP, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science, Bioinformatics Graduate Program (PPGBIOINFO), Federal University of Technology - Paraná, UTFPR, Campus Cornélio Procópio, Brazil
| |
|
16
|
Fukutani KF, Kasprzykowski JI, Paschoal AR, Gomes MDS, Barral A, de Oliveira CI, Ramos PIP, de Queiroz ATL. Meta-Analysis of Aedes aegypti Expression Datasets: Comparing Virus Infection and Blood-Fed Transcriptomes to Identify Markers of Virus Presence. Front Bioeng Biotechnol 2018; 5:84. [PMID: 29376049 PMCID: PMC5768613 DOI: 10.3389/fbioe.2017.00084] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/23/2017] [Accepted: 12/15/2017] [Indexed: 02/05/2023]
Abstract
The mosquito Aedes aegypti (L.) is a vector of several arboviruses including dengue, yellow fever, chikungunya, and more recently zika. Previous transcriptomic studies have been performed to elucidate altered pathways in response to viral infection. However, the intrinsic coupling between alimentation and infection was unappreciated in these studies. Feeding is required for the initial mosquito contact with the virus and these events are highly dependent. Addressing this relationship, we reinterrogated datasets of virus-infected mosquitoes with two different diet schemes (fed and unfed mosquitoes), evaluating the metabolic cross-talk during both processes. We constructed coexpression networks with the differentially expressed genes of these comparisons: virus-infected versus blood-fed mosquitoes and virus-infected versus unfed mosquitoes. Our analysis identified one module with 110 genes that correlated with infection status (representing ~0.7% of the A. aegypti genome). Furthermore, we performed a machine-learning approach and summarized the infection status using only four genes (AAEL012128, AAEL014210, AAEL002477, and AAEL005350). While three of the four genes were annotated as hypothetical proteins, the AAEL012128 gene is a membrane amino acid transporter correlated with viral envelope binding. This gene alone is able to discriminate all infected samples and thus should have a key role in discriminating viral infection in the A. aegypti mosquito. Moreover, validation using external datasets found this gene to be differentially expressed in four transcriptomic experiments. Therefore, these genes may serve as a proxy of viral infection in the mosquito, and the other 106 identified genes provide a framework for future studies.
Affiliation(s)
| | - José Irahe Kasprzykowski
- Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Brazil.,Post-Graduation Program in Biotechnology in Health and Investigative Medicine, Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Brazil
| | - Alexandre Rossi Paschoal
- Federal University of Technology-Paraná, UTFPR, Campus Cornélio Procópio, Cornélio Procópio, Brazil
| | | | - Aldina Barral
- Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Brazil.,Post-Graduation Program in Health Sciences, School of Medicine, Federal University of Bahia, Salvador, Brazil
| | - Camila I de Oliveira
- Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Brazil.,Post-Graduation Program in Health Sciences, School of Medicine, Federal University of Bahia, Salvador, Brazil
| | | | - Artur Trancoso Lopo de Queiroz
- Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Brazil.,Post-Graduation Program in Biotechnology in Health and Investigative Medicine, Fundação Oswaldo Cruz (FIOCRUZ), Salvador, Brazil.,Post-Graduation Program in Applied Computation, Universidade Estadual de Feira de Santana, Feira de Santana, Brazil
| |
|
17
|
Pedro DLF, Lorenzetti APR, Domingues DS, Paschoal AR. PlaNC-TE: a comprehensive knowledgebase of non-coding RNAs and transposable elements in plants. Database (Oxford) 2018; 2018:1-7. [PMID: 30101318 PMCID: PMC6146122 DOI: 10.1093/database/bay078] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/19/2018] [Accepted: 06/28/2018] [Indexed: 12/20/2022]
Abstract
Transposable elements (TEs) play an essential role in the genetic variability of eukaryotic species. In plants, they may comprise up to 90% of the total genome. Non-coding RNAs (ncRNAs) are known to control gene expression and regulation. Although the relationship between ncRNAs and TEs is known, obtaining the organized data for sequenced genomes is not straightforward. In this study, we describe the PlaNC-TE (http://planc-te.cp.utfpr.edu.br), a user-friendly portal harboring a knowledgebase created by integrating and analysing plant ncRNA-TE data. We identified a total of 14 350 overlaps between ncRNAs and TEs in 40 plant genomes. The database allows users to browse, search and download all ncRNA and TE data analysed. Overall, PlaNC-TE not only organizes data and provides insights about the relationship between ncRNA and TEs in plants but also helps improve genome annotation strategies. Moreover, this is the first database to provide resources to broadly investigate functions and mechanisms involving TEs and ncRNAs in plants.
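The notion of an overlap between an ncRNA and a TE annotation can be made concrete with a small interval test. This is a hedged sketch only: the abstract does not describe how PlaNC-TE computes overlaps, and the `Interval` tuple layout, the 1-based inclusive coordinates, and the function name `find_overlaps` are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed annotation layout: (sequence name, start, end), 1-based inclusive.
Interval = Tuple[str, int, int]


def find_overlaps(
    ncrnas: List[Interval], tes: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Return every (ncRNA, TE) pair that shares at least one base.

    Two intervals on the same sequence overlap when each one starts
    no later than the other ends. This is a simple quadratic scan,
    adequate for small examples but not for whole genomes.
    """
    hits = []
    for n_name, n_start, n_end in ncrnas:
        for t_name, t_start, t_end in tes:
            if n_name == t_name and n_start <= t_end and t_start <= n_end:
                hits.append(((n_name, n_start, n_end), (t_name, t_start, t_end)))
    return hits


if __name__ == "__main__":
    ncrnas = [("chr1", 100, 200), ("chr2", 50, 80)]
    tes = [("chr1", 150, 400), ("chr1", 201, 300), ("chr2", 10, 49)]
    for pair in find_overlaps(ncrnas, tes):
        print(pair)
```

For genome-scale annotation files, sorting both lists and sweeping through them once, or using an interval tree, would avoid the pairwise comparison.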
Affiliation(s)
- Daniel Longhi Fernandes Pedro
- Department of Computer Science, Bioinformatics Graduation Program (PPGBIOINFO), Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
| | | | - Douglas Silva Domingues
- Department of Computer Science, Bioinformatics Graduation Program (PPGBIOINFO), Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil.,Department of Botany, Institute of Biosciences, São Paulo State University, UNESP, Rio Claro, SP, Brazil
| | - Alexandre Rossi Paschoal
- Department of Computer Science, Bioinformatics Graduation Program (PPGBIOINFO), Federal University of Technology - Paraná, Cornélio Procópio, PR, Brazil
| |
|
18
|
de Souza MF, Kuasne H, Barros-Filho MDC, Cilião HL, Marchi FA, Fuganti PE, Paschoal AR, Rogatto SR, Cólus IMDS. Circulating mRNAs and miRNAs as candidate markers for the diagnosis and prognosis of prostate cancer. PLoS One 2017; 12:e0184094. [PMID: 28910345 PMCID: PMC5598937 DOI: 10.1371/journal.pone.0184094] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 05/29/2017] [Accepted: 08/17/2017] [Indexed: 02/07/2023]
Abstract
Circulating nucleic acids are found in free form in body fluids and may serve as minimally invasive tools for cancer diagnosis and prognosis. Only a few studies have investigated the potential application of circulating mRNAs and microRNAs (miRNAs) in prostate cancer (PCa). The Cancer Genome Atlas (TCGA) database was used for an in silico analysis to identify circulating mRNA and miRNA as potential markers of PCa. A total of 2,267 genes and 49 miRNAs were differentially expressed between normal and tumor samples. The prediction analyses of target genes and integrative analysis of mRNA and miRNA expression revealed eleven genes and eight miRNAs which were validated by RT-qPCR in plasma samples from 102 untreated PCa patients and 50 cancer-free individuals. Two genes, OR51E2 and SIM2, and two miRNAs, miR-200c and miR-200b, showed significant association with PCa. Expression levels of these transcripts distinguished PCa patients from controls (67% sensitivity and 75% specificity). PCa patients and controls with prostate-specific antigen (PSA) ≤ 4.0 ng/mL were discriminated based on OR51E2 and SIM2 expression levels. The miR-200c expression showed association with Gleason score and miR-200b, with bone metastasis, bilateral tumor, and PSA > 10.0 ng/mL. The combination of circulating mRNA and miRNA was useful for the diagnosis and prognosis of PCa.
Affiliation(s)
| | - Hellen Kuasne
- CIPE, AC Camargo Cancer Center, São Paulo, São Paulo, Brazil
| | | | | | | | | | - Alexandre Rossi Paschoal
- Department of Computing, Federal University of Technology—Paraná, UTFPR, Cornélio Procópio, Paraná, Brazil
| | - Silvia Regina Rogatto
- CIPE, AC Camargo Cancer Center, São Paulo, São Paulo, Brazil
- Department of Clinical Genetics, Vejle Hospital and Institute of Regional Health Research, University of Southern Denmark, Vejle, Denmark
| | | |
|
19
|
Severino P, Oliveira LS, Andreghetto FM, Torres N, Curioni O, Cury PM, Toporcov TN, Paschoal AR, Durham AM. Abstract 3983: MicroRNAs and other small RNA molecules expressed in metastatic and non-metastatic oral squamous cell carcinoma. Cancer Res 2015. [DOI: 10.1158/1538-7445.am2015-3983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Small noncoding RNAs are regulatory molecules that play important roles in several aspects of cellular biology. They vary from 18 to 30 nucleotides in length and often act through the inactivation of complementary sequences. A variety of small RNA classes have been identified to date and the list is constantly growing, owing to the advent of new sequencing technologies. Among these molecules, microRNAs are the most extensively studied. They are known to play important roles in human diseases, including cancer. More recently, genome-wide analyses have shown that changes in the expression levels of other small noncoding RNAs are also associated with cancer, with correlations between RNA abundance and disease status. Oral squamous cell carcinoma, associated with chronic tobacco and alcohol consumption, is among the leading cancers in the world. The presence of cervical lymph node metastases is currently its strongest prognostic factor. The possibility of evaluating its metastatic potential is relevant to the clinical and molecular oncologist due to the frequent asymptomatic development of such cancer in its early stages. In this study small RNA libraries from 30 oral squamous cell carcinoma samples were sequenced for the identification and quantification of known small RNAs. Samples were divided into two groups: those presenting lymph node metastasis at the time of diagnosis and those that did not present this characteristic. Additionally, plasma of 30 patients was assessed for the presence of microRNAs identified as differentially expressed between metastatic and non-metastatic tumor samples as a means to identify circulating molecules that could be considered potential biomarkers for lymph node metastasis. Global expression patterns of small RNA molecules were not associated with cervical metastases.
MiR-21, miR-203 and miR-205 were highly expressed throughout samples, in agreement with their role in epithelial cell biology, but disagreeing with studies correlating these molecules with cancer invasion. Nineteen microRNAs, but no other small RNA class, varied consistently between metastatic and non-metastatic cancer samples, some of which were also detected at corresponding levels in plasma. MiR-31 and miR-130b, known to inhibit several steps in the metastatic process, were over-expressed in non-metastatic samples and the expression of miR-130b was confirmed in plasma of patients presenting non-metastatic oral squamous cell carcinoma. MiR-181 and miR-296 were associated with metastasis but were not detected in plasma. In conclusion, we identified microRNAs linked to the metastatic status in a group of patients diagnosed with oral squamous cell carcinoma, both in tissue and plasma samples. We also demonstrate that other small RNA classes are expressed and should, therefore, be involved and contribute to the metastatic phenotype of this disease.
Citation Format: Patricia Severino, Liliane Santana Oliveira, Flavia Maziero Andreghetto, Natalia Torres, Otavio Curioni, Patricia Maluf Cury, Tatiana Natasha Toporcov, Alexandre Rossi Paschoal, Alan Mitchell Durham. MicroRNAs and other small RNA molecules expressed in metastatic and non-metastatic oral squamous cell carcinoma. [abstract]. In: Proceedings of the 106th Annual Meeting of the American Association for Cancer Research; 2015 Apr 18-22; Philadelphia, PA. Philadelphia (PA): AACR; Cancer Res 2015;75(15 Suppl):Abstract nr 3983. doi:10.1158/1538-7445.AM2015-3983
Affiliation(s)
- Patricia Severino
- Albert Einstein Research & Education Institute, Hospital Israelita Albert Einstein, São Paulo, Brazil
| | - Liliane Santana Oliveira
- Albert Einstein Research & Education Institute, Hospital Israelita Albert Einstein, São Paulo, Brazil
| | | | - Natalia Torres
- Albert Einstein Research & Education Institute, Hospital Israelita Albert Einstein, São Paulo, Brazil
| | | | | | | | | | - Alan Mitchell Durham
- Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
| |
|
20
|
Severino P, Oliveira LS, Andreghetto FM, Torres N, Curioni O, Cury PM, Toporcov TN, Paschoal AR, Durham AM. Small RNAs in metastatic and non-metastatic oral squamous cell carcinoma. BMC Med Genomics 2015; 8:31. [PMID: 26104160 PMCID: PMC4479233 DOI: 10.1186/s12920-015-0102-4] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 10/14/2014] [Accepted: 05/29/2015] [Indexed: 12/26/2022]
Abstract
BACKGROUND Small non-coding regulatory RNAs control cellular functions at the transcriptional and post-transcriptional levels. Oral squamous cell carcinoma is among the leading cancers in the world and the presence of cervical lymph node metastases is currently its strongest prognostic factor. In this work we aimed at finding small RNAs expressed in oral squamous cell carcinoma that could be associated with the presence of lymph node metastasis.
METHODS Small RNA libraries from metastatic and non-metastatic oral squamous cell carcinomas were sequenced for the identification and quantification of known small RNAs. Selected markers were validated in plasma samples. Additionally, we used in silico analysis to investigate possible new molecules, not previously described, involved in the metastatic process.
RESULTS Global expression patterns were not associated with cervical metastases. MiR-21, miR-203 and miR-205 were highly expressed throughout samples, in agreement with their role in epithelial cell biology, but disagreeing with studies correlating these molecules with cancer invasion. Eighteen microRNAs, but no other small RNA class, varied consistently between metastatic and non-metastatic samples. Nine of these microRNAs had been previously detected in human plasma, eight of which presented consistent results between tissue and plasma samples. MiR-31 and miR-130b, known to inhibit several steps in the metastatic process, were over-expressed in non-metastatic samples and the expression of miR-130b was confirmed in plasma of patients showing no metastasis. MiR-181 and miR-296 were detected in metastatic tumors and the expression of miR-296 was confirmed in plasma of patients presenting metastasis. A novel microRNA-like molecule was also associated with non-metastatic samples, potentially targeting cell-signaling mechanisms.
CONCLUSIONS We corroborate literature data on the role of small RNAs in cancer metastasis and suggest the detection of microRNAs as a tool that may assist in the evaluation of oral squamous cell carcinoma metastatic potential.
Affiliation(s)
- Patricia Severino
- Albert Einstein Research and Education Institute, Hospital Israelita Albert Einstein, Sao Paulo, SP, Brazil.
- Liliane Santana Oliveira
- Albert Einstein Research and Education Institute, Hospital Israelita Albert Einstein, Sao Paulo, SP, Brazil.
- Flávia Maziero Andreghetto
- Albert Einstein Research and Education Institute, Hospital Israelita Albert Einstein, Sao Paulo, SP, Brazil.
- Natalia Torres
- Albert Einstein Research and Education Institute, Hospital Israelita Albert Einstein, Sao Paulo, SP, Brazil.
- Otávio Curioni
- Hospital Heliopolis, Departamento de Cirurgia e Otorrinolaringologia, Sao Paulo, SP, Brazil.
- Tatiana Natasha Toporcov
- Departamento de Epidemiologia, Faculdade de Saúde Pública, University of Sao Paulo, Sao Paulo, SP, Brazil.
- Alan Mitchell Durham
- Instituto de Matemática e Estatística, University of Sao Paulo, Sao Paulo, SP, Brazil.
|
21
|
Paschoal AR, Fernandes EDM, Silva JC, Lopes FM, Pereira LFP, Domingues DS. CoffeebEST: an integrated resource for Coffea spp expressed sequence tags. Genet Mol Res 2014; 13:10913-20. [PMID: 25526212 DOI: 10.4238/2014.december.19.13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Coffee is one of the most important commodities in the world, and its production relies mainly on two species, Coffea arabica and Coffea canephora. Although there are diverse transcriptome datasets available for coffee trees, few research groups have exploited the potential knowledge contained in these data, especially with respect to fruit and seed development. Here, we present a comparative analysis of the transcriptomes of Coffea arabica and Coffea canephora with a focus on fruit development using publicly available expressed sequence tags (ESTs). Most of the fruit and seed EST data has been obtained from C. canephora. Therefore, we performed a fruit EST analysis of the 5 developmental stages of this species (18, 22, 30, 42, and 46 weeks after flowering) comprising 29,009 sequences. We compared C. canephora fruit ESTs to reference unigenes of C. canephora (7710 contigs and 8955 singletons) and C. arabica (15,656 contigs and 16,351 singletons). Additional analyses included functional annotation based on Gene Ontology, as well as an annotation using PlantCyc, a curated plant protein database. The Coffee Bean EST (CoffeebEST) is a public database available at http://bioinfo-02.cp.utfpr.edu.br/. This database represents an additional resource for the coffee scientific community, offering a user-friendly collection of information for non-specialists in coffee molecular biology to support experimental research on comparative and functional genomics.
Affiliation(s)
- A R Paschoal
- Universidade Tecnológica Federal do Paraná, Cornélio Procópio, PR, Brasil
- E D M Fernandes
- Universidade Tecnológica Federal do Paraná, Cornélio Procópio, PR, Brasil
- J C Silva
- Universidade Tecnológica Federal do Paraná, Cornélio Procópio, PR, Brasil
- F M Lopes
- Universidade Tecnológica Federal do Paraná, Cornélio Procópio, PR, Brasil
- L F P Pereira
- Laboratório de Biotecnologia Vegetal, Instituto Agronômico do Paraná, Londrina, PR, Brasil
- D S Domingues
- Laboratório de Biotecnologia Vegetal, Instituto Agronômico do Paraná, Londrina, PR, Brasil
|
22
|
Paschoal AR, Maracaja-Coutinho V, Setubal JC, Simões ZLP, Verjovski-Almeida S, Durham AM. Non-coding transcription characterization and annotation. RNA Biol 2014; 9:274-82. [DOI: 10.4161/rna.19352] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 12/31/2022]
|
23
|
Abstract
MicroRNAs (miRNAs) are small noncoding RNA molecules that are involved in many biological processes, especially in plants; among these processes is nodulation in legumes. Biological nitrogen fixation is a key process, with critical importance to the soybean crop. This study aimed to identify novel miRNAs with the potential to act during the root nodulation process. We used a set of transcripts that were differentially expressed in soybean roots 10 days after inoculation with Bradyrhizobium japonicum, obtained in a previous study, and performed a set of computational analyses that led us to select new miRNAs potentially involved in nodulation. In these analyses, the set of transcripts was submitted to an in silico annotation of noncoding RNAs, including a similarity search against public miRNA databases, ab initio tools for miRNA identification, a structural search against miRNA families, prediction of the secondary structure of miRNA precursors, and prediction of the sequences of mature miRNAs. Subsequently, we applied filtering procedures based on miRNA selection criteria described in the literature (e.g., free energy value). In the next step, the annotation was manually curated, and the top candidates were selected and used for prediction of potential target genes, which were later checked manually in the soybean genome database. This prediction led us to the identification of 9 potential new miRNAs; among these, 4 were conserved in other plants. Moreover, their predicted target genes might play important roles in the regulation of nodulation.
Affiliation(s)
- G A Barros-Carvalho
- Centro Nacional de Pesquisa de Soja, Empresa Brasileira de Pesquisa Agropecuária, Londrina, PR, Brasil
- A R Paschoal
- Universidade Tecnológica Federal do Paraná, Cornélio Procópio, PR, Brasil
- F C Marcelino-Guimarães
- Centro Nacional de Pesquisa de Soja, Empresa Brasileira de Pesquisa Agropecuária, Londrina, PR, Brasil
- M Hungria
- Centro Nacional de Pesquisa de Soja, Empresa Brasileira de Pesquisa Agropecuária, Londrina, PR, Brasil
|
24
|
Severino P, Oliveira LS, Torres N, Andreghetto FM, Klingbeil MDFG, Moyses R, Wünsch-Filho V, Nunes FD, Mathor MB, Paschoal AR, Durham AM. High-throughput sequencing of small RNA transcriptomes reveals critical biological features targeted by microRNAs in cell models used for squamous cell cancer research. BMC Genomics 2013; 14:735. [PMID: 24160351 PMCID: PMC3870990 DOI: 10.1186/1471-2164-14-735] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/29/2012] [Accepted: 10/17/2013] [Indexed: 11/11/2022]
Abstract
Background The implication of post-transcriptional regulation by microRNAs in molecular mechanisms underlying cancer disease is well documented. However, their interference at the cellular level is not fully explored. Functional in vitro studies are fundamental for the comprehension of their role; nevertheless, results are highly dependent on the adopted cellular model. Next generation small RNA transcriptomic sequencing data of a tumor cell line and keratinocytes derived from primary culture were generated in order to characterize the microRNA content of these systems, thus helping in their understanding. Both constitute cell models for functional studies of microRNAs in head and neck squamous cell carcinoma (HNSCC), a smoking-related cancer. Known microRNAs were quantified and analyzed in the context of gene regulation. New microRNAs were investigated using similarity and structural search, ab initio classification, and prediction of the location of mature microRNAs within would-be precursor sequences. Results were compared with small RNA transcriptomic sequences from HNSCC samples in order to assess the applicability of these cell models for cancer phenotype comprehension and for novel molecule discovery.
Results Ten miRNAs represented over 70% of the mature molecules present in each of the cell types. The most expressed molecules were miR-21, miR-24 and miR-205. Accordingly, miR-21 and miR-205 have been previously shown to play a role in epithelial cell biology. Although miR-21 has been implicated in cancer development, and evaluated as a biomarker in HNSCC progression, no significant expression differences were seen between cell types. We demonstrate that differentially expressed mature miRNAs target cell differentiation and apoptosis related biological processes, indicating that they might represent, with acceptable accuracy, the genetic context from which they derive. Most miRNAs identified in the cancer cell line and in keratinocytes were present in tumor samples and cancer-free samples, respectively, with miR-21, miR-24 and miR-205 still among the most prevalent molecules in all instances. Thirteen miRNA-like structures, containing reads identified by the deep sequencing, were predicted from putative miRNA precursor sequences. Strong evidence suggests that one of them could be a new miRNA. This molecule was mostly expressed in the tumor cell line and HNSCC samples, indicating a possible biological function in cancer.
Conclusions Critical biological features of cells must be fully understood before they can be chosen as models for functional studies. Expression levels of miRNAs relate to cell type and tissue context. This study provides insights into the miRNA content of two cell models used for cancer research. Pathways commonly deregulated in HNSCC might be targeted by the most expressed and also by differentially expressed miRNAs. Results indicate that the use of cell models for cancer research demands careful assessment of underlying molecular characteristics for proper data interpretation. Additionally, one new miRNA-like molecule with a potential role in cancer was identified in the cell lines and clinical samples.
Affiliation(s)
- Patricia Severino
- Albert Einstein Research and Education Institute, Hospital Israelita Albert Einstein, Sao Paulo, SP, Brazil.
|