1
Volatile Evolution of Long Non-Coding RNA Repertoire in Retinal Pigment Epithelium: Insights from Comparison of Bovine and Human RNA Expression Profiles. Genes (Basel) 2019; 10:genes10030205. [PMID: 30857256 PMCID: PMC6471466 DOI: 10.3390/genes10030205] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/31/2019] [Revised: 02/22/2019] [Accepted: 02/26/2019] [Indexed: 11/17/2022]
Abstract
Several long non-coding RNAs (lncRNAs), including TUG1, MALAT1 and MEG3, have been found to regulate normal visual function and may contribute to retinal dysfunction. We extended these analyses of lncRNA genes to the retinal pigment epithelium (RPE) to determine whether RPE-expressed lncRNAs are conserved between the human and bovine genomes. We reconstructed bovine RPE lncRNAs by genome-guided assembly and then predicted homologous human transcripts from a whole-genome alignment. We found a small set of conserved lncRNAs that could be involved in signature RPE functions conserved across mammals. However, conserved lncRNAs made up a very small fraction (less than 5%) of the overall pool of lncRNAs found in the RPE, perhaps reflecting fast and flexible adaptation of the mammalian eye to various environmental conditions.
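The abstract reports a "fraction conserved" figure but not how a homolog was called. The sketch below shows one plausible version of that bookkeeping under stated assumptions. It assumes each bovine transcript has already been projected onto human coordinates through the whole-genome alignment. It also assumes a transcript counts as conserved when its projected locus overlaps any human lncRNA. The overlap rule, the `Interval` type and all transcript IDs are hypothetical illustrations, not the authors' pipeline.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def conserved_fraction(
    projected: dict[str, Optional[Interval]], human_lncrnas: list[Interval]
) -> float:
    """Fraction of bovine lncRNAs whose projected human locus overlaps a human lncRNA.

    `projected` maps a bovine transcript ID to its locus in human coordinates,
    or to None if the alignment could not project it (counted as non-conserved).
    The overlap criterion is an illustrative assumption.
    """
    if not projected:
        return 0.0
    conserved = sum(
        1
        for locus in projected.values()
        if locus is not None and any(overlaps(locus, h) for h in human_lncrnas)
    )
    return conserved / len(projected)


# Hypothetical toy data, not values from the study.
projected = {
    "bovine_lnc1": Interval("chr1", 100, 200),
    "bovine_lnc2": None,
    "bovine_lnc3": Interval("chr2", 0, 50),
    "bovine_lnc4": Interval("chr1", 500, 600),
}
human = [Interval("chr1", 150, 300)]
print(conserved_fraction(projected, human))  # 0.25
```

A real analysis would likely also require strand agreement or a minimum overlap length. Simple any-base overlap keeps the sketch short.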
2
Ponomarenko J, Orlova G, Merkulova T, Vasiliev G, Ponomarenko M. Mining genome variation to associate genetic disease with mutation alterations and ortho/paralogous polymorphisms in transcription factor binding site. Int J Artif Intell T 2011. [DOI: 10.1142/s0218213005002284] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
We have developed rSNP_Guide, a system that predicts transcription factor (TF) binding sites on DNA whose mutation-caused alterations may explain disease penetrance. rSNP_Guide starts from experimentally detected, disease-associated alterations in the binding of an unknown TF to mutant DNA. From the DNA sequences, it calculates the expected alterations at known TF sites and selects only those known sites whose calculated alterations best agree with the detected ones. The system was control-tested on SNPs with known site-disease relationships. In practical applications, two disease-associated TF sites were predicted and confirmed by immunoassay with anti-TF antibodies. For tumor susceptibility, rSNP_Guide correctly predicted a GATA site in the second intron of the mouse K-ras gene; mutational damage to this site causes tumor resistance. For alcohol dependence and other behavioral disorders, it successfully predicted a spurious, mutation-created YY1 site in the sixth intron of the human tryptophan 2,3-dioxygenase (TDO2) gene. Finally, sixteen undocumented TF sites located in both orthologous and paralogous genes were characterized for the first time with one of three ratings ("present", "weakened" or "absent"). Their significance was estimated by rSNP_Guide relative to six TF sites with known mutation-caused alterations in DNA/TF binding.
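The abstract does not say how rSNP_Guide scores a site. A common way to estimate how a variant alters a known TF site is position weight matrix (PWM) scoring: compare the best log-odds site score in the reference and mutant sequences. The sketch below illustrates that generic technique only. The toy GATA-like counts, the pseudocount, the uniform background and the function names are assumptions, and this is not presented as rSNP_Guide's method.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn a position frequency matrix (a list of {base: count} dicts) into log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2((column.get(b, 0) + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def best_site_score(pwm, seq):
    """Highest log-odds score over every window of `seq` (forward strand; A/C/G/T only)."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than the motif")
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def variant_effect(pwm, ref_seq, alt_seq):
    """Score change caused by a variant: negative = site weakened, positive = site strengthened."""
    return best_site_score(pwm, alt_seq) - best_site_score(pwm, ref_seq)


# Toy "GATA"-like counts (hypothetical, not a curated matrix).
gata_counts = [{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}]
pwm = log_odds_pwm(gata_counts)
print(variant_effect(pwm, "CCGATACC", "CCGCTACC") < 0)  # True: the A->C change weakens the site
```

A full tool would also scan the reverse strand. It would also compare a score change against some significance threshold before labelling a site "weakened" or "absent".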
Affiliation(s)
- Julia Ponomarenko
- Laboratory of Genome Structure, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Galina Orlova
- Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Tatyana Merkulova
- Laboratory of Gene Expression Regulation, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Gennady Vasiliev
- Laboratory of Gene Expression Regulation, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
- Mikhail Ponomarenko
- Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, 10 Lavrentyev Ave, Novosibirsk, 630090, Russia
3
Managadze D, Rogozin IB, Chernikova D, Shabalina SA, Koonin EV. Negative correlation between expression level and evolutionary rate of long intergenic noncoding RNAs. Genome Biol Evol 2011; 3:1390-404. [PMID: 22071789 PMCID: PMC3242500 DOI: 10.1093/gbe/evr116] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Indexed: 12/11/2022]
Abstract
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of lncRNAs remain largely unknown, but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insight into the mode of evolution and the functional range of lncRNAs, they can be compared with the much better characterized protein-coding genes. The evolutionary rate of protein-coding genes shows a universal negative correlation with expression: highly expressed genes are, on average, more conserved during evolution than genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis, according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing); however, the contributions of evolutionary rate and folding to expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution. It might be caused by similar deleterious effects of protein and RNA misfolding and/or by other factors, for example, the number of interacting partners of the gene product.
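This abstract does not specify which correlation statistic was used. A rank correlation is a natural choice for skewed quantities such as expression levels, so the sketch below computes Spearman's coefficient as the Pearson correlation of tie-averaged ranks. The per-gene values are hypothetical, and the choice of Spearman here is an assumption rather than a statement about the paper's exact method.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-gene values: expression level and an evolutionary-rate proxy.
expression = [10, 50, 200, 800]
evolutionary_rate = [0.9, 0.7, 0.75, 0.2]
print(round(spearman(expression, evolutionary_rate), 3))  # -0.8
```

A negative coefficient, as here, corresponds to the pattern described above: genes with higher expression tend to evolve more slowly.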
Affiliation(s)
- David Managadze
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
4
Wu J. Testing the coding potential of conserved short genomic sequences. Adv Bioinformatics 2010; 2010:287070. [PMID: 20224812 PMCID: PMC2834954 DOI: 10.1155/2010/287070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2009] [Accepted: 01/02/2010] [Indexed: 11/25/2022]
Abstract
A procedure is proposed to test whether a genomic sequence contains coding DNA (a so-called coding potential region). The procedure tests the coding potential of conserved short genomic sequences while relaxing the assumptions of probabilistic gene-structure models. It is therefore expected to add candidate coding regions to current genome databases. The procedure was applied to the set of highly conserved human-mouse sequences in the genome database at the University of California at Santa Cruz. Among sequences containing RefSeq coding exons, the procedure detected coding potential in 91.3% of regions, covering 83% of human RefSeq coding exons, at a 2.6% false positive rate. It also detected 12,688 novel short regions with coding potential at a false discovery rate below 0.05; 65.7% of these novel regions lie between annotated genes.
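The abstract reports novel regions "at the false discovery rate <0.05" without naming the correction used. The Benjamini-Hochberg step-up procedure is one standard way to control FDR over many tested regions, and the sketch below shows it. The p-values are made up, and the assumption that this particular procedure was used is ours.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of hypotheses rejected at FDR level `alpha` (Benjamini-Hochberg step-up).

    Sort p-values ascending, find the largest rank k (1-based) with
    p_(k) <= k * alpha / m, and reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank  # keep the largest passing rank (step-up, not step-down)
    return sorted(order[:cutoff])


# Hypothetical p-values for candidate regions.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))  # [0]
print(benjamini_hochberg([0.03, 0.04]))              # [0, 1]
```

The second call shows the step-up behaviour. The value 0.03 fails its own threshold (0.025), but it is still rejected because a larger p-value (0.04) passes at a higher rank.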
Affiliation(s)
- Jing Wu
- Department of Statistics, Carnegie Mellon University, PA 15213, USA
5
Rè M, Pesole G, Horner DS. Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics. BMC Bioinformatics 2009; 10:282. [PMID: 19737408 PMCID: PMC2758873 DOI: 10.1186/1471-2105-10-282] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 12/04/2008] [Accepted: 09/08/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. Recent findings suggest that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, so discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it is desirable to discriminate between coding and non-coding conserved sequences without recourse to sequence similarity searches of protein databases. Such searches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by database sequences that are erroneously annotated as coding. RESULTS Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine, which classifies the alignment as coding or non-coding with an associated probability score. CONCLUSION We show that our approach is both sensitive and accurate with respect to comparable methods. We also illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.
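The abstract does not list the statistics fed to the Support Vector Machine. The sketch below computes one classic indicator of protein-coding evolutionary dynamics: divergence at each codon position of a pairwise alignment. In coding sequences, synonymous change concentrates at the third position. The feature names, the frame-0 assumption and the `third_position_bias` feature are hypothetical illustrations, and the SVM step itself is not shown.

```python
def codon_position_divergence(seq_a, seq_b):
    """Per-codon-position mismatch fractions [d1, d2, d3] for two aligned sequences.

    Assumes the alignment starts at codon position 1 (frame 0) and that any gaps
    are in-frame; columns containing a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    mismatches = [0, 0, 0]
    sites = [0, 0, 0]
    for column, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == "-" or b == "-":
            continue
        pos = column % 3
        sites[pos] += 1
        mismatches[pos] += a != b
    return [m / s if s else 0.0 for m, s in zip(mismatches, sites)]


def coding_features(seq_a, seq_b):
    """Illustrative feature vector: divergence per codon position plus a third-position bias."""
    d1, d2, d3 = codon_position_divergence(seq_a, seq_b)
    # Excess change at position 3 over positions 1-2; large values are typical of coding alignments.
    third_position_bias = d3 - (d1 + d2) / 2
    return {"d1": d1, "d2": d2, "d3": d3, "third_position_bias": third_position_bias}


print(coding_features("ATGAAACCC", "ATAAAGCCT"))
# {'d1': 0.0, 'd2': 0.0, 'd3': 1.0, 'third_position_bias': 1.0}
```

A practical classifier would evaluate all three reading frames, since the frame of a conserved region is usually unknown in advance.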
Affiliation(s)
- Matteo Rè
- Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, Via Celoria 26, 20133 Milano, Italia.
6
Xu D, Liu HJ, Wang YF. BSS-HMM3s: an improved HMM method for identifying transcription factor binding sites. DNA Seq 2005; 16:403-11. [PMID: 16287619 DOI: 10.1080/10425170500356032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/05/2023]
Abstract
An important problem in molecular biology is the study of gene expression mechanisms. The first step in differential gene expression is the binding of sequence-specific transcription factors to the regulatory regions of genes. Understanding how a given transcription factor functions requires knowing the full range of its binding sites and the potential target genes it may regulate. In this paper, we present BSS-HMM(3)s (binding site search based on third-order HMMs), an improved method for predicting transcription factor binding sites based on hidden Markov models (HMMs). Compared with Match, BSS-HMM(3)s increased prediction sensitivity by 11.95% and specificity by 12.97%.
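"Third-order" here means that the probability of each base is conditioned on the three preceding bases. The sketch below shows only that dependency structure. It uses a plain (non-hidden) third-order Markov chain trained on site sequences and on background sequences, and it scores a candidate by log-likelihood ratio. It is not BSS-HMM(3)s: there are no hidden states. The training sequences, the pseudocount and all names are hypothetical.

```python
import math
from collections import defaultdict


class ThirdOrderMarkov:
    """Third-order Markov chain over DNA: P(x_i | x_{i-3} x_{i-2} x_{i-1}) with pseudocounts."""

    def __init__(self, training_sequences, pseudocount=1.0):
        self.pseudocount = pseudocount
        # counts[context][next_base] for every 3-base context seen in training.
        self.counts = defaultdict(lambda: defaultdict(float))
        for seq in training_sequences:
            seq = seq.upper()
            for i in range(3, len(seq)):
                self.counts[seq[i - 3:i]][seq[i]] += 1

    def log_prob(self, seq):
        """Natural-log probability of seq[3:] given its first three bases (ACGT alphabet assumed)."""
        seq = seq.upper()
        total = 0.0
        for i in range(3, len(seq)):
            context = self.counts.get(seq[i - 3:i], {})
            denominator = sum(context.values()) + 4 * self.pseudocount
            total += math.log((context.get(seq[i], 0.0) + self.pseudocount) / denominator)
        return total


def log_likelihood_ratio(site_model, background_model, seq):
    """Positive values mean seq looks more like the training sites than the background."""
    return site_model.log_prob(seq) - background_model.log_prob(seq)


# Hypothetical toy training data.
sites = ThirdOrderMarkov(["GATAAG", "GATAAG", "GATAAC"])
background = ThirdOrderMarkov(["ACGTTGCA", "TTGCAACG", "CAGTCAGT"])
print(log_likelihood_ratio(sites, background, "GATAAG") > log_likelihood_ratio(sites, background, "CCCCCC"))  # True
```

Conditioning on three preceding bases captures dependencies between adjacent positions that a position weight matrix treats as independent. This is the kind of extra structure a higher-order model is meant to exploit.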
Affiliation(s)
- Dong Xu
- College of Sciences, Shanghai University, Department of Mathematics, 99 Shangda Road, Shanghai 200444, People's Republic of China
7
Churbanov A, Rogozin IB, Babenko VN, Ali H, Koonin EV. Evolutionary conservation suggests a regulatory function of AUG triplets in 5'-UTRs of eukaryotic genes. Nucleic Acids Res 2005; 33:5512-20. [PMID: 16186132 PMCID: PMC1236974 DOI: 10.1093/nar/gki847] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022]
Abstract
By comparing sequences of human, mouse and rat orthologous genes, we show that AUG is conserved to a significantly greater extent than any of the other 63 nucleotide triplets in 5′-untranslated regions (5′-UTRs) of mammalian cDNAs, but not in 3′-UTRs or coding sequences. This effect is likely to reflect, primarily, bona fide evolutionary conservation rather than cDNA annotation artifacts. The excess of conserved upstream AUGs (uAUGs) is seen in 5′-UTRs containing stop codons in-frame with the start AUG, and many of the conserved AUGs are found in different frames, consistent with a location in authentic non-coding sequences. Altogether, conserved uAUGs are present in at least 20–30% of mammalian genes. Qualitatively similar results were obtained by comparison of orthologous genes from different species of the yeast genus Saccharomyces. Mammalian and yeast 5′-UTRs are also significantly depleted in overall AUG content. Together, these findings suggest that AUG triplets in 5′-UTRs are subject to purifying selection in two opposite directions: uAUGs that have no specific function tend to be deleterious and are eliminated during evolution, whereas uAUGs that do serve a function are conserved. Most probably, the principal role of the conserved uAUGs is attenuation of translation at the initiation stage, which is often additionally regulated by alternative splicing in the mammalian 5′-UTRs. Consistent with this hypothesis, we found that open reading frames starting from conserved uAUGs are significantly shorter than those starting from non-conserved uAUGs, possibly owing to selection for optimization of the level of attenuation.
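The analysis above relies on locating upstream AUGs, their reading frame relative to the main start codon, and the length of the ORF each one opens. A minimal sketch of that bookkeeping is shown below. It assumes the main start codon sits at the first base of the supplied coding sequence and that DNA input uses T. The function name and the tuple layout are hypothetical, and comparing conserved with non-conserved uAUGs would additionally require orthologous alignments, which are not shown.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def upstream_orfs(utr5: str, cds: str = "") -> List[Tuple[int, Optional[int], int]]:
    """Find every AUG lying entirely inside the 5'-UTR and follow its reading frame.

    Returns (start, codons_before_stop, frame) tuples where
      - start is the 0-based position of the uAUG in the UTR,
      - codons_before_stop counts codons from the uAUG up to (not including)
        the first in-frame stop, or is None if no stop occurs in utr5 + cds,
      - frame is 0 when the uAUG is in frame with the main start codon,
        assumed to sit at the first base of `cds`.
    DNA input (T) is converted to RNA (U).
    """
    utr5 = utr5.upper().replace("T", "U")
    seq = utr5 + cds.upper().replace("T", "U")
    found = []
    for start in range(len(utr5) - 2):
        if utr5[start:start + 3] != "AUG":
            continue
        length = None
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                length = (pos - start) // 3
                break
        found.append((start, length, (len(utr5) - start) % 3))
    return found


print(upstream_orfs("GGAUGCCCUAAGG"))  # [(2, 2, 2)]
```

In the example, the uAUG at position 2 opens a two-codon ORF (AUG CCC) that terminates at UAA inside the UTR, out of frame with the main start codon.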
Affiliation(s)
- Igor B. Rogozin
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA
- Vladimir N. Babenko
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA
- Eugene V. Koonin
- National Center for Biotechnology Information, NLM, National Institutes of Health, Bethesda, MD 20894, USA
- To whom correspondence should be addressed. Tel: +1 301 435 5913; Fax: +1 301 435 7794
8
Abstract
The accurate prediction of higher eukaryotic gene structures and regulatory elements directly from genomic sequences is an important early step in understanding newly assembled contigs and finished genomes. As more genomes are sequenced, comparative approaches are becoming increasingly practical and valuable for predicting genes and regulatory elements. We demonstrate the effectiveness of a comparative method called pattern filtering, which uses synteny between two or more genomic segments to annotate genomic sequences. Pattern filtering optimally detects the signatures of conserved functional elements despite the stochastic noise inherent in evolutionary processes, allowing more accurate annotation of gene models. We anticipate that pattern filtering will facilitate sequence annotation and the discovery of new functional elements by the genetics and genomics communities.
Affiliation(s)
- Jonathan E Moore
- Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA 90095, USA
9
Mathé C, Sagot MF, Schiex T, Rouzé P. Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res 2002; 30:4103-17. [PMID: 12364589 PMCID: PMC140543 DOI: 10.1093/nar/gkf543] [Citation(s) in RCA: 209] [Impact Index Per Article: 9.5] [Received: 05/28/2002] [Revised: 08/07/2002] [Accepted: 08/07/2002] [Indexed: 11/14/2022]
Abstract
While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address one part of this problem: locating the genes along a genome. This paper reviews existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described, and the resulting software is classified according to both the method and the type of evidence used. Finally, the difficulties and pitfalls encountered by these programs are detailed, showing that improvements are needed and that new directions must be considered.
Affiliation(s)
- Catherine Mathé
- Institut de Pharmacologie et Biologie Structurale, UMR 5089, 205 route de Narbonne, F-31077 Toulouse Cedex, France.
10
Slesarev AI, Mezhevaya KV, Makarova KS, Polushin NN, Shcherbinina OV, Shakhova VV, Belova GI, Aravind L, Natale DA, Rogozin IB, Tatusov RL, Wolf YI, Stetter KO, Malykh AG, Koonin EV, Kozyavkin SA. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc Natl Acad Sci U S A 2002; 99:4644-9. [PMID: 11930014 PMCID: PMC123701 DOI: 10.1073/pnas.032671499] [Citation(s) in RCA: 239] [Impact Index Per Article: 10.9] [Received: 12/03/2001] [Accepted: 12/14/2001] [Indexed: 11/18/2022]
Abstract
We have determined the complete 1,694,969-nt sequence of the GC-rich genome of Methanopyrus kandleri by using a whole direct genome sequencing approach. This approach is based on unlinking genomic DNA with the ThermoFidelase version of M. kandleri topoisomerase V and on cycle sequencing directed by 2'-modified oligonucleotides (Fimers). Sequencing redundancy (3.3x) was sufficient to assemble the genome with less than one error per 40 kb. Using a combination of sequence database searches and coding potential prediction, we identified 1,692 protein-coding genes and 39 genes for structural RNAs. M. kandleri proteins show an unusually high content of negatively charged amino acids, which might be an adaptation to high intracellular salinity. Previous phylogenetic analysis of 16S rRNA suggested that M. kandleri belonged to a very deep branch, close to the root of the archaeal tree. However, genome comparisons indicate that M. kandleri consistently groups with other archaeal methanogens, both in trees constructed from concatenated alignments of ribosomal proteins and in trees based on gene content. M. kandleri shares the set of genes implicated in methanogenesis and, in part, their operon organization with Methanococcus jannaschii and Methanothermobacter thermoautotrophicum. These findings indicate that archaeal methanogens are monophyletic. A distinctive feature of M. kandleri is the paucity of proteins involved in signaling and regulation of gene expression. M. kandleri also appears to have fewer genes acquired via lateral transfer than other archaea. These features might reflect the extreme habitat of this organism.
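A claim such as "an unusually high content of negatively charged amino acids" rests on a simple composition statistic pooled over the predicted proteome. The statistic is then compared against other proteomes. The sketch below computes that pooled statistic for a list of protein sequences in one-letter code. Treating only Asp/Glu as negative and Lys/Arg as positive is a simplifying assumption, and the peptides are hypothetical toy data, not M. kandleri sequences.

```python
NEGATIVE = frozenset("DE")  # Asp, Glu
# Lys, Arg; His is omitted because it is largely uncharged at neutral pH.
POSITIVE = frozenset("KR")


def charge_composition(proteins):
    """Pooled residue-charge composition over one or more protein sequences (one-letter codes)."""
    residues = "".join(p.upper() for p in proteins)
    if not residues:
        raise ValueError("no residues supplied")
    negative = sum(aa in NEGATIVE for aa in residues)
    positive = sum(aa in POSITIVE for aa in residues)
    return {
        "negative_fraction": negative / len(residues),
        "positive_fraction": positive / len(residues),
        "net_charge": positive - negative,
    }


# Hypothetical toy peptides, not M. kandleri sequences.
print(charge_composition(["MDDEEKR", "GASE"])["net_charge"])  # -3
```

Pooling residues weights long proteins more heavily. Averaging per-protein fractions instead is an equally reasonable choice, depending on the question asked.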
11
Santhanam R, Naz RK. Novel human testis-specific cDNA: molecular cloning, expression and immunobiological effects of the recombinant protein. Mol Reprod Dev 2001; 60:1-12. [PMID: 11550262 DOI: 10.1002/mrd.1055] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/05/2022]
Abstract
Differential display polymerase chain reaction was used to obtain a testis-specific cDNA fragment. Screening a human testis λgt10 cDNA library with this fragment yielded a novel cDNA encoding a sperm antigen, designated TSA-1. It has a novel open reading frame (ORF) of 471 base pairs encoding 156 amino acids. The computer-translated protein has a calculated molecular mass of 17.4 kDa and contains a potential N-glycosylation site at amino acids 122-124. Hydrophilicity analysis of the amino acid sequence suggested that this protein is a membrane-anchored peptide. Extensive tissue-specificity analysis by Northern blots and RT-PCR-Southern blot procedures on various human tissues indicated that TSA-1 is expressed only in the human testis. Based on the results of in vitro transcription and translation experiments, the TSA-1 ORF was subcloned into the pGEX-6P-3 vector and expressed using the glutathione S-transferase gene fusion system. In Western blots, antibodies (Ab) against the purified recombinant protein specifically recognized the approximately 17 kDa recombinant TSA-1 and an approximately 24 kDa band in human sperm extract. The recombinant TSA-1 Ab recognized the acrosomal, equatorial, mid-piece and tail regions of human sperm cells in indirect immunofluorescence. It also bound to live human sperm in the immunobead binding technique (IBT) and caused a significant concentration-dependent inhibition of the human sperm acrosome reaction. These findings indicate that the novel sperm-specific recombinant TSA-1 has a role in sperm function and may have applications in the development of a contraceptive vaccine and in the specific diagnosis and treatment of male infertility.
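The ORF length, residue count and calculated mass reported above can be cross-checked with simple arithmetic. The check below assumes that the 471 bp count includes the stop codon. It also uses the common rule of thumb of about 110 Da per amino acid residue, which is an approximation and not the calculation the authors performed on the actual sequence.

```latex
\frac{471\ \text{bp}}{3\ \text{bp per codon}} = 157\ \text{codons},
\qquad 157 - 1\ (\text{stop codon}) = 156\ \text{amino acids}

M \approx 156 \times 110\ \text{Da} = 17{,}160\ \text{Da} \approx 17.2\ \text{kDa}
```

The rough estimate agrees with the reported 17.4 kDa to within the accuracy of the per-residue average.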
MeSH Headings
- Acrosome Reaction
- Amino Acid Sequence
- Antibodies/immunology
- Antigens, Surface/chemistry
- Antigens, Surface/genetics
- Antigens, Surface/immunology
- Base Sequence
- Blotting, Western
- Cloning, Molecular
- Contraception, Immunologic/methods
- DNA, Complementary/genetics
- GPI-Linked Proteins
- Humans
- Male
- Membrane Proteins
- Microscopy, Fluorescence
- Molecular Sequence Data
- Molecular Weight
- Organ Specificity
- Polymerase Chain Reaction
- RNA, Messenger/genetics
- RNA, Messenger/metabolism
- Recombinant Proteins/chemistry
- Recombinant Proteins/immunology
- Spermatozoa/immunology
- Spermatozoa/physiology
- Testis/cytology
- Testis/immunology
- Testis/metabolism
Affiliation(s)
- R Santhanam
- Division of Research, Department of Obstetrics and Gynecology, Medical College of Ohio, Toledo, Ohio 43614-5806, USA
12
Abstract
The Genome Annotation Assessment Project tested current methods of gene identification, including a critical assessment of the accuracy of different methods. Two new databases provide resources for gene annotation: the InterPro database of protein domains and motifs, and the Gene Ontology database of terms that describe the molecular functions and biological roles of gene products. Efforts in genome annotation are most often based on advances in computer systems specifically designed to handle the tremendous amounts of data generated by current sequencing projects. These analysis efforts are being linked to new ways of visualizing computationally annotated genomes.
Affiliation(s)
- S Lewis
- Department of Molecular and Cell Biology, Berkeley Drosophila Genome Project, University of California, Berkeley, CA 94720-3200, USA.