1
Torrado M, Maneiro E, Lamounier Junior A, Fernández-Burriel M, Sánchez Giralt S, Martínez-Carapeto A, Cazón L, Santiago E, Ochoa JP, McKenna WJ, Santomé L, Monserrat L. Identification of an elusive spliceogenic MYBPC3 variant in an otherwise genotype-negative hypertrophic cardiomyopathy pedigree. Sci Rep 2022; 12:7284. [PMID: 35508642 PMCID: PMC9068804 DOI: 10.1038/s41598-022-11159-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Accepted: 04/13/2022] [Indexed: 11/10/2022] [Open access]
Abstract
The finding of a genotype-negative hypertrophic cardiomyopathy (HCM) pedigree with several affected members, indicating a familial origin of the disease, prompted this study to identify causative gene variants. Genetic testing of the proband and subsequent family screening revealed the presence of a rare variant in the MYBPC3 gene, c.3331−26T>G in intron 30, with evidence supporting cosegregation with the disease in the family. An analysis of potential splice-altering activity using several splicing algorithms consistently yielded low scores. Minigene expression analysis at the mRNA and protein levels revealed that c.3331−26T>G is a spliceogenic variant with major splice-altering activity, leading to undetectable levels of properly spliced transcripts or the corresponding protein. Minigene and patient mRNA analyses indicated that this variant induces complete and partial retention of intron 30, which is expected to lead to haploinsufficiency in carrier patients. Like most spliceogenic MYBPC3 variants, c.3331−26T>G appears to be non-recurrent, since it was identified in only two additional unrelated probands in our large HCM cohort. In fact, the frequency analysis of 46 known splice-altering MYBPC3 intronic nucleotide substitutions in our HCM cohort revealed 9 recurrent variants and 16 non-recurrent variants present in a few probands (≤ 4), while 21 were not detected. The identification of non-recurrent, elusive MYBPC3 spliceogenic variants that escape detection by in silico algorithms represents a challenge for genetic diagnosis of HCM and contributes to solving a fraction of genotype-negative HCM cases.
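The three-way split reported in this abstract (9 recurrent, 16 non-recurrent, 21 not detected, 46 in total) amounts to a tally over per-variant proband counts. The Python sketch below is illustrative only and is not the authors' pipeline: the function names, the example variant labels and the example counts are hypothetical, and treating "recurrent" as "more than 4 probands" is an assumption inferred from the "≤ 4" threshold the abstract gives for non-recurrent variants.

```python
from collections import Counter


def classify_variant(n_probands: int, max_nonrecurrent: int = 4) -> str:
    """Assign a frequency class from the number of unrelated probands.

    Assumption: 0 probands -> not detected, 1..max_nonrecurrent -> non-recurrent,
    anything above -> recurrent. Only the "<= 4" bound comes from the abstract.
    """
    if n_probands < 0:
        raise ValueError("proband count cannot be negative")
    if n_probands == 0:
        return "not detected"
    if n_probands <= max_nonrecurrent:
        return "non-recurrent"
    return "recurrent"


def tally_classes(proband_counts: dict) -> Counter:
    """Count how many variants fall into each frequency class."""
    return Counter(classify_variant(n) for n in proband_counts.values())


# Hypothetical per-variant proband counts, for illustration only.
example = {"variant_A": 0, "variant_B": 2, "variant_C": 4, "variant_D": 11}
print(tally_classes(example))

# Consistency check on the figures quoted in the abstract.
reported = {"recurrent": 9, "non-recurrent": 16, "not detected": 21}
assert sum(reported.values()) == 46
```

Keeping the threshold as a parameter makes it easy to see how sensitive the recurrent/non-recurrent split is to the chosen cut-off.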
Affiliation(s)
- Mario Torrado
  - Cardiovascular Research Group, University of A Coruña, Campus de Oza, Building Fortín, 15006, A Coruña, Spain
  - Biomedical Research Institute of A Coruña, A Coruña, Spain
- Emilia Maneiro
  - Biomedical Research Institute of A Coruña, A Coruña, Spain
  - Cardiovascular Genetics, Health in Code, Business Center Marineda, Avenida de Arteixo 43, Local 1A, 15008, A Coruña, Spain
- Arsonval Lamounier Junior
  - Cardiovascular Research Group, University of A Coruña, Campus de Oza, Building Fortín, 15006, A Coruña, Spain
  - Biomedical Research Institute of A Coruña, A Coruña, Spain
  - Cardiovascular Genetics, Health in Code, Business Center Marineda, Avenida de Arteixo 43, Local 1A, 15008, A Coruña, Spain
  - Medical School, Universidade Vale do Rio Doce, Governador Valadares, MG, Brazil
- Laura Cazón
  - Cardiovascular Genetics, Health in Code, Business Center Marineda, Avenida de Arteixo 43, Local 1A, 15008, A Coruña, Spain
- Elisa Santiago
  - Cardiovascular Genetics, Health in Code, Business Center Marineda, Avenida de Arteixo 43, Local 1A, 15008, A Coruña, Spain
- Juan Pablo Ochoa
  - Biomedical Research Institute of A Coruña, A Coruña, Spain
  - Cardiovascular Genetics, Health in Code, Business Center Marineda, Avenida de Arteixo 43, Local 1A, 15008, A Coruña, Spain
- William J McKenna
  - Cardiovascular Research Group, University of A Coruña, Campus de Oza, Building Fortín, 15006, A Coruña, Spain
  - Biomedical Research Institute of A Coruña, A Coruña, Spain
  - Institute of Cardiovascular Science, University College London, London, UK
- Luis Santomé
  - Cardiovascular Genetics, Health in Code, Business Center Marineda, Avenida de Arteixo 43, Local 1A, 15008, A Coruña, Spain
- Lorenzo Monserrat
  - Biomedical Research Institute of A Coruña, A Coruña, Spain
  - Cardiovascular Genetics, Health in Code, Business Center Marineda, Avenida de Arteixo 43, Local 1A, 15008, A Coruña, Spain
2
Hassanzadeh P, Atyabi F, Dinarvand R. The significance of artificial intelligence in drug delivery system design. Adv Drug Deliv Rev 2019; 151-152:169-190. [PMID: 31071378 DOI: 10.1016/j.addr.2019.05.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 01/14/2019] [Revised: 04/14/2019] [Accepted: 05/02/2019] [Indexed: 02/07/2023]
Abstract
Over the last decade, there has been growing interest in applying artificial intelligence (AI) technology to the analysis and interpretation of biological or genetic information, to accelerated drug discovery, and to the identification of selective small-molecule modulators or rare molecules and the prediction of their behavior. The application of automated workflows and databases for rapid analysis of huge amounts of data, and of artificial neural networks (ANNs) for the development of novel hypotheses and treatment strategies, prediction of disease progression, and evaluation of the pharmacological profiles of drug candidates, may significantly improve treatment outcomes. Target fishing (TF), the rapid prediction or identification of biological targets, might be of great help for linking targets to novel compounds. AI and TF methods, in association with human expertise, may indeed revolutionize current theranostic strategies; meanwhile, validation approaches are necessary to overcome the potential challenges and ensure higher accuracy. In this review, the significance of AI and TF in the development of drugs and delivery systems and the potential challenges are highlighted.
Affiliation(s)
- Parichehr Hassanzadeh
  - Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran
- Fatemeh Atyabi
  - Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran
- Rassoul Dinarvand
  - Nanotechnology Research Center, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran 13169-43551, Iran
3
Alves CS, Dobrowsky TM. Strategies and Considerations for Improving Expression of "Difficult to Express" Proteins in CHO Cells. Methods Mol Biol 2017; 1603:1-23. [PMID: 28493120 DOI: 10.1007/978-1-4939-6972-2_1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/07/2023]
Abstract
Despite substantial advances in the field of mammalian expression, there are still proteins that are characterized as difficult to express. Determining the expression bottleneck requires troubleshooting techniques specific for the given molecule and host. The complex array of intracellular processes involved in protein expression includes transcription, protein folding, post-translation processing, and secretion. Challenges in any of these steps could result in low protein expression, while the inherent properties of the molecule itself may limit its production via mechanisms such as cytotoxicity or inherent instability. Strategies to identify the rate-limiting step and subsequently improve expression and production are discussed here.
4
A novel exon generates ubiquitously expressed alternatively spliced new transcript of mouse Abcc4 gene. Gene 2016; 594:131-137. [PMID: 27613143 DOI: 10.1016/j.gene.2016.08.058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/19/2016] [Revised: 07/30/2016] [Accepted: 08/31/2016] [Indexed: 11/20/2022]
Abstract
The Abcc4 gene codes for a protein (ABCC4) involved in transporting different classes of drugs out of cells. Substrates transported by ABCC4 include important antiviral and anticancer drugs as well as endogenous molecules such as bile acids, cyclic nucleotides, folates, prostaglandins and steroids. Alternative splicing generates multiple mRNAs that encode protein isoforms with diverse functions. In this study, we identified a novel transcript of the mouse Abcc4 gene using a combination of bioinformatics and molecular biology techniques. This transcript differs from the reported transcript in having a different first exon, located in the previously identified first intron. The newly identified transcript was expressed across the different tissues studied and at different developmental stages. The expression levels of the novel and reported transcripts were measured using quantitative real-time PCR. After conceptual translation of the novel transcript, various post-translational modifications were studied. Translation efficiency and the predicted half-life of the encoded protein isoforms were analysed in silico. Molecular modelling was performed to compare the structures of the two isoforms. The diversity at the N-termini of these protein isoforms explains the diverse function of ABCC4 in mouse.
5
Levin L, Bar-Yaacov D, Bouskila A, Chorev M, Carmel L, Mishmar D. LEMONS - A Tool for the Identification of Splice Junctions in Transcriptomes of Organisms Lacking Reference Genomes. PLoS One 2015; 10:e0143329. [PMID: 26606265 PMCID: PMC4659627 DOI: 10.1371/journal.pone.0143329] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/25/2015] [Accepted: 11/03/2015] [Indexed: 11/18/2022] [Open access]
Abstract
RNA-seq is becoming a preferred tool for genomics studies of model and non-model organisms. However, DNA-based analysis of organisms lacking sequenced genomes cannot rely on RNA-seq data alone to isolate most genes of interest, as DNA codes both exons and introns. With this in mind, we designed a novel tool, LEMONS, that exploits the evolutionary conservation of both exon/intron boundary positions and splice junction recognition signals to produce high throughput splice-junction predictions in the absence of a reference genome. When tested on multiple annotated vertebrate mRNA data, LEMONS accurately identified 87% (average) of the splice-junctions. LEMONS was then applied to our updated Mediterranean chameleon transcriptome, which lacks a reference genome, and predicted a total of 90,820 exon-exon junctions. We experimentally verified these splice-junction predictions by amplifying and sequencing twenty randomly selected genes from chameleon DNA templates. Exons and introns were detected in 19 of 20 of the positions predicted by LEMONS. To the best of our knowledge, LEMONS is currently the only experimentally verified tool that can accurately predict splice-junctions in organisms that lack a reference genome.
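The abstract does not spell out which splice-junction recognition signals LEMONS scores, so the following Python sketch is not the LEMONS algorithm. It only illustrates the kind of check implied by the validation step: given a genomic sequence and a predicted intron, confirm that the intron carries the canonical GT...AG dinucleotides and that removing it yields the expected exon-exon junction. The function names and the toy sequence are hypothetical.

```python
def has_canonical_signals(genomic: str, start: int, end: int) -> bool:
    """Return True if genomic[start:end] begins with GT and ends with AG.

    Coordinates are 0-based and end-exclusive. Only the canonical GT-AG
    rule is tested; non-canonical introns (e.g. GC-AG) would be rejected.
    """
    intron = genomic[start:end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice_out(genomic: str, start: int, end: int) -> str:
    """Remove the predicted intron and join the flanking exons."""
    return genomic[:start] + genomic[end:]


# Toy example: two short exons separated by a 12-nt intron.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GGATAA"
genomic = exon1 + intron + exon2
start, end = len(exon1), len(exon1) + len(intron)

print(has_canonical_signals(genomic, start, end))  # True
print(splice_out(genomic, start, end))             # ATGGCCGGATAA
```

A real transcriptome-only tool has no genomic sequence to inspect, which is why the abstract stresses conservation of boundary positions across species; the check above corresponds to what becomes possible once DNA has been amplified and sequenced.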
Affiliation(s)
- Liron Levin
  - Department of Life Sciences, Ben Gurion University of the Negev, Beer Sheva, 8410501, Israel
- Dan Bar-Yaacov
  - Department of Life Sciences, Ben Gurion University of the Negev, Beer Sheva, 8410501, Israel
- Amos Bouskila
  - Department of Life Sciences, Ben Gurion University of the Negev, Beer Sheva, 8410501, Israel
- Michal Chorev
  - Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
  - School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Liran Carmel
  - Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Dan Mishmar
  - Department of Life Sciences, Ben Gurion University of the Negev, Beer Sheva, 8410501, Israel
6
Zhang L, Yan L, Jiang J, Wang Y, Jiang Y, Yan T, Cao Y. The structure and retrotransposition mechanism of LTR-retrotransposons in the asexual yeast Candida albicans. Virulence 2014; 5:655-64. [PMID: 25101670 PMCID: PMC4139406 DOI: 10.4161/viru.32180] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022] [Open access]
Abstract
Retrotransposons constitute a major part of the genome in a number of eukaryotes. Long terminal repeat (LTR) retrotransposons are one type of retrotransposon. Candida albicans has 34 distinct LTR-retrotransposon families, which belong to the Ty1/copia and Ty3/gypsy groups that have been extensively studied in the model yeast Saccharomyces cerevisiae. LTR-retrotransposons carry two LTRs flanking a long internal protein-coding domain containing open reading frames. LTR-retrotransposons use RNA as an intermediate to synthesize double-stranded DNA copies. In this article, we describe the structural features, retrotransposition mechanism and influence on organism diversity of LTR-retrotransposons in C. albicans. We also discuss the relationship between pathogenicity and LTR-retrotransposons in C. albicans.
Affiliation(s)
- Lulu Zhang
  - Research and Develop Center of New Drug; School of Pharmacy; Second Military Medical University; Shanghai, PR China
- Lan Yan
  - Research and Develop Center of New Drug; School of Pharmacy; Second Military Medical University; Shanghai, PR China
- Jingchen Jiang
  - Department of Pharmacology; School of Pharmacy; China Pharmaceutical University; Nanjing, PR China
- Yan Wang
  - Research and Develop Center of New Drug; School of Pharmacy; Second Military Medical University; Shanghai, PR China
- Yuanying Jiang
  - Research and Develop Center of New Drug; School of Pharmacy; Second Military Medical University; Shanghai, PR China
- Tianhua Yan
  - Department of Pharmacology; School of Pharmacy; China Pharmaceutical University; Nanjing, PR China
- Yongbing Cao
  - Research and Develop Center of New Drug; School of Pharmacy; Second Military Medical University; Shanghai, PR China
7
Takata N, Yokota K, Ohki S, Mori M, Taniguchi T, Kurita M. Evolutionary relationship and structural characterization of the EPF/EPFL gene family. PLoS One 2013; 8:e65183. [PMID: 23755192 PMCID: PMC3670920 DOI: 10.1371/journal.pone.0065183] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 12/28/2012] [Accepted: 04/24/2013] [Indexed: 01/02/2023] [Open access]
Abstract
EPF1-EPF2 and EPFL9/Stomagen act antagonistically in regulating leaf stomatal density. The aim of this study was to elucidate the evolutionary functional divergence of EPF/EPFL family genes. Phylogenetic analyses showed that AtEPFL9/Stomagen-like genes are conserved only in vascular plants and are closely related to AtEPF1/EPF2-like genes. Modeling showed that EPF/EPFL peptides share a common 3D structure that is constituted of a scaffold and loop. Molecular dynamics simulation suggested that AtEPF1/EPF2-like peptides form an additional disulfide bond in their loop regions and show greater flexibility in these regions than AtEPFL9/Stomagen-like peptides. This study uncovered the evolutionary relationship and the conformational divergence of proteins encoded by the EPF/EPFL family genes.
Affiliation(s)
- Naoki Takata
  - Forest Bio-Research Center, Forestry and Forest Products Research Institute, Hitachi, Japan
- Kiyonobu Yokota
  - Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan
- Shinya Ohki
  - Center for Nano Materials and Technology, Japan Advanced Institute of Science and Technology, Nomi, Japan
- Masashi Mori
  - Research Institute for Bioresources and Biotechnology, Ishikawa Prefectural University, Nonoichi, Japan
- Toru Taniguchi
  - Forest Bio-Research Center, Forestry and Forest Products Research Institute, Hitachi, Japan
  - Forest Tree Breeding Center, Forestry and Forest Products Research Institute, Hitachi, Japan
- Manabu Kurita
  - Forest Bio-Research Center, Forestry and Forest Products Research Institute, Hitachi, Japan
  - Forest Tree Breeding Center, Forestry and Forest Products Research Institute, Hitachi, Japan
8
Goel N, Singh S, Aseri TC. A comparative analysis of soft computing techniques for gene prediction. Anal Biochem 2013; 438:14-21. [PMID: 23529114 DOI: 10.1016/j.ab.2013.03.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 07/27/2012] [Revised: 03/05/2013] [Accepted: 03/14/2013] [Indexed: 11/17/2022]
Abstract
The rapid growth of genomic sequence data for both human and nonhuman species has made the analysis of these sequences, especially gene prediction, very important, and it is currently the focus of many research efforts. Besides its scientific interest to the molecular biology and genomics community, gene prediction is of considerable importance in human health and medicine. A variety of gene prediction techniques have been developed for eukaryotes over the past few years. This article reviews and analyzes the application of certain soft computing techniques in gene prediction. First, the problem of gene prediction and its challenges are described. This is followed by a description of different soft computing techniques and their application to gene prediction. In addition, a comparative analysis of different soft computing techniques for gene prediction is given. Finally, some limitations of current research and future research directions are discussed.
Affiliation(s)
- Neelam Goel
  - Department of Computer Science and Engineering, PEC University of Technology, Sector-12, Chandigarh 160 012, UT, India
9
Liou SW, Huang YF. An exon/intron disparity framework based on the nucleotide profile of single sequence. ACTA ACUST UNITED AC 2012. [DOI: 10.1007/s13721-012-0007-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/27/2022]
10
Yin PY, Shyu SJ, Yang SR, Chang YC. Reinforcement Learning for Improving Gene Identification Accuracy by Combination of Gene-Finding Programs. International Journal of Applied Metaheuristic Computing 2012. [DOI: 10.4018/jamc.2012010104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Due to the explosive growth of genome databases, gene discovery has become one of the most computationally intensive tasks in bioinformatics. Many systems have been developed to find genes; however, there is still room to improve prediction accuracy. This paper proposes a reinforcement learning model that combines gene predictions from existing gene-finding programs. The model learns the optimal policy for accepting the best predictions. The fitness of a policy is reinforced if the selected prediction at a nucleotide site correctly corresponds to the true annotation. The model searches for the optimal policy, which maximizes the expected prediction accuracy over all nucleotide sites in the sequences. The experimental results demonstrate that the proposed model yields higher prediction accuracy than the single best program.
11
Ontivero M, Zamora GM, Salazar S, Ricci JCD, Castagnaro AP. Isolation of a strawberry gene fragment encoding an actin depolymerizing factor-like protein from genotypes resistant to Colletotrichum acutatum. Genome 2011; 54:1041-4. [DOI: 10.1139/g11-068] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Actin depolymerizing factors (ADFs) have recently been implicated in plant defense against pathogenic fungi, associated with the cytoskeletal rearrangements that help establish an effective barrier against fungal ingress. In this work, we identified a DNA fragment corresponding to part of a gene predicted to encode an ADF-like protein in genotypes of Fragaria ananassa resistant to the fungus Colletotrichum acutatum. Bulked segregant analysis combined with AFLP was used to identify polymorphisms linked to resistance in hybrids derived from the cross between the resistant cultivar ‘Sweet Charlie’ and the susceptible cultivar ‘Pájaro’. The sequence of one of the three polymorphic bands detected showed significant BLASTX hits to ADF proteins from other plants. Two possible exons were identified, and bioinformatic analysis revealed the presence of the ADF homology domain with two actin-binding sites, an N-terminal phosphorylation site, and a nuclear localization signal. In addition to their possible application in strawberry breeding programs, these findings may contribute to investigating the role of ADFs in plant resistance against fungi.
Affiliation(s)
- Marta Ontivero
  - Sección Biotecnología, Estación Experimental Agroindustrial Obispo Colombres-Unidad Asociada al INSIBIO, Av. William Cross 3150, 4101 Las Talitas, Tucumán, Argentina
- Gustavo Martínez Zamora
  - Instituto Superior de Investigaciones Biológicas (INSIBIO; CONICET-UNT) and Instituto de Química Biológica “Dr. Bernabé Bloj”, Universidad Nacional de Tucumán, Chacabuco 461, 4000 Tucumán, Argentina
- Sergio Salazar
  - Estación Experimental Agropecuaria Famaillá-INTA, Ruta Prov. 301 km 32, 4132 Famaillá, Tucumán, Argentina
  - Facultad de Agronomía y Zootecnia, Universidad Nacional de Tucumán, Av. Roca 1900, 4000 Tucumán, Argentina
- Juan Carlos Díaz Ricci
  - Instituto Superior de Investigaciones Biológicas (INSIBIO; CONICET-UNT) and Instituto de Química Biológica “Dr. Bernabé Bloj”, Universidad Nacional de Tucumán, Chacabuco 461, 4000 Tucumán, Argentina
- Atilio Pedro Castagnaro
  - Sección Biotecnología, Estación Experimental Agroindustrial Obispo Colombres-Unidad Asociada al INSIBIO, Av. William Cross 3150, 4101 Las Talitas, Tucumán, Argentina
12
Akhtar MN, Bukhari SA, Fazal Z, Qamar R, Shahmuradov IA. POLYAR, a new computer program for prediction of poly(A) sites in human sequences. BMC Genomics 2010; 11:646. [PMID: 21092114 PMCID: PMC3053588 DOI: 10.1186/1471-2164-11-646] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 03/24/2010] [Accepted: 11/19/2010] [Indexed: 01/12/2023] [Open access]
Abstract
BACKGROUND: mRNA polyadenylation is an essential step of pre-mRNA processing in eukaryotes. Accurate prediction of the pre-mRNA 3'-end cleavage/polyadenylation sites is important for defining gene boundaries and understanding gene expression mechanisms. RESULTS: A total of 28,761 mapped human poly(A) sites were classified into three classes according to whether they contain different known forms of the polyadenylation signal (PAS) or none of them (PAS-strong, PAS-weak and PAS-less, respectively), and a new computer program, POLYAR, was developed for the prediction of poly(A) sites of each class. In comparison with polya_svm (to date the most accurate computer program for prediction of poly(A) sites), when searching for PAS-strong poly(A) sites in human sequences, POLYAR had significantly higher prediction sensitivity (80.8% versus 65.7%) and specificity (66.4% versus 51.7%). However, when a similar search was conducted for PAS-weak and PAS-less poly(A) sites, both programs had very low prediction accuracy, which indicates that our knowledge of the factors involved in determining poly(A) sites is not sufficient to identify such polyadenylation regions. CONCLUSIONS: We present a new classification of polyadenylation sites into three classes and a novel computer program, POLYAR, for prediction of poly(A) sites/regions of each class. In tests, POLYAR predicts PAS-strong poly(A) sites with high accuracy; its efficiency in searching for PAS-weak and PAS-less poly(A) sites is not very high but is comparable to that of other available programs. These findings suggest that additional characteristics of such poly(A) sites remain to be elucidated. The POLYAR program, with a stand-alone version for downloading, is available at http://cub.comsats.edu.pk/polyapredict.htm.
Affiliation(s)
- Malik Nadeem Akhtar
  - Department of Biosciences, COMSATS Institute of Information Technology, Islamabad, Pakistan
13
Ordiz MI, Yang J, Barbazuk WB, Beachy RN. Functional analysis of the activation domain of RF2a, a rice transcription factor. Plant Biotechnology Journal 2010; 8:835-44. [PMID: 20408988 DOI: 10.1111/j.1467-7652.2010.00520.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
Rice transcription factor RF2a binds to the BoxII cis element of the promoter of rice tungro bacilliform virus and activates promoter expression. The acidic acid-rich domain of RF2a is a transcription activator and has been partially characterized (Dai et al., 2003). The RF2a acidic domain (A; amino acids 49-116) was fused with the synthetic zinc finger ZF-TF 2C7 and was co-introduced with a reporter gene into transgenic Arabidopsis plants. Expression of the reporter gene was increased up to sevenfold by the effector. In transient assays in tobacco BY-2 protoplasts, we identified a subdomain comprising amino acids 56-84 (A5) that was as effective an activator as the entire acidic domain. A chemically inducible system was used to determine that the A and A5 domains are as effective in transcription activation as the well-characterized VP16 activation domain. Bioinformatics analyses revealed that the A5 domain is present only in b-ZIP transcription factors. In dicots, the A domain contains an insertion of four amino acids that is not present in monocot proteins. The A5 domain, and similar domains in other b-ZIP transcription factors, is predicted to form an anti-parallel beta sheet structure.
Affiliation(s)
- M Isabel Ordiz
  - Donald Danforth Plant Science Center, St Louis, MO 63132, USA
14
Pitsch NT, Witsch B, Baier M. Comparison of the chloroplast peroxidase system in the chlorophyte Chlamydomonas reinhardtii, the bryophyte Physcomitrella patens, the lycophyte Selaginella moellendorffii and the seed plant Arabidopsis thaliana. BMC Plant Biology 2010; 10:133. [PMID: 20584316 PMCID: PMC3095285 DOI: 10.1186/1471-2229-10-133] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/11/2010] [Accepted: 06/28/2010] [Indexed: 05/04/2023]
Abstract
BACKGROUND: Oxygenic photosynthesis is accompanied by the formation of reactive oxygen species (ROS), which damage proteins, lipids and DNA and ultimately limit plant yield. The enzymes of the chloroplast antioxidant system are exclusively nuclear encoded. During evolution, plastid and mitochondrial genes were post-endosymbiotically transferred to the nucleus, adapted for eukaryotic gene expression and post-translational protein targeting, and supplemented with genes of eukaryotic origin. RESULTS: Here, the genomes of the green alga Chlamydomonas reinhardtii, the moss Physcomitrella patens, the lycophyte Selaginella moellendorffii and the seed plant Arabidopsis thaliana were screened for ORFs encoding chloroplast peroxidases. The identified genes were compared for their amino acid sequence similarities and gene structures. Stromal and thylakoid-bound ascorbate peroxidases (APx) share common splice sites, demonstrating that they evolved from a common ancestral gene. In contrast to most cormophytes, our results predict that chloroplast APx activity is restricted to the stroma in Chlamydomonas and to thylakoids in Physcomitrella. The moss gene is of retrotransposonal origin. The exon-intron structures of 2CP genes differ between chlorophytes and streptophytes, indicating an independent evolution. According to amino acid sequence characteristics, only the A-isoform of Chlamydomonas 2CP may be functionally equivalent to streptophyte 2CP, while the weakly expressed B- and C-isoforms show chlorophyte-specific surfaces and amino acid sequence characteristics. The amino acid sequences of chloroplast PrxII are widely conserved between the investigated species. The genes are unspliced in the analyzed streptophytes but have accumulated four introns in Chlamydomonas. A conserved splice site also indicates a common origin of chlorobiont PrxQ. The similarity of splice sites also demonstrates that streptophyte glutathione peroxidases (GPx) are of common origin. Besides a less related cysteine-type GPx, Chlamydomonas encodes two selenocysteine-type GPx. The latter were lost prior to or during streptophyte evolution. CONCLUSION: Throughout plant evolution, there was strong selective pressure to maintain the activity of all three investigated types of peroxidases in chloroplasts. APx evolved from a gene that dates back to before the differentiation of chlorobionts into chlorophytes and streptophytes, while Prx and presumably also GPx gene patterns may have evolved independently in the streptophyte and chlorophyte branches.
Affiliation(s)
- Nicola T Pitsch
  - Plant Science Institute, Heinrich-Heine-University, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Benjamin Witsch
  - Plant Science Institute, Heinrich-Heine-University, Universitätsstraße 1, 40225 Düsseldorf, Germany
- Margarete Baier
  - Plant Science Institute, Heinrich-Heine-University, Universitätsstraße 1, 40225 Düsseldorf, Germany
  - Plant Physiology, Freie Universität Berlin, Königin-Luise-Straße 12-16, 14195 Berlin, Germany
15
Zheng H, Zhou L, Dou T, Han X, Cai Y, Zhan X, Tang C, Huang J, Wu Q. Genome-wide prediction of G protein-coupled receptors in Verticillium spp. Fungal Biol 2010; 114:359-68. [PMID: 20943146 DOI: 10.1016/j.funbio.2010.02.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 02/10/2009] [Revised: 01/11/2010] [Accepted: 02/24/2010] [Indexed: 11/17/2022]
Abstract
G protein-coupled receptors (GPCRs) are critical factors in regulating morphogenesis, mating, infection and virulence in fungi. In this study, various computational strategies were applied to identify GPCR-like proteins from the genomes of both Verticillium dahliae and Verticillium albo-atrum. The putative GPCRs were distributed over 13 classes, and significantly, three of those represented novel classes of GPCR-like proteins in fungi. The three novel GPCRs had high levels of identity to their counterparts in higher eukaryotes, including Homo sapiens. The numbers of GPCR-like proteins in the two Verticillium spp. were similar to those seen in other filamentous fungi, such as Magnaporthe grisea, Neurospora crassa and Fusarium graminearum. Additionally, the carbon/amino acid receptors were divided into three different subclasses, indicating that differences among the GPCRs existed not only among different classes but also within classes. In conclusion, the identification and classification of GPCRs and their homology to some well-studied fungi will be an important starting point for future research in Verticillium spp.
Affiliation(s)
- Hongxia Zheng
- Department of Biomedicine, School of Life Science, East China Normal University, Shanghai, China
16
Fisette JF, Toutant J, Dugré-Brisson S, Desgroseillers L, Chabot B. hnRNP A1 and hnRNP H can collaborate to modulate 5' splice site selection. RNA (New York, N.Y.) 2010; 16:228-38. [PMID: 19926721 PMCID: PMC2802032 DOI: 10.1261/rna.1890310] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 08/18/2009] [Accepted: 10/05/2009] [Indexed: 05/21/2023]
Abstract
The mammalian proteins hnRNP A1 and hnRNP H control many splicing decisions in viral and cellular primary transcripts. To explain some of these activities, we have proposed that self-interactions between bound proteins create an RNA loop that represses internal splice sites while simultaneously activating the external sites that are brought in closer proximity. Here we show that a variety of hnRNP H binding sites can affect 5' splice site selection. The addition of two sets of hnRNP H sites in a model pre-mRNA modulates 5' splice site selection cooperatively, consistent with the looping model. Notably, binding sites for hnRNP A1 and H on the same pre-mRNA can similarly collaborate to modulate 5' splice site selection. The C-terminal portion of hnRNP H that contains the glycine-rich domains (GRD) is essential for splicing activity, and it can be functionally replaced by the GRD of hnRNP A1. Finally, we used the bioluminescence resonance energy transfer (BRET) technology to document the existence of homotypic and heterotypic interactions between hnRNP H and hnRNP A1 in live cells. Overall, our study suggests that interactions between different hnRNP proteins bound to distinct locations on a pre-mRNA can change its conformation to affect splicing decisions.
Affiliation(s)
- Jean-François Fisette
- Département de Microbiologie et d'Infectiologie, Faculté de Médecine et des Sciences de la Santé, Université de Sherbrooke, Sherbrooke, Québec H3C 3J7, Canada
17
Gatherer D. Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences. Bioinform Biol Insights 2009; 1:101-26. [PMID: 20066129 PMCID: PMC2789693 DOI: 10.4137/bbi.s415] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/03/2022] Open
Abstract
A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%-70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.
Affiliation(s)
- Derek Gatherer
- MRC Virology Unit, Institute of Virology, Church Street, Glasgow G11 5JR UK
18
Almazán F, Galán C, Enjuanes L. Engineering infectious cDNAs of coronavirus as bacterial artificial chromosomes. Methods Mol Biol 2009; 454:275-91. [PMID: 19057870 PMCID: PMC7121107 DOI: 10.1007/978-1-59745-181-9_20] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
Abstract
The construction of coronavirus (CoV) infectious clones had been hampered by the large size of the viral genome (around 30 kb) and the instability of plasmids carrying CoV replicase sequences in Escherichia coli. Several approaches have been developed to overcome these problems. Here we describe the engineering of CoV full-length cDNA clones using bacterial artificial chromosomes (BACs). In this system the viral RNA is expressed in the cell nucleus under the control of the cytomegalovirus promoter and further amplified in the cytoplasm by the viral replicase. The BAC-based strategy is an efficient system that allows easy manipulation of CoV genomes to study fundamental viral processes and also to develop genetically defined vaccines. The procedure is illustrated by the cloning of the genome of SARS coronavirus, Urbani strain.
Affiliation(s)
- Fernando Almazán
- Centro Nacional de Biotecnología, CSIC, Department of Molecular and Cell Biology, Cantoblanco, Madrid, Spain
19
Nelson W, Luo M, Ma J, Estep M, Estill J, He R, Talag J, Sisneros N, Kudrna D, Kim H, Ammiraju JSS, Collura K, Bharti AK, Messing J, Wing RA, SanMiguel P, Bennetzen JL, Soderlund C. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains. BMC Genomics 2008; 9:621. [PMID: 19099592 PMCID: PMC2628917 DOI: 10.1186/1471-2164-9-621] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/03/2008] [Accepted: 12/19/2008] [Indexed: 11/30/2022] Open
Abstract
Background: Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results: A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion: MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes. Although the types and natures of epigenetic boundaries are barely understood at this time, MSLL technology flags both approximate boundaries and methylated genes that deserve additional investigation. MSLL and HMPR sequences provide a valuable resource for maize genome annotation, and are a uniquely valuable complement to any plant genome sequencing project. In order to make these results fully accessible to the community, a web display was developed that shows the alignment of MSLL, HMPR, and other gene-rich sequences to the BACs; this display is continually updated with the latest ESTs and BAC sequences.
Affiliation(s)
- William Nelson
- Arizona Genomics Computational Laboratory, BIO5 Institute, University of Arizona, Tucson, Arizona, USA.
20
Zhang MQ. Using MZEF to find internal coding exons. Curr Protoc Bioinformatics 2008; Chapter 4:Unit 4.2. [PMID: 18792940 DOI: 10.1002/0471250953.bi0402s00] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
MZEF (Michael Zhang's Exon Finder) was designed to help identify one of the most important classes of exons, i.e. the internal coding exons, in human genomic DNA sequences. It is neither for predicting intronless genes, nor for assembling predicted exons into complete gene models. There is also a mouse version (mMZEF) and an Arabidopsis version (aMZEF). This unit presents the Unix and Web versions of MZEF and reviews how to interpret the MZEF results.
Affiliation(s)
- Michael Q Zhang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, USA
21
Abstract
Comparative genomics is a powerful tool for gaining insight into genomic function and evolution. However, in plants, sequence data that would enable detailed comparisons of both coding and noncoding regions have been limited in availability. Here we report the generation and analysis of sequences for an unduplicated conserved syntenic segment (CSS) in the genomes of five members of the agriculturally important plant family Solanaceae. This CSS includes a 105-kb region of tomato chromosome 2 and orthologous regions of the potato, eggplant, pepper, and petunia genomes. With a total neutral divergence of 0.73-0.78 substitutions/site, these sequences are similar enough that most noncoding regions can be aligned, yet divergent enough to be informative about evolutionary dynamics and selective pressures. The CSS contains 17 distinct genes with generally conserved order and orientation, but with numerous small-scale differences between species. Our analysis indicates that the last common ancestor of these species lived approximately 27-36 million years ago, that more than one-third of short genomic segments (5-15 bp) are under selection, and that more than two-thirds of selected bases fall in noncoding regions. In addition, we identify genes under positive selection and analyze hundreds of conserved noncoding elements. This analysis provides a window into 30 million years of plant evolution in the absence of polyploidization.
22
Uberbacher EC, Hyatt D, Shah M. GrailEXP and Genome Analysis Pipeline for genome annotation. Curr Protoc Hum Genet 2008; Chapter 6:Unit 6.5. [PMID: 18428363 DOI: 10.1002/0471142905.hg0605s39] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
The Gene Recognition and Analysis Internet Link (GRAIL) is one of the most widely used systems for evaluating the protein-coding potential of anonymous DNA sequences. This unit describes the use of the XGRAIL and genQuest client-server applications to locate exons in DNA sequences, to develop gene models, and to search databases for homologs. A support protocol describes how to obtain the GRAIL and genQuest client software by anonymous FTP.
23
An artificial neural network method for combining gene prediction based on equitable weights. Neurocomputing 2008. [DOI: 10.1016/j.neucom.2007.07.019] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
24
Levitsky VG, Ignatieva EV, Ananko EA, Turnaev II, Merkulova TI, Kolchanov NA, Hodgman TC. Effective transcription factor binding site prediction using a combination of optimization, a genetic algorithm and discriminant analysis to capture distant interactions. BMC Bioinformatics 2007; 8:481. [PMID: 18093302 PMCID: PMC2265442 DOI: 10.1186/1471-2105-8-481] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 07/22/2007] [Accepted: 12/19/2007] [Indexed: 12/22/2022] Open
Abstract
Background: Reliable transcription factor binding site (TFBS) prediction methods are essential for computer annotation of large amounts of genome sequence data. However, current methods to predict TFBSs are hampered by the high false-positive rates that occur when only sequence conservation at the core binding sites is considered. Results: To improve this situation, we have quantified the performance of several Position Weight Matrix (PWM) algorithms, using exhaustive approaches to find their optimal length and position. We applied these approaches to biomedically important TFBSs involved in the regulation of cell growth and proliferation as well as in inflammatory, immune, and antiviral responses (NF-κB, ISGF3, IRF1, STAT1), obesity and lipid metabolism (PPAR, SREBP, HNF4), and the regulation of steroidogenic (SF-1) and cell cycle (E2F) gene expression. We have also gained extra specificity using a method, entitled SiteGA, which takes into account structural interactions within TFBS core and flanking regions, using a genetic algorithm (GA) with a discriminant function of locally positioned dinucleotide (LPD) frequencies. To ensure a higher confidence in our approach, we applied resampling-jackknife and bootstrap tests for the comparison; optimized PWM and SiteGA showed similar recognition performances. Then we applied SiteGA and optimized PWMs (both separately and together) to sequences in the Eukaryotic Promoter Database (EPD). The resulting SiteGA recognition models can now be used to search sequences for BSs using the web tool, SiteGA. Analysis of dependencies between close and distant LPDs revealed by SiteGA models has shown that the most significant correlations are between close LPDs, and are generally located in the core (footprint) region. A greater number of less significant correlations are mainly between distant LPDs, which spanned both core and flanking regions. When SiteGA and optimized PWM models were applied together, this substantially reduced false positives, at least at higher stringencies. Conclusion: Based on this analysis, SiteGA adds substantial specificity even to optimized PWMs and may be considered for large-scale genome analysis. It adds to the range of techniques available for TFBS prediction, and EPD analysis has led to a list of genes which appear to be regulated by the above TFs.
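For readers unfamiliar with position weight matrices, the following is a minimal sketch of plain log-odds PWM scoring in Python. It is not the optimized PWM or the SiteGA method from the abstract above; the function names, the uniform background of 0.25, the pseudocount, and the toy sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        # log2 of (smoothed base frequency / background frequency)
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm

def scan(pwm, sequence):
    """Return (offset, score) for every full-width window of the sequence."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][sequence[i + j]] for j in range(width)))
        for i in range(len(sequence) - width + 1)
    ]

def best_hit(pwm, sequence):
    """Return the (offset, score) pair with the highest score."""
    return max(scan(pwm, sequence), key=lambda hit: hit[1])

# Hypothetical toy sites for illustration only, not real binding-site data.
TOY_SITES = ["GGGAA", "GGGAT", "GGAAA", "GGGAC"]
```

Calling `best_hit(build_pwm(TOY_SITES), "TTTTGGGAATTTT")` returns the offset of the window that best matches the toy motif. A scan like this considers each position independently, which is the limitation that dinucleotide-based approaches aim to address.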
Affiliation(s)
- Victor G Levitsky
- Institute of Cytology and Genetics SB RAS, Novosibirsk, 630090, Russia.
25
Kashyap L, Tabish M, Ganesh S, Dubey D. Identification and comparative analysis of novel alternatively spliced transcripts of RhoGEF domain encoding gene in C. elegans and C. briggsae. Bioinformation 2007; 2:43-9. [PMID: 18188419 PMCID: PMC2174416 DOI: 10.6026/97320630002043] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/27/2007] [Revised: 08/16/2007] [Accepted: 08/23/2007] [Indexed: 11/23/2022] Open
Abstract
The Y95B8A.12 gene of C. elegans encodes a RhoGEF domain, which is a novel module in the guanine nucleotide exchange factors (GEFs). Alternative splicing increases transcriptome and proteome diversification. The C. elegans genome sequencing consortium has reported two alternatively spliced transcripts of the Y95B8A.12 gene. In the work presented here, we report the presence of four new spliced transcripts of Y95B8A.12 arising as a result of alternative splicing in the pre-mRNA encoded by the Y95B8A.12 gene. Our methodology involved the use of various gene or exon finding programmes and several other bioinformatics tools followed by experimental validation. We have also studied the alternative splicing pattern in the orthologous RhoGEF domain encoding gene from C. briggsae and have obtained very similar results. These previously unreported spliced transcripts, which were not detected through conventional approaches, not only point towards the extent of alternative splicing in C. elegans genes but also emphasize the need to analyze genome data using a combination of bioinformatics tools to delineate all possible gene products.
Affiliation(s)
- Luv Kashyap
- Department of Biochemistry, Faculty of Life Sciences, Aligarh Muslim University, Aligarh, India
26
Stergiopoulos I, Groenewald M, Staats M, Lindhout P, Crous PW, De Wit PJGM. Mating-type genes and the genetic structure of a world-wide collection of the tomato pathogen Cladosporium fulvum. Fungal Genet Biol 2007; 44:415-29. [PMID: 17178244 DOI: 10.1016/j.fgb.2006.11.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 06/30/2006] [Revised: 11/06/2006] [Accepted: 11/07/2006] [Indexed: 11/17/2022]
Abstract
Two mating-type genes, designated MAT1-1-1 and MAT1-2-1, were cloned and sequenced from the presumed asexual ascomycete Cladosporium fulvum (syn. Passalora fulva). The encoded products are highly homologous to mating-type proteins from members of the Mycosphaerellaceae, such as Mycosphaerella graminicola and Cercospora beticola. In addition, the two MAT idiomorphs of C. fulvum showed regions of homology and each contained one additional putative ORF without significant similarity to known sequences. The distribution of the two mating-type genes in a world-wide collection of 86 C. fulvum strains showed a departure from a 1:1 ratio (χ² = 4.81, df = 1). AFLP analysis revealed a high level of genotypic diversity, while strains of the fungus were identified with similar virulence spectra but distinct AFLP patterns and opposite mating-types. These features could suggest the occurrence of recombination in C. fulvum.
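The 1:1 ratio test mentioned in this abstract can be sketched as a chi-square goodness-of-fit calculation with one degree of freedom. The Python below is an illustrative sketch only: the function names are assumptions, and the abstract does not give the per-mating-type strain counts, so the example counts in the note are hypothetical.

```python
import math

def chi_square_equal_ratio(count_a, count_b):
    """Chi-square goodness-of-fit statistic against an expected 1:1 ratio (df = 1)."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one observation is required")
    expected = total / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom.

    With df = 1 the statistic is a squared standard normal variable,
    so the tail probability reduces to erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))
```

With hypothetical counts of 60 and 40, `chi_square_equal_ratio(60, 40)` gives 4.0. Applying `p_value_df1` to the reported statistic of 4.81 gives a p-value of roughly 0.03, which is below the conventional 0.05 threshold and consistent with the stated departure from a 1:1 ratio.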
MESH Headings
- Amino Acid Sequence
- Cladosporium/genetics
- Cladosporium/growth & development
- Cloning, Molecular
- DNA, Fungal/chemistry
- DNA, Fungal/genetics
- Gene Expression Regulation, Fungal
- Genes, Mating Type, Fungal/genetics
- Genetic Variation
- Haplotypes
- Solanum lycopersicum/microbiology
- Models, Genetic
- Molecular Sequence Data
- Phylogeny
- Polymorphism, Restriction Fragment Length
- Sequence Alignment
- Sequence Analysis, DNA
- Sequence Homology, Amino Acid
Affiliation(s)
- Ioannis Stergiopoulos
- Laboratory of Phytopathology, Wageningen University and Research Centre, Binnenhaven 5, 6709 PD Wageningen, The Netherlands
27
Kashyap L, Tabish M. Comparative analysis of various gene finders specific to Caenorhabditis elegans genome. Bioinformation 2006; 1:203-7. [PMID: 17597889 PMCID: PMC1891687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2006] [Revised: 09/12/2006] [Accepted: 02/10/2006] [Indexed: 11/30/2022] Open
Abstract
Computational gene prediction and the identification of alternatively spliced isoforms have always been challenging tasks. In this paper, we describe the performance of three gene/exon finding programmes, namely Fex, Gen view2 and Gene builder, which are capable of predicting open reading frames or exons for a given set of sequences from the C. elegans genome. The predicted exons were compared with the exons identified by the 'sequencing consortium', and the degree of consensus among them is discussed. We found that exon prediction by Fex was closer to the consortium prediction than were the Gen view2 and Gene builder results. Interestingly, some exons (six exons in five genes) predicted only by Fex and not by the 'sequencing consortium' are found in the C. elegans EST database. These data are critical for further debate and discussion on gene finding in C. elegans.
Affiliation(s)
- Mohammad Tabish
- Department of Biochemistry, Faculty of Life Sciences, A. M. University, Aligarh, U.P. 202002, India
28
Abstract
We introduce a new system, called shortHMM, for predicting exons, which predicts individual exons using two related genomes. In this system, we build a hidden semi-Markov model to identify exons. In the hidden Markov model, we propose joint probability models of nucleotides in introns, splice sites, 5'UTR, 3'UTR, and intergenic regions by exploiting the homology between related genomes. In order to reduce the false positive rate of the hidden Markov model, we develop a screening process which is able to identify intergenic regions. We then build a classifier by combining the statistics from the hidden Markov model and the screening process. We implement shortHMM on human-mouse sequence alignments. The source codes are available at < www.stat.purdue.edu/ jingwu/hmm >. Compared to TWINSCAN and SLAM, shortHMM is substantially more powerful in identifying AT-rich RefSeq exons (8% more AT-rich RefSeq exons were predicted), as well as slightly more powerful in identifying RefSeq exons (3-10% more RefSeq exons were predicted), at a similar or lower false positive rate, with less computing time and with less memory usage. Last, shortHMM is also capable of finding new potential exons.
Affiliation(s)
- Jing Wu
- Department of Statistics, Purdue University, West Lafayette, Indiana 47906, USA.
29
Abstract
This article introduces the field of bioinformatics and describes bioinformatic approaches and their application to the study of protein allergens. The predominant bioinformatics tools and resources are listed and discussed.
Affiliation(s)
- Pinar Kondu Akalin
- Iontek, Meridyen Is Merkezi Ali Riza Gurcan Cad. Cirpici Yolu, Istanbul 34010, Turkey.
30
Agrawal R, Stormo GD. Using mRNAs lengths to accurately predict the alternatively spliced gene products in Caenorhabditis elegans. Bioinformatics 2006; 22:1239-44. [PMID: 16595562 DOI: 10.1093/bioinformatics/btl076] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
Motivation: Computational gene prediction methods are an important component of whole genome analyses. While ab initio gene finders have demonstrated major improvements in accuracy, the most reliable methods are evidence-based gene predictors. These algorithms can rely on several different sources of evidence including predictions from multiple ab initio gene finders, matches to known proteins, sequence conservation and partial cDNAs to predict the final product. Despite the success of these algorithms, prediction of complete gene structures, especially for alternatively spliced products, remains a difficult task. Results: LOCUS (Length Optimized Characterization of Unknown Spliceforms) is a new evidence-based gene finding algorithm which integrates a length-constraint into a dynamic programming-based framework for prediction of gene products. On a Caenorhabditis elegans test set of alternatively spliced internal exons, its performance exceeds that of current ab initio gene finders and in most cases can accurately predict the correct form of all the alternative products. As the length information used by the algorithm can be obtained in a high-throughput fashion, we propose that integration of such information into a gene-prediction pipeline is feasible and doing so may improve our ability to fully characterize the complete set of mRNAs for a genome. Availability: LOCUS is available from http://ural.wustl.edu/software.html
Affiliation(s)
- Ritesh Agrawal
- Department of Genetics, Washington University School of Medicine 660 S. Euclid, Campus Box 8232, St. Louis, MO 63110, USA
31
Brzoska PM, Brown C, Cassel M, Ceccardi T, Di Francisco V, Dubman A, Evans J, Fang R, Harris M, Hoover J, Hu F, Larry C, Li P, Malicdem M, Maltchenko S, Shannon M, Perkins S, Poulter K, Webster-Laig M, Xiao C, Young S, Spier G, Guegler K, Gilbert D, Samaha RR. An efficient and high-throughput approach for experimental validation of novel human gene predictions. Genomics 2006; 87:437-45. [PMID: 16406193 DOI: 10.1016/j.ygeno.2005.11.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 08/02/2005] [Revised: 10/26/2005] [Accepted: 11/24/2005] [Indexed: 11/29/2022]
Abstract
A highly automated RT-PCR-based approach has been established to validate novel human gene predictions with no prior experimental evidence of mRNA splicing (ab initio predictions). Ab initio gene predictions were selected for high-throughput validation using predicted protein classification, sequence similarity to other genomes, colocalization with an MPSS tag, or microarray expression. Initial microarray prioritization followed by RT-PCR validation was the most efficient combination, resulting in approximately 35% of the ab initio predictions being validated by RT-PCR. Of the 7252 novel genes that were prioritized and processed, 796 constituted real transcripts. In addition, high-throughput RACE successfully extended the 5' and/or 3' ends of >60% of RT-PCR-validated genes. Reevaluation of these transcripts produced 574 novel transcripts using RefSeq as a reference. RT-PCR sequencing in combination with RACE on ab initio gene predictions could be used to define the transcriptome across all species.
Affiliation(s)
- Pius M Brzoska
- Applied Biosystems, 850 Lincoln Center Drive, Foster City, CA 94404, USA
32
Bekaert M, Richard H, Prum B, Rousset JP. Identification of programmed translational -1 frameshifting sites in the genome of Saccharomyces cerevisiae. Genome Res 2005; 15:1411-20. [PMID: 16204194 PMCID: PMC1240084 DOI: 10.1101/gr.4258005] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
Frameshifting is a recoding event that allows the expression of two polypeptides from the same mRNA molecule. Most recoding events described so far are used by viruses and transposons to express their replicase protein. The very few cellular proteins known to be expressed by -1 ribosomal frameshifting have been identified by chance. The goal of the present work was to set up a systematic strategy, based on complementary bioinformatics, molecular biology, and functional approaches, without a priori knowledge of the mechanism involved. Two independent methods were devised. The first looks for genomic regions in which two ORFs, each carrying a protein pattern, are in a frameshifted arrangement. The second uses Hidden Markov Models and likelihood in a two-step approach. When this strategy was applied to the Saccharomyces cerevisiae genome, 189 candidate regions were found, of which 58 were further functionally investigated. Twenty-eight of them expressed a full-length mRNA covering the two ORFs, and 11 showed a -1 frameshift efficiency varying from 5% to 13% (50-fold higher than background), some of which correspond to genes with known functions. Four of the frameshifted ORFs are fully conserved in other ascomycetes. Strikingly, most of the candidates do not display a classical viral-like frameshift signal and would have escaped a search based on current models of frameshifting. These results strongly suggest that -1 frameshifting might be more widely distributed than previously thought.
Affiliation(s)
- Michaël Bekaert
- Institut de Génétique et Microbiologie CNRS UMR 8621, Université Paris-Sud, 91405 Orsay Cedex, France
33
Martinez-Contreras R, Fisette JF, Nasim FUH, Madden R, Cordeau M, Chabot B. Intronic binding sites for hnRNP A/B and hnRNP F/H proteins stimulate pre-mRNA splicing. PLoS Biol 2006; 4:e21. [PMID: 16396608 PMCID: PMC1326234 DOI: 10.1371/journal.pbio.0040021] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Received: 07/05/2005] [Accepted: 11/15/2005] [Indexed: 12/20/2022] Open
Abstract
hnRNP A/B proteins modulate the alternative splicing of several mammalian and viral pre-mRNAs, and are typically viewed as proteins that enforce the activity of splicing silencers. Here we show that intronic hnRNP A/B–binding sites (ABS) can stimulate the in vitro splicing of pre-mRNAs containing artificially enlarged introns. Stimulation of in vitro splicing could also be obtained by providing intronic ABS in trans through the use of antisense oligonucleotides containing a non-hybridizing ABS-carrying tail. ABS-tailed oligonucleotides also improved the in vivo inclusion of an alternative exon flanked by an enlarged intron. Notably, binding sites for hnRNP F/H proteins (FBS) replicate the activity of ABS by improving the splicing of an enlarged intron and by modulating 5′ splice-site selection. One hypothesis formulated to explain these effects is that bound hnRNP proteins self-interact to bring in closer proximity the external pair of splice sites. Consistent with this model, positioning FBS or ABS at both ends of an intron was required to stimulate splicing of some pre-mRNAs. In addition, a computational analysis of the configuration of putative FBS and ABS located at the ends of introns supports the view that these motifs have evolved to support cooperative interactions. Our results document a positive role for the hnRNP A/B and hnRNP F/H proteins in generic splicing, and suggest that these proteins may modulate the conformation of mammalian pre-mRNAs. Typically viewed as enforcing splicing silencers, hnRNP A/B proteins may facilitate splicing by modulating the conformation of mammalian pre-mRNAs.
Affiliation(s)
- Rebeca Martinez-Contreras
- RNA/RNP Group, Département de microbiologie et d'infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Jean-François Fisette
- RNA/RNP Group, Département de microbiologie et d'infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Faiz-ul Hassan Nasim
- RNA/RNP Group, Département de microbiologie et d'infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Richard Madden
- Centre de genomique fonctionnelle de Sherbrooke, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Mélanie Cordeau
- RNA/RNP Group, Département de microbiologie et d'infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Benoit Chabot
- RNA/RNP Group, Département de microbiologie et d'infectiologie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada
- Centre de genomique fonctionnelle de Sherbrooke, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada
| |
Collapse
|
34
|
Abstract
Driven by competition, automation, and technology, the genomics community has far exceeded its ambition to sequence the human genome by 2005. By analyzing mammalian genomes, we have shed light on the history of our DNA sequence, determined that alternatively spliced RNAs and retroposed pseudogenes are incredibly abundant, and glimpsed the apparently huge number of non-coding RNAs that play significant roles in gene regulation. Ultimately, genome science is likely to provide comprehensive catalogs of these elements. However, the methods we have been using for most of the last 10 years will not yield even one complete open reading frame (ORF) for every gene--the first plateau on the long climb toward a comprehensive catalog. These strategies--sequencing randomly selected cDNA clones, aligning protein sequences identified in other organisms, sequencing more genomes, and manual curation--will have to be supplemented by large-scale amplification and sequencing of specific predicted mRNAs. The steady improvements in gene prediction that have occurred over the last 10 years have increased the efficacy of this approach and decreased its cost. In this Perspective, I review the state of gene prediction roughly 10 years ago, summarize the progress that has been made since, argue that the primary ORF identification methods we have relied on so far are inadequate, and recommend a path toward completing the Catalog of Protein Coding Genes, Version 1.0.
Affiliation(s)
- Michael R Brent
- Laboratory for Computational Genomics and Department of Computer Science, Washington University, St. Louis, Missouri 63130, USA.
|
35
|
Wagner JL, Palti Y, DiDario D, Faraco J. Sequence of the canine major histocompatibility complex region containing non-classical class I genes. Tissue Antigens 2005; 65:549-55. [PMID: 15896203 DOI: 10.1111/j.1399-0039.2005.00411.x] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 12/16/2022]
Abstract
We have sequenced a segment of 150,102 nucleotides of canine major histocompatibility complex (MHC) DNA, corresponding to the junction of the class I and class III regions. The distal portion contained five class III genes, including two tumor necrosis factor genes, and the proximal portion contained five genes or pseudogenes belonging to the class I region. The order of the class III region genes was conserved relative to the porcine and human MHC regions. The order of the class Ib loci from the proximal side outwards was DLA-53, DLA-12a, DLA-64, stress-induced phosphoprotein-1, followed by DLA-12. Only DLA-64 and DLA-12 display an overall predicted protein sequence compatible with the expression of membrane-anchored glycoproteins. The other class Ib loci do not appear to be functional by sequence analysis. In all, these 10 genes spanned 24% of the total sequence. The remaining 76% comprised a number of non-coding and repetitive DNA elements, including long interspersed nuclear element (LINE) fragments, short interspersed nuclear elements (SINE), and microsatellites.
Affiliation(s)
- J L Wagner
- Blood and Marrow Transplant Program, Department of Medicine, Thomas Jefferson University, Philadelphia, PA 19107, USA.
|
36
|
Wang Z, Chen Y, Li Y. A brief review of computational gene prediction methods. Genomics Proteomics & Bioinformatics 2005; 2:216-21. [PMID: 15901250 PMCID: PMC5187414 DOI: 10.1016/s1672-0229(04)02028-5] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.2] [Indexed: 11/27/2022]
Abstract
With the development of genome sequencing for many organisms, more and more raw sequences need to be annotated. Gene prediction by computational methods for finding the location of protein coding regions is one of the essential issues in bioinformatics. Two classes of methods are generally adopted: similarity based searches and ab initio prediction. Here, we review the development of gene prediction methods, summarize the measures for evaluating predictor quality, highlight open problems in this area, and discuss future research directions.
Affiliation(s)
- Zhuo Wang
- Biomedical Instrument Institute, Shanghai Jiaotong University, Shanghai 200030, China
- Shanghai Center for Bioinformation Technology, Shanghai 200035, China
- Corresponding authors.
- Yazhu Chen
- Biomedical Instrument Institute, Shanghai Jiaotong University, Shanghai 200030, China
- Yixue Li
- Shanghai Center for Bioinformation Technology, Shanghai 200035, China
- Corresponding authors.
|
37
|
|
38
|
Freund M, Asang C, Kammler S, Konermann C, Krummheuer J, Hipp M, Meyer I, Gierling W, Theiss S, Preuss T, Schindler D, Kjems J, Schaal H. A novel approach to describe a U1 snRNA binding site. Nucleic Acids Res 2003; 31:6963-75. [PMID: 14627829 PMCID: PMC290269 DOI: 10.1093/nar/gkg901] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.0] [Indexed: 12/21/2022] Open
Abstract
RNA duplex formation between U1 snRNA and a splice donor (SD) site can protect pre-mRNA from degradation prior to splicing and initiates formation of the spliceosome. This process was monitored, using sub-genomic HIV-1 expression vectors, by expression analysis of the glycoprotein env, whose formation critically depends on functional SD4. We systematically derived a hydrogen bond model for the complementarity between the free 5' end of U1 snRNA and 5' splice sites, including numerous mutations, following transient transfection of HeLa-T4+ cells with 5' splice site mutated vectors. The resulting model takes into account number, interdependence and neighborhood relationships of predicted hydrogen bond formation in a region spanning the three most 3' base pairs of the exon (-3 to -1) and the eight most 5' base pairs of the intron (+1 to +8). The model is represented by an algorithm classifying U1 snRNA binding sites which can or cannot functionally substitute SD4 with respect to Rev-mediated env expression. In a data set of 5' splice site mutations of the human ATM gene we found a significant correlation between the algorithmic classification and exon skipping (P = 0.018, χ² test), showing that the applicability of the proposed model reaches far beyond HIV-1 splicing. However, the algorithmic classification must not be taken as an absolute measure of SD usage as it may be modified by upstream sequence elements. Upstream of SD4 we identified a fragment supporting ASF/SF2 binding. Mutating GAR nucleotide repeats within this site decreased the SD4-dependent Rev-mediated env expression, which could be balanced simply by artificially increasing the complementarity of SD4.
Affiliation(s)
- Marcel Freund
- Institut für Virologie, Heinrich-Heine-Universität Düsseldorf, Geb. 22.21, Universitätsstrasse 1, D-40225 Düsseldorf, Germany
|
39
|
|
40
|
Uberbacher EC, Hyatt D, Shah M. GrailEXP and Genome Analysis Pipeline for genome annotation. Current Protocols in Bioinformatics 2004; Chapter 4:Unit4.9. [PMID: 18428726 DOI: 10.1002/0471250953.bi0409s04] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The Basic Protocol describes the use of GrailEXP, the latest version of the gene finding system from Oak Ridge National Laboratory. GrailEXP provides gene models, by making use of sequence similarity with Expressed Sequence Tags (ESTs) and known genes. GrailEXP also provides alternatively spliced constructs for each gene based on the available EST evidence. The Support Protocol describes the use of the Genome Analysis Pipeline, a web application which allows users to perform comprehensive sequence analysis by offering a selection from a wide choice of supported gene finders, other biological feature finders, and database searches.
|
41
|
Zhou Y, Yang L, Wang H, Lu F, Wan H. Prediction of eukaryotic gene structures based on multilevel optimization. ACTA ACUST UNITED AC 2004. [DOI: 10.1007/bf02900313] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]
|
42
|
Zhang L, Luo L. Splice site prediction with quadratic discriminant analysis using diversity measure. Nucleic Acids Res 2003; 31:6214-20. [PMID: 14576308 PMCID: PMC275452 DOI: 10.1093/nar/gkg805] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Abstract
Based on the conservation of nucleotides at splice sites and the features of base composition and base correlation around these sites, we use the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to study the dependence structure of splice sites and predict the exons/introns and their boundaries for four model genomes: Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and human. The comparison of compositional features between two sequences and the comparison of base dependencies at adjacent or non-adjacent positions of two sequences can be integrated automatically in the increment of diversity (ID). Eight feature variables around a potential splice site are defined in terms of ID. They are integrated in a single formal framework given by IDQD. In our calculations, regions of 7 (8) bases around the donor (acceptor) sites were considered in studying the conservation of nucleotides, and sequences of 48 bp on either side of the splice sites were used in studying the compositional and base-correlating features. The windows are enlarged to 16 (donor), 29 (acceptor) and 80 bp (either side) to improve the prediction for human splice sites. The prediction capability of the present method is comparable with that of the leading splice site detector, GeneSplicer.
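The abstract names the increment of diversity (ID) but does not define it. The sketch below uses one commonly cited formulation, which is an assumption here and not taken from the abstract: the diversity of a count vector is D = N log N − Σ nᵢ log nᵢ, and ID(X, Y) = D(X + Y) − D(X) − D(Y). The helper names (`kmer_counts`, `diversity`, `increment_of_diversity`) are hypothetical, and the quadratic discriminant step is not shown.

```python
import math
from collections import Counter


def kmer_counts(seq, k=1):
    """Count overlapping k-mers in a sequence (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts):
    """Assumed diversity measure: N*log(N) - sum(n_i*log(n_i))."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        c * math.log(c) for c in counts.values() if c > 0
    )


def increment_of_diversity(x, y):
    """Assumed ID: D(X + Y) - D(X) - D(Y); small when compositions are alike."""
    return diversity(x + y) - diversity(x) - diversity(y)


if __name__ == "__main__":
    reference = kmer_counts("CAGGTAAGTCAGGTGAGT", k=2)  # toy "known site" source
    candidate = kmer_counts("AAGGTAAGT", k=2)           # toy candidate window
    print(increment_of_diversity(reference, candidate))
```

Under this definition, two sources with identical composition give an ID of zero and fully disjoint sources give the largest values, which is why ID can serve as a feature variable for a downstream classifier.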
Affiliation(s)
- Lirong Zhang
- Laboratory of Theoretical Biophysics, Faculty of Science and Technology, Inner Mongolia University, Hohhot, 010021 China
|
43
|
Matsuyama A, Shiraishi T, Trapasso F, Kuroki T, Alder H, Mori M, Huebner K, Croce CM. Fragile site orthologs FHIT/FRA3B and Fhit/Fra14A2: evolutionarily conserved but highly recombinogenic. Proc Natl Acad Sci U S A 2003; 100:14988-93. [PMID: 14630947 PMCID: PMC299872 DOI: 10.1073/pnas.2336256100] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.2] [Indexed: 01/29/2023] Open
Abstract
Common fragile sites are regions that show elevated susceptibility to DNA damage, leading to alterations that can contribute to cancer development. FRA3B, located at chromosome region 3p14.2, is the most frequently expressed human common fragile site, and allelic losses at FRA3B have been observed in many types of cancer. The FHIT gene, encompassing the FRA3B region, is a tumor-suppressor gene. To identify the features of FHIT/FRA3B that might contribute to fragility, sequences of the human FHIT and the flanking PTPRG gene were compared with those of murine Fhit and Ptprg. Human and mouse orthologous genes, FHIT and Fhit, are more highly conserved through evolution than PTPRG/Ptprg and yet contain more sequence elements that are exquisitely sensitive to genomic rearrangements, such as high-flexibility regions and long interspersed nuclear element 1s, suggesting that common fragile sites serve a function. The conserved AT-rich high-flexibility regions are the most characteristic of common fragile sites.
Affiliation(s)
- Ayumi Matsuyama
- Kimmel Cancer Center, Thomas Jefferson University, 233 South 10th Street, Philadelphia, PA 19107, USA
|
44
|
Tittiger C, Barkawi LS, Bengoa CS, Blomquist GJ, Seybold SJ. Structure and juvenile hormone-mediated regulation of the HMG-CoA reductase gene from the Jeffrey pine beetle, Dendroctonus jeffreyi. Mol Cell Endocrinol 2003; 199:11-21. [PMID: 12581875 DOI: 10.1016/s0303-7207(02)00358-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.1] [Indexed: 11/29/2022]
Abstract
In several pine bark beetle species, juvenile hormone (JH) III regulated 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMG-R) gene expression has an important role in monoterpenoid pheromone production in males. We investigated the structure and regulated expression of the HMG-R gene (HMG-R) in the Jeffrey pine beetle, Dendroctonus jeffreyi. cDNA and genomic sequences were recovered using a combination of library screening and PCR. The transcribed portion of the gene spans 9.8 kb and is interrupted by 13 introns. When compared to vertebrate HMG-Rs, the distribution of intron sites suggests a functional role for those in the 5' untranslated region and membrane anchor domains. Northern blots show that topically applied JH III stimulates HMG-R expression up to 30-fold in male D. jeffreyi, compared to untreated insects, in both a dose- and time-dependent manner. There was no increase in expression levels in similarly treated female insects. The expression pattern is consistent with the production of monoterpenoid pheromone components in male D. jeffreyi, and suggests the utility of the system as a new tool for studying the mechanism of JH action.
Affiliation(s)
- Claus Tittiger
- Department of Biochemistry, Mail Stop 330, University of Nevada, Reno 89557, USA.
|
45
|
Kim KB, Park K, Kong EB. A method for identifying splice sites and translation start sites in human genomic sequences. Journal of Biochemistry and Molecular Biology 2002; 35:513-7. [PMID: 12359095 DOI: 10.5483/bmbrep.2002.35.5.513] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
We describe a new method for identifying the sequences that signal the start of translation, and the boundaries between exons and introns (donor and acceptor sites) in human mRNA. According to the mandatory keyword, ORGANISM, and feature key, CDS, a large set of standard data for each signal site was extracted from the ASCII flat file, gbpri.seq, in the GenBank release 108.0. This was used to generate the scoring matrices, which summarize the sequence information for each signal site. The scoring matrices take into account the independent nucleotide frequencies between adjacent bases in each position within the signal site regions, and the relative weight on each nucleotide in proportion to their probabilities in the known signal sites. Using a scoring scheme that is based on the nucleotide scoring matrices, the method has great sensitivity and specificity when used to locate signals in uncharacterized human genomic DNA. These matrices are especially effective at distinguishing true and false sites.
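The scoring-matrix idea in this abstract can be illustrated with a generic position weight matrix. This is a minimal sketch under stated assumptions, not the authors' method: it treats positions as independent, uses a uniform background and a pseudocount, and all function names and the toy donor-like sites are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_log_odds_matrix(sites, background=None, pseudocount=1.0):
    """Build a per-position log-odds matrix from aligned, equal-length sites."""
    background = background or {b: 0.25 for b in BASES}  # assumed uniform
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        matrix.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return matrix


def score_window(matrix, window):
    """Sum the log-odds of each base in a window as long as the matrix."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_site(matrix, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(matrix)
    return max(
        (score_window(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Toy aligned sites, invented for illustration only.
    toy_sites = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT"]
    pwm = build_log_odds_matrix(toy_sites)
    print(best_site(pwm, "TTTTCAGGTAAGTCCCC"))
```

In practice a score threshold chosen on held-out true and false sites would decide which windows are reported; that step is omitted here.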
Affiliation(s)
- Ki-Bong Kim
- Information Technology Institute, SmallSoft Co, Ltd, Daejeon 305-811, Korea.
|
46
|
Abstract
The human genome sequence is the book of our life. Buried in this large volume are our genes, which are scattered as small DNA fragments throughout the genome and comprise a small percentage of the total text. Finding these indistinct 'needles' in a vast genomic 'haystack' can be extremely challenging. In response to this challenge, computational prediction approaches have proliferated in recent years that predict the location and structure of genes. Here, I discuss these approaches and explain why they have become essential for the analyses of newly sequenced genomes.
Affiliation(s)
- Michael Q Zhang
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, 1 Bungtown Road, PO Box 100, Cold Spring Harbor, New York 11724, USA.
|
47
|
Li W, Bernaola-Galván P, Haghighi F, Grosse I. Applications of recursive segmentation to the analysis of DNA sequences. Computers & Chemistry 2002; 26:491-510. [PMID: 12144178 DOI: 10.1016/s0097-8485(02)00010-4] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.9] [Indexed: 11/23/2022]
Abstract
Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicating both the base and the codon position information. We apply various conversion schemes in order to address the following five DNA sequence analysis problems: isochore mapping, CpG island detection, locating the origin and terminus of replication in bacterial genomes, finding complex repeats in telomere sequences, and delineating coding and noncoding regions. We find that the recursive segmentation procedure can successfully detect isochore borders, CpG islands, and the origin and terminus of replication, but it needs improvement for detecting complex repeats as well as borders between coding and noncoding regions.
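The procedure described above can be sketched as a divide-and-conquer search for the split point that best separates two compositionally different halves. The sketch below assumes the Jensen-Shannon divergence as the split criterion and a simple fixed cutoff on length × divergence as the stopping rule; the cutoff (`min_score`) and the function names are hypothetical stand-ins, and the abstract does not specify the criteria the authors used.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (bits) of a symbol count table."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_split(seq):
    """Return (divergence, index) of the split maximizing Jensen-Shannon divergence."""
    n = len(seq)
    all_counts = Counter(seq)
    h_all = entropy(all_counts, n)
    left = Counter()
    best = (0.0, None)
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = all_counts - left  # counts of the remaining suffix
        js = (h_all
              - (i / n) * entropy(left, i)
              - ((n - i) / n) * entropy(right, n - i))
        if js > best[0]:
            best = (js, i)
    return best


def segment(seq, start=0, min_score=10.0):
    """Recursively split seq; returns sorted boundary positions.

    min_score is an assumed, illustrative cutoff on len(seq) * divergence.
    """
    if len(seq) < 2:
        return []
    js, i = best_split(seq)
    if i is None or len(seq) * js < min_score:
        return []
    return (segment(seq[:i], start, min_score)
            + [start + i]
            + segment(seq[i:], start + i, min_score))


if __name__ == "__main__":
    print(segment("A" * 50 + "G" * 50))
```

The same code applies unchanged to converted sequences, for example a binary strong/weak string produced by mapping G and C to `S` and A and T to `W` before calling `segment`.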
Affiliation(s)
- Wentian Li
- Center for Genomics and Human Genetics, North Shore-LIJ Research Institute, Manhasset, NY 11030, USA.
|
48
|
Cragg RA, Christie GR, Phillips SR, Russi RM, Küry S, Mathers JC, Taylor PM, Ford D. A novel zinc-regulated human zinc transporter, hZTL1, is localized to the enterocyte apical membrane. J Biol Chem 2002; 277:22789-97. [PMID: 11937503 DOI: 10.1074/jbc.m200577200] [Citation(s) in RCA: 111] [Impact Index Per Article: 5.0] [Indexed: 11/06/2022] Open
Abstract
Zinc is essential to a wide range of cellular processes; therefore, it is important to elucidate the molecular mechanisms of zinc homeostasis. To date, no zinc transporters expressed at the enterocyte apical membrane, and so essential to mammalian zinc homeostasis, have been discovered. We identified hZTL1 as a human expressed sequence tag with homology to the basolateral enterocyte zinc transporter ZnT1 and deduced the full-length cDNA sequence by PCR. The protein of 523 amino acids belongs to the cation diffusion facilitator family of membrane transporters. Unusually, the predicted topology comprises 12 rather than 6 transmembrane domains. ZTL1 mRNA was detected by reverse transcription-PCR in a range of mouse tissues. A Myc-tagged hZTL1 clone was expressed in transiently transfected polarized human intestinal Caco-2 cells at the apical membrane. Expression of hZTL1 mRNA in Caco-2 cells increased with zinc supplementation of the nutrient medium; however, in the placental cell line JAR hZTL1 appeared not to be regulated by zinc. Heterologous expression of hZTL1 in Xenopus laevis oocytes increased zinc uptake across the plasma membrane. The localization, regulatory properties, and function of hZTL1 indicate a role in regulating the absorption of dietary zinc across the apical enterocyte membrane.
Affiliation(s)
- Ruth A Cragg
- Department of Biological and Nutritional Sciences, University of Newcastle, Kings Rd., Newcastle upon Tyne, NE1 7RU, United Kingdom
|
49
|
Graham LA, Davies PL. The odorant-binding proteins of Drosophila melanogaster: annotation and characterization of a divergent gene family. Gene 2002; 292:43-55. [PMID: 12119098 DOI: 10.1016/s0378-1119(02)00672-8] [Citation(s) in RCA: 86] [Impact Index Per Article: 3.9] [Indexed: 11/23/2022]
Abstract
Insect odorant-binding proteins (OBPs) are thought to facilitate the delivery of hydrophobic odorants, such as sex pheromones or food odors, to receptors on sensory neurons. Increasingly, OBP family members are also being found in non-sensory tissues where they might carry other types of small hydrophobic molecules. They are identifiable by four or six conserved Cys residues and contain six alpha-helices which enclose a hydrophobic ligand-binding pocket. Through exhaustive BLAST searches we have increased the total number of OBPs identified in Drosophila melanogaster to 38, and have amplified the DNA complementary to RNA corresponding to 21 of these by reverse transcriptase polymerase chain reaction. Isoforms frequently share less than 30% amino acid identity and appear to have radically changed since the separation of the major insect orders. However, their sequences are consistent with known OBP structures. Most are located in clusters of between four and 14 genes and several were unusual in that they contained additions, deletions, or fusions. These hexa-helical insect OBPs are structurally unrelated to the functionally analogous lipocalin-like beta-barrel OBPs of vertebrates. As only two lipocalin-like proteins have been found in D. melanogaster, these helical proteins appear to be the dominant carrier of small hydrophobic molecules in insects.
Affiliation(s)
- Laurie A Graham
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L 3N6, Canada.
|
50
|
Maddouri M, Elloumi M. A data mining approach based on machine learning techniques to classify biological sequences. Knowl Based Syst 2002. [DOI: 10.1016/s0950-7051(01)00143-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|