1
Liu X, Xiao C, Xu X, Zhang J, Mo F, Chen JY, Delihas N, Zhang L, An NA, Li CY. Origin of functional de novo genes in humans from "hopeful monsters". WILEY INTERDISCIPLINARY REVIEWS. RNA 2024; 15:e1845. [PMID: 38605485 DOI: 10.1002/wrna.1845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Revised: 03/13/2024] [Accepted: 03/18/2024] [Indexed: 04/13/2024]
Abstract
For a long time, it was believed that new genes arise only from modifications of preexisting genes, but the discovery of de novo protein-coding genes that originated from noncoding DNA regions demonstrates the existence of a "motherless" origination process for new genes. However, the features, distributions, expression profiles, and origin modes of these genes in humans seem to support the notion that their origin is not a purely "motherless" process; rather, these genes arise preferentially from genomic regions encoding preexisting precursors with gene-like features. In such a case, the gene loci are typically not brand new. In this short review, we will summarize the definition and features of human de novo genes and clarify their process of origination from ancestral non-coding genomic regions. In addition, we define the favored precursors, or "hopeful monsters," for the origin of de novo genes and present a discussion of the functional significance of these young genes in brain development and tumorigenesis in humans. This article is categorized under: RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Xiaoge Liu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chunfu Xiao
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Xinwei Xu
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Jie Zhang
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Fan Mo
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Stem Cell and Regeneration, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Jia-Yu Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing, China
- Nicholas Delihas
- Department of Microbiology and Immunology, Renaissance School of Medicine, Stony Brook University, Stony Brook, New York, USA
- Li Zhang
- Chinese Institute for Brain Research, Beijing, China
- Ni A An
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chuan-Yun Li
- State Key Laboratory of Protein and Plant Gene Research, Laboratory of Bioinformatics and Genomic Medicine, Institute of Molecular Medicine, College of Future Technology, Peking University, Beijing, China
- Chinese Institute for Brain Research, Beijing, China
- Southwest United Graduate School, Kunming, China
2
Hu H, Dong B, Fan X, Wang M, Wang T, Liu Q. Mutational Bias and Natural Selection Driving the Synonymous Codon Usage of Single-Exon Genes in Rice (Oryza sativa L.). RICE (NEW YORK, N.Y.) 2023; 16:11. [PMID: 36849744 PMCID: PMC9971424 DOI: 10.1186/s12284-023-00627-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/18/2022] [Accepted: 02/16/2023] [Indexed: 06/18/2023]
Abstract
The relative abundance of single-exon genes (SEGs) in higher plants is perplexing. Uncovering the synonymous codon usage pattern of SEGs will help clarify the evolutionary mechanisms underlying them in plants. Using internal correspondence analysis (ICA), we reveal a significant difference in synonymous codon usage between SEGs and multiple-exon genes (MEGs) in rice. However, the effect is weak, accounting for only 2.61% of the total codon usage variability. SEGs and MEGs have remarkably different base compositions and are under clearly different selective constraints, with the former having higher GC content and evolving relatively faster. Within the group of SEGs, the variability in synonymous codon usage among genes is partially due to variation in GC content, gene function, and gene expression level, which account for 22.03%, 5.99%, and 3.32% of the total codon usage variability, respectively. Therefore, both mutational bias and natural selection likely act on the synonymous codon usage of SEGs in rice. These findings may deepen our knowledge of the mechanisms of origination, differentiation, and regulation of SEGs in plants.
Affiliation(s)
- Huan Hu
- The Key Laboratory for Quality Improvement of Agricultural Products of Zhejiang Province, College of Advanced Agricultural Sciences, Zhejiang A & F University, Lin'an, Hangzhou, 311300, People's Republic of China
- Boran Dong
- The Key Laboratory for Quality Improvement of Agricultural Products of Zhejiang Province, College of Advanced Agricultural Sciences, Zhejiang A & F University, Lin'an, Hangzhou, 311300, People's Republic of China
- Xiaoji Fan
- The Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, 310012, People's Republic of China
- Meixia Wang
- The Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, 310012, People's Republic of China
- Tingzhang Wang
- The Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, 310012, People's Republic of China.
- Qingpo Liu
- The Key Laboratory for Quality Improvement of Agricultural Products of Zhejiang Province, College of Advanced Agricultural Sciences, Zhejiang A & F University, Lin'an, Hangzhou, 311300, People's Republic of China.
3
Hijazi H, Reis LM, Pehlivan D, Bernstein JA, Muriello M, Syverson E, Bonner D, Estiar MA, Gan-Or Z, Rouleau GA, Lyulcheva E, Greenhalgh L, Tessarech M, Colin E, Guichet A, Bonneau D, van Jaarsveld RH, Lachmeijer AMA, Ruaud L, Levy J, Tabet AC, Ploski R, Rydzanicz M, Kępczyński Ł, Połatyńska K, Li Y, Fatih JM, Marafi D, Rosenfeld JA, Coban-Akdemir Z, Bi W, Gibbs RA, Hobson GM, Hunter JV, Carvalho CMB, Posey JE, Semina EV, Lupski JR. TCEAL1 loss-of-function results in an X-linked dominant neurodevelopmental syndrome and drives the neurological disease trait in Xq22.2 deletions. Am J Hum Genet 2022; 109:2270-2282. [PMID: 36368327 PMCID: PMC9748253 DOI: 10.1016/j.ajhg.2022.10.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/22/2022] [Accepted: 10/13/2022] [Indexed: 11/12/2022]
Abstract
An Xq22.2 region upstream of PLP1 has been proposed to underlie a neurological disease trait when deleted in 46,XX females. Deletion mapping revealed that heterozygous deletions encompassing the smallest region of overlap (SRO) spanning six Xq22.2 genes (BEX3, RAB40A, TCEAL4, TCEAL3, TCEAL1, and MORF4L2) associate with an early-onset neurological disease trait (EONDT) consisting of hypotonia, intellectual disability, neurobehavioral abnormalities, and dysmorphic facial features. None of the genes within the SRO have been associated with monogenic disease in OMIM. Through local and international collaborations facilitated by GeneMatcher and Matchmaker Exchange, we have identified and herein report seven de novo variants involving TCEAL1 in seven unrelated families: three hemizygous truncating alleles; one hemizygous missense allele; one heterozygous TCEAL1 full gene deletion; one heterozygous contiguous deletion of TCEAL1, TCEAL3, and TCEAL4; and one heterozygous frameshift variant allele. Variants were identified through exome or genome sequencing with trio analysis or through chromosomal microarray. Comparison with previously reported Xq22 deletions encompassing TCEAL1 identified a more-defined syndrome consisting of hypotonia, abnormal gait, developmental delay/intellectual disability especially affecting expressive language, autistic-like behavior, and mildly dysmorphic facial features. Additional features include strabismus, refractive errors, variable nystagmus, gastroesophageal reflux, constipation, dysmotility, recurrent infections, seizures, and structural brain anomalies. An additional maternally inherited hemizygous missense allele of uncertain significance was identified in a male with hypertonia and spasticity without syndromic features.
These data provide evidence that TCEAL1 loss of function causes a neurological rare disease trait involving significant neurological impairment with features overlapping the EONDT phenotype in females with the Xq22 deletion.
Affiliation(s)
- Hadia Hijazi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Linda M Reis
- Department of Pediatrics and Children's Research Institute, Medical College of Wisconsin and Children's Wisconsin, Milwaukee, WI, USA
- Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX, USA
- Jonathan A Bernstein
- Department of Pediatrics, Division of Medical Genetics, Stanford School of Medicine, Stanford, CA, USA
- Michael Muriello
- Department of Pediatrics and Children's Research Institute, Medical College of Wisconsin and Children's Wisconsin, Milwaukee, WI, USA
- Erin Syverson
- Department of Pediatrics and Children's Research Institute, Medical College of Wisconsin and Children's Wisconsin, Milwaukee, WI, USA
- Devon Bonner
- Department of Pediatrics, Division of Medical Genetics, Stanford School of Medicine, Stanford, CA, USA
- Mehrdad A Estiar
- Department of Human Genetics, McGill University, Montreal, QC, Canada; The Neuro (Montreal Neurological Institute-Hospital), McGill University, Montreal, QC, Canada
- Ziv Gan-Or
- Department of Human Genetics, McGill University, Montreal, QC, Canada; The Neuro (Montreal Neurological Institute-Hospital), McGill University, Montreal, QC, Canada; Department of Neurology & Neurosurgery, McGill University, Montreal, QC, Canada
- Guy A Rouleau
- Department of Human Genetics, McGill University, Montreal, QC, Canada; The Neuro (Montreal Neurological Institute-Hospital), McGill University, Montreal, QC, Canada; Department of Neurology & Neurosurgery, McGill University, Montreal, QC, Canada
- Ekaterina Lyulcheva
- Liverpool Centre for Genomic Medicine, Liverpool Women's Hospital, Liverpool, UK
- Lynn Greenhalgh
- Liverpool Centre for Genomic Medicine, Liverpool Women's Hospital, Liverpool, UK
- Marine Tessarech
- Department of Medical Genetics, Angers University Hospital, Angers, France; Mitovasc Unit, UMR CNRS 6015-INSERM 1083, University of Angers, Angers, France
- Estelle Colin
- Department of Medical Genetics, Angers University Hospital, Angers, France; Mitovasc Unit, UMR CNRS 6015-INSERM 1083, University of Angers, Angers, France
- Agnès Guichet
- Department of Medical Genetics, Angers University Hospital, Angers, France; Mitovasc Unit, UMR CNRS 6015-INSERM 1083, University of Angers, Angers, France
- Dominique Bonneau
- Department of Medical Genetics, Angers University Hospital, Angers, France; Mitovasc Unit, UMR CNRS 6015-INSERM 1083, University of Angers, Angers, France
- R H van Jaarsveld
- Department of Genetics, University Medical Center Utrecht, Utrecht, the Netherlands
- A M A Lachmeijer
- Department of Genetics, University Medical Center Utrecht, Utrecht, the Netherlands
- Lyse Ruaud
- INSERM UMR1141, Neurodiderot, University of Paris, 75019 Paris, France; APHP.Nord, Robert Debré University Hospital, Department of Genetics, 75019 Paris, France
- Jonathan Levy
- APHP.Nord, Robert Debré University Hospital, Department of Genetics, 75019 Paris, France
- Anne-Claude Tabet
- APHP.Nord, Robert Debré University Hospital, Department of Genetics, 75019 Paris, France
- Rafal Ploski
- Department of Medical Genetics, Medical University of Warsaw, Warsaw, Poland
- Łukasz Kępczyński
- Department of Genetics, Polish Mother's Memorial Hospital - Research Institute, Łódź, Poland
- Katarzyna Połatyńska
- Department of Developmental Neurology and Epileptology, Polish Mother's Memorial Hospital - Research Institute, Łódź, Poland
- Yidan Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jawid M Fatih
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Dana Marafi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jill A Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Zeynep Coban-Akdemir
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Weimin Bi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Baylor Genetics, Houston, TX, USA
- Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Grace M Hobson
- Department of Research, Nemours Children's Health, Wilmington, DE, USA
- Jill V Hunter
- E.B. Singleton Department of Pediatric Radiology, Texas Children's Hospital, Houston, TX, USA
- Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Elena V Semina
- Department of Pediatrics and Children's Research Institute, Medical College of Wisconsin and Children's Wisconsin, Milwaukee, WI, USA; Departments of Ophthalmology and Visual Sciences and Cell Biology, Neurobiology and Anatomy, Medical College of Wisconsin, Milwaukee, WI, USA.
- James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Texas Children's Hospital, Houston, TX, USA.
4
Chrystal PW, Lambacher NJ, Doucette LP, Bellingham J, Schiff ER, Noel NCL, Li C, Tsiropoulou S, Casey GA, Zhai Y, Nadolski NJ, Majumder MH, Tagoe J, D'Esposito F, Cordeiro MF, Downes S, Clayton-Smith J, Ellingford J, Mahroo OA, Hocking JC, Cheetham ME, Webster AR, Jansen G, Blacque OE, Allison WT, Au PYB, MacDonald IM, Arno G, Leroux MR. The inner junction protein CFAP20 functions in motile and non-motile cilia and is critical for vision. Nat Commun 2022; 13:6595. [PMID: 36329026 PMCID: PMC9633640 DOI: 10.1038/s41467-022-33820-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/06/2021] [Accepted: 10/03/2022] [Indexed: 11/06/2022]
Abstract
Motile and non-motile cilia are associated with mutually-exclusive genetic disorders. Motile cilia propel sperm or extracellular fluids, and their dysfunction causes primary ciliary dyskinesia. Non-motile cilia serve as sensory/signalling antennae on most cell types, and their disruption causes single-organ ciliopathies such as retinopathies or multi-system syndromes. CFAP20 is a ciliopathy candidate known to modulate motile cilia in unicellular eukaryotes. We demonstrate that in zebrafish, cfap20 is required for motile cilia function, and in C. elegans, CFAP-20 maintains the structural integrity of non-motile cilia inner junctions, influencing sensory-dependent signalling and development. Human patients and zebrafish with CFAP20 mutations both exhibit retinal dystrophy. Hence, CFAP20 functions within a structural/functional hub centered on the inner junction that is shared between motile and non-motile cilia, and is distinct from other ciliopathy-associated domains or macromolecular complexes. Our findings suggest an uncharacterised pathomechanism for retinal dystrophy, and potentially for motile and non-motile ciliopathies in general.
Affiliation(s)
- Paul W Chrystal
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada.
- Department of Medical Genetics, University of Alberta, Edmonton, AB, Canada.
- Nils J Lambacher
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Centre for Cell Biology, Development, and Disease, Simon Fraser University, Burnaby, BC, Canada
- Lance P Doucette
- Department of Ophthalmology & Visual Science, University of Alberta, Edmonton, AB, Canada
- Elena R Schiff
- Moorfields Eye Hospital, London, UK
- School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Nicole C L Noel
- Department of Medical Genetics, University of Alberta, Edmonton, AB, Canada
- Chunmei Li
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Centre for Cell Biology, Development, and Disease, Simon Fraser University, Burnaby, BC, Canada
- Sofia Tsiropoulou
- School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- Geoffrey A Casey
- Department of Medical Genetics, University of Alberta, Edmonton, AB, Canada
- Yi Zhai
- Department of Ophthalmology & Visual Science, University of Alberta, Edmonton, AB, Canada
- Nathan J Nadolski
- Division of Anatomy, Department of Surgery, University of Alberta, Edmonton, AB, Canada
- Mohammed H Majumder
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada
- Julia Tagoe
- Lethbridge Outreach Genetics Service, Alberta Health Services, Lethbridge, AB, Canada
- Fabiana D'Esposito
- Western Eye Hospital, Imperial College Healthcare NHS Trust, London, UK
- ICORG, Imperial College London, London, UK
- Susan Downes
- Oxford Eye Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Jill Clayton-Smith
- Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Health Innovation Manchester, Manchester, UK
- Jamie Ellingford
- Manchester Centre for Genomic Medicine, Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Division of Evolution and Genomic Sciences, School of Biological Sciences, University of Manchester, Manchester, UK
- Genomics England, London, UK
- Omar A Mahroo
- UCL Institute of Ophthalmology, London, UK
- Moorfields Eye Hospital, London, UK
- Jennifer C Hocking
- Department of Medical Genetics, University of Alberta, Edmonton, AB, Canada
- Division of Anatomy, Department of Surgery, University of Alberta, Edmonton, AB, Canada
- Department of Cell Biology, University of Alberta, Edmonton, AB, Canada
- Women and Children's Health Research Institute, University of Alberta, Edmonton, AB, Canada
- Andrew R Webster
- UCL Institute of Ophthalmology, London, UK
- Moorfields Eye Hospital, London, UK
- Gert Jansen
- Department of Cell Biology, Erasmus University Medical Centre, Rotterdam, The Netherlands
- Oliver E Blacque
- School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin 4, Ireland
- W Ted Allison
- Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada.
- Department of Medical Genetics, University of Alberta, Edmonton, AB, Canada.
- Ping Yee Billie Au
- Department of Medical Genetics, Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.
- Ian M MacDonald
- Department of Medical Genetics, University of Alberta, Edmonton, AB, Canada.
- Department of Ophthalmology & Visual Science, University of Alberta, Edmonton, AB, Canada.
- Gavin Arno
- UCL Institute of Ophthalmology, London, UK.
- Moorfields Eye Hospital, London, UK.
- North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK.
- Michel R Leroux
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
- Centre for Cell Biology, Development, and Disease, Simon Fraser University, Burnaby, BC, Canada.
5
Pengelly RJ, Bakhtiar D, Borovská I, Královičová J, Vořechovský I. Exonic splicing code and protein binding sites for calcium. Nucleic Acids Res 2022; 50:5493-5512. [PMID: 35474482 PMCID: PMC9177970 DOI: 10.1093/nar/gkac270] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2022] [Revised: 04/01/2022] [Accepted: 04/05/2022] [Indexed: 11/12/2022]
Abstract
Auxiliary splicing sequences in exons, known as enhancers (ESEs) and silencers (ESSs), have been subject to strong selection pressures at the RNA and protein level. The protein component of this splicing code is substantial, recently estimated at ∼50% of the total information within ESEs, but remains poorly understood. The ESE/ESS profiles were previously associated with the Irving-Williams (I-W) stability series for divalent metals, suggesting that the ESE/ESS evolution was shaped by metal binding sites. Here, we have examined splicing activities of exonic sequences that encode protein binding sites for Ca2+, a weak binder in the I-W affinity order. We found that predicted exon inclusion levels for the EF-hand motifs and for Ca2+-binding residues in nonEF-hand proteins were higher than for average exons. For canonical EF-hands, the increase was centred on the EF-hand chelation loop and, in particular, on Ca2+-coordinating residues, with a 1>12>3∼5>9 hierarchy in the 12-codon loop consensus and usage bias at codons 1 and 12. The same hierarchy but a lower increase was observed for noncanonical EF-hands, except for S100 proteins. EF-hand loops preferentially accumulated exon splits in two clusters, one located in their N-terminal halves and the other around codon 12. Using splicing assays and published crosslinking and immunoprecipitation data, we identify candidate trans-acting factors that preferentially bind conserved GA-rich motifs encoding negatively charged amino acids in the loops. Together, these data provide evidence for the high capacity of codons for Ca2+-coordinating residues to be retained in mature transcripts, facilitating their exon-level expansion during eukaryotic evolution.
Affiliation(s)
- Reuben J Pengelly
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
- Dara Bakhtiar
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
- Ivana Borovská
- Slovak Academy of Sciences, Centre of Biosciences, 840 05 Bratislava, Slovak Republic
- Jana Královičová
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
- Slovak Academy of Sciences, Centre of Biosciences, 840 05 Bratislava, Slovak Republic
- Slovak Academy of Sciences, Institute of Zoology, 845 06 Bratislava, Slovak Republic
- Igor Vořechovský
- University of Southampton, Faculty of Medicine, Southampton SO16 6YD, UK
6
Potapova NA. Nonsense Mutations in Eukaryotes. BIOCHEMISTRY. BIOKHIMIIA 2022; 87:400-412. [PMID: 35790376 DOI: 10.1134/s0006297922050029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2022] [Revised: 02/14/2022] [Accepted: 03/22/2022] [Indexed: 06/15/2023]
Abstract
Nonsense mutations are mutations that result in a premature termination codon. They have generally been considered among the most harmful mutations, because they lead to premature termination of protein translation and yield a shortened, nonfunctional polypeptide. However, there is evidence that not all nonsense mutations are harmful, and that molecular mechanisms exist that allow their pathogenic effects to be avoided. This review addresses relevant information on nonsense mutations in eukaryotic genomes, the characteristics of these mutations, and the different molecular mechanisms that prevent or mitigate their harmful effects.
Affiliation(s)
- Nadezhda A Potapova
- Kharkevich Institute for Information Transmission Problems (IITP), Russian Academy of Sciences, Moscow, 127051, Russia.
7
Kreienkamp HJ, Wagner M, Weigand H, McConkie-Rossell A, McDonald M, Keren B, Mignot C, Gauthier J, Soucy JF, Michaud JL, Dumas M, Smith R, Löbel U, Hempel M, Kubisch C, Denecke J, Campeau PM, Bain JM, Lessel D. Variant-specific effects define the phenotypic spectrum of HNRNPH2-associated neurodevelopmental disorders in males. Hum Genet 2021; 141:257-272. [PMID: 34907471 PMCID: PMC8807443 DOI: 10.1007/s00439-021-02412-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/17/2021] [Accepted: 12/07/2021] [Indexed: 01/10/2023]
Abstract
Bain type of X-linked syndromic intellectual developmental disorder, caused by pathogenic missense variants in HNRNPH2, was initially described in six female individuals affected by moderate-to-severe neurodevelopmental delay. Although it was initially postulated that the condition would not be compatible with life in males, several affected male individuals harboring pathogenic variants in HNRNPH2 have since been documented. However, functional in-vitro analyses of identified variants have not been performed and, therefore, possible genotype–phenotype correlations remain elusive. Here, we present eight male individuals, including a pair of monozygotic twins, harboring pathogenic or likely pathogenic HNRNPH2 variants. Notably, we present the first individuals harboring nonsense or frameshift variants who, similarly to an individual harboring a de novo p.(Arg29Cys) variant within the first quasi-RNA-recognition motif (qRRM), displayed mild developmental delay, and developed mostly autistic features and/or psychiatric co-morbidities. Additionally, we present two individuals harboring a recurrent de novo p.(Arg114Trp), within the second qRRM, who had a severe neurodevelopmental delay with seizures. Functional characterization of the three most common HNRNPH2 missense variants revealed dysfunctional nucleocytoplasmic shuttling of proteins harboring the p.(Arg206Gln) and p.(Pro209Leu) variants, located within the nuclear localization signal, whereas proteins with p.(Arg114Trp) showed reduced interaction with members of the large assembly of splicing regulators (LASR). Moreover, RNA-sequencing of primary fibroblasts of the individual harboring the p.(Arg114Trp) revealed substantial alterations in the regulation of alternative splicing along with global transcriptome changes.
Thus, we further expand the clinical and variant spectrum in HNRNPH2-associated disease in males and provide novel molecular insights suggesting the disorder to be a spliceopathy on the molecular level.
Affiliation(s)
- Hans-Jürgen Kreienkamp
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Martinistrasse 52, 20246, Hamburg, Germany
- Matias Wagner
- Institute of Human Genetics, Technical University of Munich, Munich, Germany
- Heike Weigand
- Department of Pediatric Neurology, Developmental Medicine and Social Pediatrics, Dr. von Hauner's Children's Hospital, University of Munich, Munich, Germany
- Marie McDonald
- Division of Medical Genetics, Department of Pediatrics, Duke University, Durham, USA
- Boris Keren
- Département de Génétique, Hôpital La Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, Paris, France
- Cyril Mignot
- Département de Génétique, Hôpital La Pitié-Salpêtrière, Assistance Publique-Hôpitaux de Paris, Paris, France
- Julie Gauthier
- Molecular Diagnostic Laboratory, CHU Sainte-Justine, Montreal, QC, Canada
- Division of Medical Genetics, Department of Pediatrics, CHU Sainte-Justine and Université de Montréal, Montreal, QC, Canada
- Jean-François Soucy
- Molecular Diagnostic Laboratory, CHU Sainte-Justine, Montreal, QC, Canada
- Division of Medical Genetics, Department of Pediatrics, CHU Sainte-Justine and Université de Montréal, Montreal, QC, Canada
- Jacques L Michaud
- Molecular Diagnostic Laboratory, CHU Sainte-Justine, Montreal, QC, Canada
- Division of Medical Genetics, Department of Pediatrics, CHU Sainte-Justine and Université de Montréal, Montreal, QC, Canada
- Meghan Dumas
- Division of Genetic, Department of Pediatrics, The Barbara Bush Children's Hospital, Maine Medical Center, Portland, ME, USA
- Rosemarie Smith
- Division of Genetic, Department of Pediatrics, The Barbara Bush Children's Hospital, Maine Medical Center, Portland, ME, USA
- Ulrike Löbel
- Department of Diagnostic and Interventional Neuroradiology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Maja Hempel
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Martinistrasse 52, 20246, Hamburg, Germany
- Christian Kubisch
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Martinistrasse 52, 20246, Hamburg, Germany
- Jonas Denecke
- Department of Pediatrics, University Medical Center Eppendorf, Hamburg, Germany
- Philippe M Campeau
- Department of Pediatrics, CHU Sainte-Justine and University of Montreal, Montreal, Canada
- Jennifer M Bain
- Division of Child Neurology, Department of Neurology, Columbia University Irving Medical Center, New York, USA
- Davor Lessel
- Institute of Human Genetics, University Medical Center Hamburg-Eppendorf, Martinistrasse 52, 20246, Hamburg, Germany.
8
Ho AT, Hurst LD. Effective Population Size Predicts Local Rates but Not Local Mitigation of Read-through Errors. Mol Biol Evol 2021; 38:244-262. [PMID: 32797190 PMCID: PMC7783166 DOI: 10.1093/molbev/msaa210] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
In correctly predicting that selection efficiency is positively correlated with the effective population size (Ne), the nearly neutral theory provides a coherent understanding of between-species variation in numerous genomic parameters, including heritable error (germline mutation) rates. Does the same theory also explain variation in phenotypic error rates and in abundance of error mitigation mechanisms? Translational read-through provides a model to investigate both issues as it is common, mostly nonadaptive, and has a good proxy for rate (TAA being the least leaky stop codon) and potential error mitigation via "fail-safe" 3' additional stop codons (ASCs). Prior theory of translational read-through has suggested that when population sizes are high, weak selection for local mitigation can be effective, thus predicting a positive correlation between ASC enrichment and Ne. Contrary to prediction, we find that ASC enrichment is not correlated with Ne. ASC enrichment, although highly phylogenetically patchy, is, however, more common both in unicellular species and in genes expressed in unicellular modes in multicellular species. By contrast, Ne does positively correlate with TAA enrichment. These results imply that local phenotypic error rates, not local mitigation rates, are consistent with a drift barrier/nearly neutral model.
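The notion of a "fail-safe" additional stop codon can be illustrated with a minimal sketch. It assumes that each input string is the sequence immediately downstream of the primary stop codon, read in the same frame. The function names, the 10-codon search window, and the toy sequences are hypothetical and are not taken from the article.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_additional_stop(utr3, max_codons=10):
    """Return the 1-based codon position of the first in-frame stop
    codon downstream of the primary stop, or None if there is none
    within max_codons codons (hypothetical window size)."""
    utr3 = utr3.upper()
    for i in range(max_codons):
        codon = utr3[3 * i : 3 * i + 3]
        if len(codon) < 3:
            break  # ran out of sequence
        if codon in STOP_CODONS:
            return i + 1
    return None


def asc_frequency(utrs, max_codons=10):
    """Fraction of 3' sequences carrying at least one in-frame ASC."""
    if not utrs:
        return 0.0
    hits = sum(first_additional_stop(u, max_codons) is not None for u in utrs)
    return hits / len(utrs)


if __name__ == "__main__":
    toy_utrs = ["GCATAAGGG", "GCAGGG"]  # toy data, not real UTRs
    print(asc_frequency(toy_utrs))
```

A raw frequency like this is not an enrichment measure on its own; it would still need to be compared against a nucleotide-composition-matched expectation.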
Affiliation(s)
- Alexander T Ho
- Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Corresponding author: E-mail:
| | - Laurence D Hurst
- Milner Centre for Evolution, University of Bath, Bath, United Kingdom
| |
|
9
|
The Axenfeld-Rieger Syndrome Gene FOXC1 Contributes to Left-Right Patterning. Genes (Basel) 2021; 12:genes12020170. [PMID: 33530637 PMCID: PMC7912076 DOI: 10.3390/genes12020170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/24/2020] [Revised: 01/14/2021] [Accepted: 01/21/2021] [Indexed: 02/06/2023] Open
Abstract
Precise spatiotemporal expression of the Nodal-Lefty-Pitx2 cascade in the lateral plate mesoderm establishes the left–right axis, which provides vital cues for correct organ formation and function. Mutations of one cascade constituent, PITX2, and, separately, the Forkhead transcription factor FOXC1 independently cause a multi-system disorder known as Axenfeld–Rieger syndrome (ARS). Since cardiac involvement is an established ARS phenotype and because disrupted left–right patterning can cause congenital heart defects, we investigated in zebrafish whether foxc1 contributes to organ laterality or situs. We demonstrate that CRISPR/Cas9-generated foxc1a and foxc1b mutants exhibit abnormal cardiac looping and that the prevalence of cardiac situs defects is increased in foxc1a−/−; foxc1b−/− homozygotes. Similarly, double homozygotes exhibit isomerism of the liver and pancreas, which are key features of abnormal gut situs. Placement of the asymmetric visceral organs relative to the midline was also perturbed by mRNA overexpression of foxc1a and foxc1b. In addition, an analysis of left–right patterning components in the lateral plate mesoderm of foxc1 mutants identified reduced or abolished expression of the NODAL antagonist lefty2. Together, these data reveal a novel contribution from foxc1 to left–right patterning, demonstrating that this role is sensitive to foxc1 gene dosage, and provide a plausible mechanism for the incidence of congenital heart defects in Axenfeld–Rieger syndrome patients.
|
10
|
Nucleotide composition affects codon usage toward the 3'-end. PLoS One 2019; 14:e0225633. [PMID: 31800603 PMCID: PMC6892556 DOI: 10.1371/journal.pone.0225633] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/07/2019] [Accepted: 11/09/2019] [Indexed: 12/24/2022] Open
Abstract
The 3’-end of the coding sequence in several species is known to show specific codon usage bias. Several factors have been suggested to underlie this phenomenon, including selection against translation efficiency, selection for translation accuracy, and selection against RNA folding. All are supported by some evidence, but there is no general agreement as to which factors are the main determinants. Nor is it known how universal this phenomenon is, and whether the same factors explain it in different species. To answer these questions, we developed a measure that quantifies the codon usage bias at the gene end, and used it to compute this bias for 91 species that span the three domains of life. In addition, we characterized the codons in each species by features that allow discrimination between the different factors. Combining all these data, we were able to show that there is a universal trend to favor AT-rich codons toward the gene end. Moreover, we suggest that this trend is explained by avoidance of RNA secondary structures around the stop codon, which may interfere with normal translation termination.
|
11
|
A novel IRF2BPL truncating variant is associated with endolysosomal storage. Mol Biol Rep 2019; 47:711-714. [DOI: 10.1007/s11033-019-05109-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/13/2019] [Accepted: 09/26/2019] [Indexed: 01/27/2023]
|
12
|
Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019; 19:144. [PMID: 31311498 PMCID: PMC6636035 DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. RESULTS: We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes, and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic "tree of life". Landscapes of GC content also strongly correlated to landscapes of amino acid usage: positive correlation with glycine, alanine, arginine and proline and negative correlation with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. CONCLUSIONS: Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
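The sliding window idea mentioned in the abstract can be sketched as follows. This is a simplified illustration that slides over a single nucleotide string. The study itself worked with mRNA GC content across genome-wide landscapes, so the function names, window size, and step used here are assumptions rather than the authors' parameters.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window, step):
    """GC content for each full window of length `window`,
    advancing `step` bases at a time (partial windows are dropped)."""
    return [
        gc_content(seq[start : start + window])
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence: windows are GGCC, CCAA, AATT
    print(sliding_gc("GGCCAATT", window=4, step=2))  # [1.0, 0.5, 0.0]
```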
Affiliation(s)
- R Huttener
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - L Thorrez
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium.,Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
| | - T In't Veld
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - M Granvik
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - L Snoeck
- Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
| | - L Van Lommel
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - F Schuit
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium.
| |
|
13
|
Abrahams L, Hurst LD. Refining the Ambush Hypothesis: Evidence That GC- and AT-Rich Bacteria Employ Different Frameshift Defence Strategies. Genome Biol Evol 2018; 10:1153-1173. [PMID: 29617761 PMCID: PMC5909447 DOI: 10.1093/gbe/evy075] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Accepted: 03/30/2018] [Indexed: 12/13/2022] Open
Abstract
Stop codons are frequently selected for beyond their regular termination function for error control. The “ambush hypothesis” proposes out-of-frame stop codons (OSCs) terminating frameshifted translations are selected for. Although early indirect evidence was partially supportive, recent evidence suggests OSC frequencies are not exceptional when considering underlying nucleotide content. However, prior null tests fail to control amino acid/codon usages or possible local mutational biases. We therefore return to the issue using bacterial genomes, considering several tests defining and testing against a null. We employ simulation approaches preserving amino acid order but shuffling synonymous codons or preserving codons while shuffling amino acid order. Additionally, we compare codon usage in amino acid pairs, where one codon can but the next, otherwise identical codon, cannot encode an OSC. OSC frequencies exceed expectations typically in AT-rich genomes, the +1 frame and for TGA/TAA but not TAG. With this complex evidence, simply rejecting or accepting the ambush hypothesis is not warranted. We propose a refined post hoc model, whereby AT-rich genomes have more accidental frameshifts, handled by RF2–RF3 complexes (associated with TGA/TAA) and are mostly +1 (or −2) slips. Supporting this, excesses positively correlate with in silico predicted frameshift probabilities. Thus, we propose a more viable framework, whereby genomes broadly adopt one of the two strategies to combat frameshifts: preventing frameshifting (GC-rich) or permitting frameshifts but minimizing impacts when most are caught early (AT-rich). Our refined framework holds promise yet some features, such as the bias of out-of-frame sense codons, remain unexplained.
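Counting out-of-frame stop codons (OSCs) is the raw quantity behind the tests described above, and a minimal sketch is given here. It assumes a coding sequence supplied 5' to 3' starting at the first base of the start codon. The function name and the toy sequence are hypothetical, and the sketch does not implement the authors' null models (synonymous codon shuffling or amino acid order shuffling).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_osc(cds, frame):
    """Count stop codons encountered when reading `cds` shifted by
    `frame` nucleotides (+1 or +2) relative to the annotated frame."""
    if frame not in (1, 2):
        raise ValueError("frame must be 1 or 2")
    cds = cds.upper()
    # Step through complete codons only, starting at the shifted offset.
    return sum(
        cds[i : i + 3] in STOP_CODONS
        for i in range(frame, len(cds) - 2, 3)
    )


if __name__ == "__main__":
    toy_cds = "ATGATAGCCTGA"  # toy data: ATG ATA GCC TGA in frame 0
    print(count_osc(toy_cds, 1))  # +1 frame reads TGA, TAG, CCT
    print(count_osc(toy_cds, 2))  # +2 frame reads GAT, AGC, CTG
```

Observed counts like these only become evidence for or against the ambush hypothesis once they are compared with counts from suitably shuffled sequences.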
Affiliation(s)
- Liam Abrahams
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom
| | - Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, United Kingdom
| |
|
14
|
Marcogliese PC, Shashi V, Spillmann RC, Stong N, Rosenfeld JA, Koenig MK, Martínez-Agosto JA, Herzog M, Chen AH, Dickson PI, Lin HJ, Vera MU, Salamon N, Graham JM, Ortiz D, Infante E, Steyaert W, Dermaut B, Poppe B, Chung HL, Zuo Z, Lee PT, Kanca O, Xia F, Yang Y, Smith EC, Jasien J, Kansagra S, Spiridigliozzi G, El-Dairi M, Lark R, Riley K, Koeberl DD, Golden-Grant K, Yamamoto S, Wangler MF, Mirzaa G, Hemelsoet D, Lee B, Nelson SF, Goldstein DB, Bellen HJ, Pena LDM. IRF2BPL Is Associated with Neurological Phenotypes. Am J Hum Genet 2018; 103:245-260. [PMID: 30057031 PMCID: PMC6081494 DOI: 10.1016/j.ajhg.2018.07.006] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 05/15/2018] [Accepted: 07/02/2018] [Indexed: 12/23/2022] Open
Abstract
Interferon regulatory factor 2 binding protein-like (IRF2BPL) encodes a member of the IRF2BP family of transcriptional regulators. Currently the biological function of this gene is obscure, and the gene has not been associated with a Mendelian disease. Here we describe seven individuals who carry damaging heterozygous variants in IRF2BPL and are affected with neurological symptoms. Five individuals who carry IRF2BPL nonsense variants resulting in a premature stop codon display severe neurodevelopmental regression, hypotonia, progressive ataxia, seizures, and a lack of coordination. Two additional individuals, both with missense variants, display global developmental delay and seizures and a relatively milder phenotype than those with nonsense alleles. The IRF2BPL bioinformatics signature based on population genomics is consistent with a gene that is intolerant to variation. We show that the fruit-fly IRF2BPL ortholog, called pits (protein interacting with Ttk69 and Sin3A), is broadly detected, including in the nervous system. Complete loss of pits is lethal early in development, whereas partial knockdown with RNA interference in neurons leads to neurodegeneration, revealing a requirement for this gene in proper neuronal function and maintenance. The identified IRF2BPL nonsense variants behave as severe loss-of-function alleles in this model organism, and ectopic expression of the missense variants leads to a range of phenotypes. Taken together, our results show that IRF2BPL and pits are required in the nervous system in humans and flies, and their loss leads to a range of neurological phenotypes in both species.
Affiliation(s)
- Paul C Marcogliese
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Vandana Shashi
- Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA
| | - Rebecca C Spillmann
- Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA
| | - Nicholas Stong
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
| | - Jill A Rosenfeld
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Mary Kay Koenig
- Division of Child & Adolescent Neurology, Department of Pediatrics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Julián A Martínez-Agosto
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Child and Adolescent Psychiatry, Resnick Neuropsychiatric Hospital, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Matthew Herzog
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Agnes H Chen
- Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Patricia I Dickson
- Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Henry J Lin
- Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Moin U Vera
- Department of Pediatrics, Los Angeles Biomedical Research Institute at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
| | - Noriko Salamon
- Department of Radiology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - John M Graham
- Department of Pediatrics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Damara Ortiz
- Children's Hospital of Pittsburgh, University of Pittsburgh Medical Center, University of Pittsburgh, Pittsburgh, PA 15224, USA
| | - Elena Infante
- Children's Hospital of Pittsburgh, University of Pittsburgh Medical Center, University of Pittsburgh, Pittsburgh, PA 15224, USA
| | - Wouter Steyaert
- Department of Medical Genetics, Ghent University Hospital, 9000 Ghent, Belgium
| | - Bart Dermaut
- Department of Medical Genetics, Ghent University Hospital, 9000 Ghent, Belgium
| | - Bruce Poppe
- Department of Medical Genetics, Ghent University Hospital, 9000 Ghent, Belgium
| | - Hyung-Lok Chung
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhongyuan Zuo
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Pei-Tseng Lee
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Oguz Kanca
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Fan Xia
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Yaping Yang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Edward C Smith
- Division of Neurology, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA
| | - Joan Jasien
- Division of Neurology, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA
| | - Sujay Kansagra
- Division of Neurology, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA
| | - Gail Spiridigliozzi
- Department of Psychiatry and Behavioral Sciences, Duke University School of Medicine, Durham, NC 27710, USA
| | - Mays El-Dairi
- Department of Ophthalmology, Duke University School of Medicine, Durham, NC 27710, USA
| | - Robert Lark
- Department of Orthopedic Surgery, Duke University School of Medicine, Durham, NC 27710, USA
| | - Kacie Riley
- Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA
| | - Dwight D Koeberl
- Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA
| | - Katie Golden-Grant
- Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98105, USA
| | - Shinya Yamamoto
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
| | - Michael F Wangler
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
| | - Ghayda Mirzaa
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA 98105, USA; Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
| | - Dimitri Hemelsoet
- Department of Neurology, Ghent University Hospital, 9000 Ghent, Belgium
| | - Brendan Lee
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Stanley F Nelson
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - David B Goldstein
- Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
| | - Hugo J Bellen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Program in Developmental Biology, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA; Department of Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA; Howard Hughes Medical Institute, Baylor College of Medicine, Houston, TX 77030, USA.
| | - Loren D M Pena
- Division of Medical Genetics, Department of Pediatrics, Duke University School of Medicine, Durham, NC 27710, USA.
| |
|
15
|
Abrahams L, Hurst LD. Adenine Enrichment at the Fourth CDS Residue in Bacterial Genes Is Consistent with Error Proofing for +1 Frameshifts. Mol Biol Evol 2018; 34:3064-3080. [PMID: 28961919 PMCID: PMC5850271 DOI: 10.1093/molbev/msx223] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/21/2022] Open
Abstract
Beyond selection for optimal protein functioning, coding sequences (CDSs) are under selection at the RNA and DNA levels. Here, we identify a possible signature of “dual-coding,” namely extensive adenine (A) enrichment at bacterial CDS fourth sites. In 99.07% of studied bacterial genomes, fourth site A use is greater than expected given genomic A-starting codon use. Arguing for nucleotide level selection, A-starting serine and arginine second codons are heavily utilized when compared with their non-A starting synonyms. Several models have the ability to explain some of this trend. In part, A-enrichment likely reduces 5′ mRNA stability, promoting translation initiation. However, T/U, which may also reduce stability, is avoided. Further, +1 frameshifts on the initiating ATG encode a stop codon (TGA) provided A is the fourth residue, acting either as a frameshift “catch and destroy” or a frameshift stop and adjust mechanism and hence implicated in translation initiation. Consistent with both, genomes lacking TGA stop codons exhibit weaker fourth site A-enrichment. Sequences lacking a Shine–Dalgarno sequence and those without upstream leader genes, which may be more error prone during initiation, have greater utilization of A, again suggesting a role in initiation. The frameshift correction model is consistent with the notion that many genomic features are error-mitigation factors and provides the first evidence for site-specific out-of-frame stop codon selection. We conjecture that the NTG universal start codon may have evolved as a consequence of TGA being a stop codon and the ability of NTGA to rapidly terminate or adjust a ribosome.
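The central observation, that a +1 slip on an initiating ATG reads TGA when the fourth residue is A, can be checked directly. The sketch below is illustrative only: the function names and the toy sequences are hypothetical, and the fraction it reports is a raw frequency rather than the expectation-corrected enrichment used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def plus_one_codon_at_start(cds):
    """Codon read after a +1 slip on the start codon,
    i.e. nucleotides 2 to 4 of the coding sequence."""
    return cds[1:4].upper()


def fourth_site_a_fraction(cds_list):
    """Fraction of coding sequences whose fourth nucleotide is A."""
    usable = [c for c in cds_list if len(c) >= 4]
    if not usable:
        return 0.0
    return sum(c[3].upper() == "A" for c in usable) / len(usable)


if __name__ == "__main__":
    toy = ["ATGAAA", "ATGCAA"]  # toy data, not real genes
    for cds in toy:
        codon = plus_one_codon_at_start(cds)
        print(cds, codon, codon in STOP_CODONS)
    print(fourth_site_a_fraction(toy))
```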
Affiliation(s)
- Liam Abrahams
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
| | - Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
| |
|
16
|
Evolutionary forces affecting synonymous variations in plant genomes. PLoS Genet 2017; 13:e1006799. [PMID: 28531201 PMCID: PMC5460877 DOI: 10.1371/journal.pgen.1006799] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/08/2016] [Revised: 06/06/2017] [Accepted: 05/04/2017] [Indexed: 01/04/2023] Open
Abstract
Base composition is highly variable among and within plant genomes, especially at third codon positions, ranging from GC-poor and homogeneous species to GC-rich and highly heterogeneous ones (particularly Monocots). Consequently, synonymous codon usage is biased in most species, even when base composition is relatively homogeneous. The causes of these variations are still under debate, with three main forces being possibly involved: mutational bias, selection and GC-biased gene conversion (gBGC). So far, both selection and gBGC have been detected in some species but how their relative strength varies among and within species remains unclear. Population genetics approaches allow joint estimation of the intensity of selection, gBGC and mutational bias. We extended a recently developed method and applied it to a large population genomic dataset based on transcriptome sequencing of 11 angiosperm species spread across the phylogeny. We found that at synonymous positions, base composition is far from mutation-drift equilibrium in most genomes and that gBGC is a widespread and stronger process than selection. gBGC could strongly contribute to base composition variation among plant species, implying that it should be taken into account in plant genome analyses, especially for GC-rich ones. In protein coding genes, base composition strongly varies within and among plant genomes, especially at positions where changes do not alter the coded protein (synonymous variations). Some species, such as the model plant Arabidopsis thaliana, are relatively GC-poor and homogeneous while others, such as grasses, are highly heterogeneous and GC-rich. The causes of these variations are still debated: are they mainly due to selective or neutral processes? Answering this question is important to correctly infer whether variations in base composition may have functional roles or not.
We extended a population genetics method to jointly estimate the different forces that may affect synonymous variations and applied it to genomic datasets in 11 flowering plant species. We found that GC-biased gene conversion, a neutral process associated with recombination that mimics selection by favouring G and C bases, is a widespread and stronger process than selection and that it could explain the large variation in base composition observed in plant genomes. Our results bear implications for analysing plant genomes and for correctly interpreting what could be functional or not.
|
17
|
Francis A, Dhaka N, Bakshi M, Jung KH, Sharma MK, Sharma R. Comparative phylogenomic analysis provides insights into TCP gene functions in Sorghum. Sci Rep 2016; 6:38488. [PMID: 27917941 PMCID: PMC5137041 DOI: 10.1038/srep38488] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 08/25/2016] [Accepted: 11/10/2016] [Indexed: 12/30/2022] Open
Abstract
Sorghum is a highly efficient C4 crop with potential to mitigate challenges associated with food, feed and fuel. TCP proteins are of particular interest for crop improvement programs due to their well-demonstrated roles in crop domestication and in shaping plant architecture, thereby affecting agronomic traits. We identified 20 TCP genes from Sorghum. Except SbTCP8, all are either intronless or contain introns in the untranslated regions. Comparative phylogenetic analysis of Arabidopsis, rice, Brachypodium and Sorghum TCP proteins revealed two distinct classes categorized into ten sub-clades. Sub-clade F is dicot-specific, whereas A2, G1 and I1 groups only contained genes from grasses. Sub-clade B was missing in Sorghum, whereas group A1 was missing in rice, indicating species-specific divergence of TCP proteins. TCP proteins of Sorghum are enriched in disorder-promoting residues, with class I containing higher percent disorder than class II proteins. Seven pairs of paralogous TCP genes were identified from Sorghum, five of which seem to predate Rice-Sorghum divergence. All of them have diverged in their expression. Based on the expression and orthology analysis, five Sorghum genes have been shortlisted for further investigation for their roles in regulating plant morphology, whereas three genes have been identified as candidates for engineering abiotic stress tolerance.
Affiliation(s)
- Aleena Francis
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Namrata Dhaka
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Mohit Bakshi
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Ki-Hong Jung
- Graduate School of Biotechnology & Crop Biotech Institute, Kyung Hee University, Yongin, 17104, Republic of Korea
| | - Manoj K. Sharma
- School of Biotechnology, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| | - Rita Sharma
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110067, India
| |
|
18
|
The combinatorics of overlapping genes. J Theor Biol 2016; 415:90-101. [PMID: 27737786 DOI: 10.1016/j.jtbi.2016.09.018] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/15/2016] [Revised: 08/31/2016] [Accepted: 09/22/2016] [Indexed: 11/23/2022]
Abstract
Overlapping genes exist in all domains of life and are much more abundant than expected upon their first discovery in the late 1970s. Assuming that the reference gene is read in frame +0, an overlapping gene can be encoded in two reading frames in the sense strand, denoted by +1 and +2, and in three reading frames in the opposite strand, denoted by -0, -1, and -2. This motivated numerous researchers to study the constraints induced by the genetic code on the various overlapping frames, mostly based on information theory. Our focus in this paper is on the constraints induced on two overlapping genes in terms of amino acids, as well as polypeptides. We show that simple linear constraints bind the amino-acid composition of two proteins encoded by overlapping genes. Novel constraints are revealed when polypeptides are considered, and not just single amino acids. For example, in double-coding sequences with an overlapping reading frame -2, each Tyrosine (denoted as Tyr or Y) in the overlapping frame overlaps a Tyrosine in the reference frame +0 (and reciprocally), whereas specific words (e.g. YY) never occur. We thus distinguish between null constraints (YY = 0 in frame -2) and non-null constraints (Y in frame +0 ⇔ Y in frame -2). Our equivalence-based constraints are symmetrical and thus enable the characterization of the joint composition of overlapping proteins. We describe several formal frameworks and a graph algorithm to characterize and compute these constraints. As expected, the degrees of freedom left by these constraints vary drastically among the different overlapping frames. Interestingly, the biological meaning of constraints induced on two overlapping proteins (hydropathy, forbidden di-peptides, expected overlap length …) is also specific to the reading frame. 
We study the combinatorics of these constraints for overlapping polypeptides of length n, pointing out that, (i) except for frame -2, non-null constraints are deduced from the amino-acid (length = 1) constraints and (ii) null constraints are deduced from the di-peptide (length = 2) constraints. These results yield support for understanding the mechanisms and evolution of overlapping genes, and for developing novel overlapping gene detection methods.
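The tyrosine equivalence quoted above can be reproduced by brute-force enumeration under the standard genetic code. This sketch assumes that neither reading frame contains a stop codon, and it pairs each reference codon with the antisense codon that covers its first two nucleotides plus the preceding nucleotide. That offset is an assumption chosen because it reproduces the stated constraint, and the paper's own frame numbering may be defined differently. All names in the code are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}
TYR_CODONS = {"TAT", "TAC"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def check_tyr_equivalence():
    """Enumerate every 4-nucleotide context n3 n4 n5 n6, where the
    reference codon is n4 n5 n6 and the antisense codon is the reverse
    complement of n3 n4 n5. Return True if, whenever neither codon is
    a stop, one encodes Tyr exactly when the other does."""
    for n3, n4, n5, n6 in product("ACGT", repeat=4):
        ref_codon = n4 + n5 + n6
        anti_codon = revcomp(n3 + n4 + n5)
        if ref_codon in STOP_CODONS or anti_codon in STOP_CODONS:
            continue  # double-coding sequences cannot contain stops
        if (ref_codon in TYR_CODONS) != (anti_codon in TYR_CODONS):
            return False
    return True


if __name__ == "__main__":
    print(check_tyr_equivalence())
```

The equivalence holds because both codons share the dinucleotide TA in this arrangement, and TA followed by a purine would be a stop codon in either strand.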
|
19
|
McLysaght A, Hurst LD. Open questions in the study of de novo genes: what, how and why. Nat Rev Genet 2016; 17:567-78. [PMID: 27452112 DOI: 10.1038/nrg.2016.78] [Citation(s) in RCA: 144] [Impact Index Per Article: 16.0] [Indexed: 12/12/2022]
Abstract
The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.
Affiliation(s)
- Aoife McLysaght
- The Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
20
|
Abstract
Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that on the loss of introns, some but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites.
Affiliation(s)
- Rosina Savisaar
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
21
|
Wang K, Cao K, Hannenhalli S. Chromatin and Genomic determinants of alternative splicing. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2015; 2015:345-354. [PMID: 28825057 DOI: 10.1145/2808719.2808755] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
Abstract
Alternative splicing significantly contributes to proteomic diversity, and mis-regulation of splicing can cause disease in humans. Although both genomic and chromatin features have been shown to associate with splicing, the mechanisms by which various chromatin marks influence splicing are, for the most part, not clear. Moreover, it is not known whether the influence of specific genomic features on splicing is potentially modulated by the chromatin context. Here we report a deep neural network (DNN) model for predicting exon inclusion based on comprehensive genomic and chromatin features. Our analysis in three cell lines shows that, while both genomic and chromatin features can predict splicing to varying degrees, genomic features are the primary drivers of splicing, and the predictive power of chromatin features can largely be explained by their correlation with genomic features; chromatin features do not yield a substantial independent contribution to splicing predictability. However, our model identified specific interactions between chromatin and genomic features, suggesting that the effect of genomic elements may be modulated by chromatin context.
Affiliation(s)
- Kun Wang
- Center for Bioinformatics and Computational Biology, University of Maryland
- Kan Cao
- Cell Biology Molecular Genetics, University of Maryland
22
|
Chen JY, Shen QS, Zhou WZ, Peng J, He BZ, Li Y, Liu CJ, Luan X, Ding W, Li S, Chen C, Tan BCM, Zhang YE, He A, Li CY. Emergence, Retention and Selection: A Trilogy of Origination for Functional De Novo Proteins from Ancestral LncRNAs in Primates. PLoS Genet 2015; 11:e1005391. [PMID: 26177073 PMCID: PMC4503675 DOI: 10.1371/journal.pgen.1005391] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 03/20/2015] [Accepted: 06/24/2015] [Indexed: 01/08/2023] Open
Abstract
While some human-specific protein-coding genes have been proposed to originate from ancestral lncRNAs, the transition process remains poorly understood. Here we identified 64 hominoid-specific de novo genes and report a mechanism for the origination of functional de novo proteins from ancestral lncRNAs with precise splicing structures and specific tissue expression profiles. Whole-genome sequencing of dozens of rhesus macaque animals revealed that these lncRNAs are generally not more selectively constrained than other lncRNA loci. The existence of these newly-originated de novo proteins is also not beyond anticipation under neutral expectation, as they generally have longer theoretical lifespan than their current age, due to their GC-rich sequence property enabling stable ORFs with lower chance of non-sense mutations. Interestingly, although the emergence and retention of these de novo genes are likely driven by neutral forces, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution, which may contribute to human-specific genetic novelties by taking advantage of existed genomic contexts. Although gene duplication has been believed as a predominant mechanism for creating new genes, recent reports suggested that new proteins could evolve “de novo” from non-coding DNA regions. These de novo genes are also named as “motherless” genes due to their lack of ancestral proteins as precursors, while recently we and others found that lncRNAs may represent an intermediate stage of their origination. 
To further elucidate this lncRNA-protein transition process, here we identified 64 hominoid-specific de novo genes and report a new mechanism for the origination of functional de novo proteins from ancestral non-coding transcripts: These non-coding “precursors” are generally not more selectively constrained than other lncRNA loci; and the existence of these de novo proteins is not beyond anticipation under neutral expectation; however, population genetics study in 67 human individuals and 82 macaque animals revealed signatures of purifying selection on these genes specifically in human population, indicating a proportion of these newly-originated proteins are already functional in human. We thus propose a mechanism for creation of functional de novo proteins from ancestral lncRNAs during the primate evolution.
Affiliation(s)
- Jia-Yu Chen
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Qing Sunny Shen
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Wei-Zhen Zhou
- Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing, China
- Jiguang Peng
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Bin Z. He
- FAS Center for Systems Biology & Howard Hughes Medical Institute, Harvard University, Cambridge, Massachusetts, United States of America
- Yumei Li
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Chu-Jun Liu
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Xuke Luan
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Beijing, China
- Wanqiu Ding
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Shuxian Li
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Chunyan Chen
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Yong E. Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Aibin He
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Beijing, China
- * E-mail: (AH); (CYL)
- Chuan-Yun Li
- Beijing Key Laboratory of Cardiometabolic Molecular Medicine, Institute of Molecular Medicine, Peking University, Beijing, China
- * E-mail: (AH); (CYL)
23
|
Thirumalairaj K, Abraham A, Devarajan B, Gaikwad N, Kim U, Muthukkaruppan V, Vanniarajan A. A stepwise strategy for rapid and cost-effective RB1 screening in Indian retinoblastoma patients. J Hum Genet 2015; 60:547-52. [PMID: 26084579 DOI: 10.1038/jhg.2015.62] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 02/04/2015] [Revised: 04/28/2015] [Accepted: 04/30/2015] [Indexed: 01/02/2023]
Abstract
India has the highest number of retinoblastoma (RB) patients among the developing countries owing to its increasing population. Of the patients with RB, about 40% have the heritable form of the disease, making genetic analysis of the RB1 gene an integral part of disease management. However, given the large size of the RB1 gene with its widely dispersed exons and no reported hotspots, genetic testing can be cumbersome. To overcome this problem, we have developed a rapid screening strategy by prioritizing the order of exons to be analyzed, based on the frequency of nonsense mutations, deletions and duplications reported in the RB1-Leiden Open Variation Database and published literature on Indian patients. Using this strategy for genetic analysis, mutations were identified in 76% of patients in half the actual time and one third of the cost. This reduction in time and cost will allow for better risk prediction for siblings and offspring, thereby facilitating genetic counseling for families, especially in developing countries.
Affiliation(s)
- Kannan Thirumalairaj
- Department of Molecular Genetics, Aravind Medical Research Foundation, Madurai, India
- Aloysius Abraham
- Department of Molecular Genetics, Aravind Medical Research Foundation, Madurai, India
- Namrata Gaikwad
- Department of Orbit, Oculoplasty and Oncology, Aravind Eye Hospital, Madurai, India
- Usha Kim
- Department of Orbit, Oculoplasty and Oncology, Aravind Eye Hospital, Madurai, India
- Ayyasamy Vanniarajan
- Department of Molecular Genetics, Aravind Medical Research Foundation, Madurai, India
24
|
Massey SE. Genetic code evolution reveals the neutral emergence of mutational robustness, and information as an evolutionary constraint. Life (Basel) 2015; 5:1301-32. [PMID: 25919033 PMCID: PMC4500140 DOI: 10.3390/life5021301] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/02/2015] [Revised: 04/02/2015] [Accepted: 04/03/2015] [Indexed: 01/09/2023] Open
Abstract
The standard genetic code (SGC) is central to molecular biology and its origin and evolution is a fundamental problem in evolutionary biology, the elucidation of which promises to reveal much about the origins of life. In addition, we propose that study of its origin can also reveal some fundamental and generalizable insights into mechanisms of molecular evolution, utilizing concepts from complexity theory. The first is that beneficial traits may arise by non-adaptive processes, via a process of "neutral emergence". The structure of the SGC is optimized for the property of error minimization, which reduces the deleterious impact of point mutations. Via simulation, it can be shown that genetic codes with error minimization superior to the SGC can emerge in a neutral fashion simply by a process of genetic code expansion via tRNA and aminoacyl-tRNA synthetase duplication, whereby similar amino acids are added to codons related to that of the parent amino acid. This process of neutral emergence has implications beyond that of the genetic code, as it suggests that not all beneficial traits have arisen by the direct action of natural selection; we term these "pseudaptations", and discuss a range of potential examples. Secondly, consideration of genetic code deviations (codon reassignments) reveals that these are mostly associated with a reduction in proteome size. This code malleability implies the existence of a proteomic constraint on the genetic code, proportional to the size of the proteome (P), and that its reduction in size leads to an "unfreezing" of the codon - amino acid mapping that defines the genetic code, consistent with Crick's Frozen Accident theory. 
The concept of a proteomic constraint may be extended to propose a general informational constraint on genetic fidelity, which may be used to explain variously, differences in mutation rates in genomes with differing proteome sizes, differences in DNA repair capacity and genome GC content between organisms, a selective pressure in the evolution of sexual reproduction, and differences in translational fidelity. Lastly, the utility of the concept of an informational constraint to other diverse fields of research is explored.
Affiliation(s)
- Steven E Massey
- Biology Department, PO Box 23360, University of Puerto Rico-Rio Piedras, San Juan, PR 00931, USA.
25
|
Sung W, Ackerman MS, Gout JF, Miller SF, Williams E, Foster PL, Lynch M. Asymmetric Context-Dependent Mutation Patterns Revealed through Mutation-Accumulation Experiments. Mol Biol Evol 2015; 32:1672-83. [PMID: 25750180 DOI: 10.1093/molbev/msv055] [Citation(s) in RCA: 96] [Impact Index Per Article: 9.6] [Indexed: 12/17/2022] Open
Abstract
Despite the general assumption that site-specific mutation rates are independent of the local sequence context, a growing body of evidence suggests otherwise. To further examine context-dependent patterns of mutation, we amassed 5,645 spontaneous mutations in wild-type (WT) and mismatch-repair deficient (MMR(-)) mutation-accumulation (MA) lines of the gram-positive model organism Bacillus subtilis. We then analyzed >7,500 spontaneous base-substitution mutations across B. subtilis, Escherichia coli, and Mesoplasma florum WT and MMR(-) MA lines, finding a context-dependent mutation pattern that is asymmetric around the origin of replication. Different neighboring nucleotides can alter site-specific mutation rates by as much as 75-fold, with sites neighboring G:C base pairs or dimers involving alternating pyrimidine-purine and purine-pyrimidine nucleotides having significantly elevated mutation rates. The influence of context-dependent mutation on genome architecture is strongest in M. florum, consistent with the reduced efficiency of selection in organisms with low effective population size. If not properly accounted for, the disparities arising from patterns of context-dependent mutation can significantly influence interpretations of positive and purifying selection.
Affiliation(s)
- Way Sung
- Department of Biology, Indiana University, Bloomington
- Michael Lynch
- Department of Biology, Indiana University, Bloomington
26
|
Zhao Y, Epstein RJ. Conserved nonsense-prone CpG sites in apoptosis-regulatory genes: conditional stop signs on the road to cell death. Evol Bioinform Online 2013; 9:275-83. [PMID: 23908585 PMCID: PMC3728200 DOI: 10.4137/ebo.s11759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
Methylation-prone CpG dinucleotides are strongly conserved in the germline, yet are also predisposed to somatic mutation. Here we quantify the relationship between germline codon mutability and somatic carcinogenesis by comparing usage of the nonsense-prone CGA (→TGA) codons in gene groups that differ in apoptotic function; to this end, suppressor genes were subclassified as either apoptotic (gatekeepers) or repair (caretakers). Mutations affecting CGA codons in sporadic tumors proved to be highly asymmetric. Moreover, nonsense mutations were 3-fold more likely to affect gatekeepers than caretakers. In addition, intragenic CGA clustering nonrandomly affected functionally critical regions of gatekeepers. We conclude that human gatekeeper suppressor genes are enriched for nonsense-prone codons, and submit that this germline vulnerability to tumors could reflect in utero selection for a methylation-dependent capability to short-circuit environmental insults that otherwise trigger apoptosis and fetal loss.
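The notion of a nonsense-prone CGA codon in this abstract can be made concrete with a short Python sketch. It is an illustration only and not the authors' pipeline: the function names and the toy sequence are hypothetical, and the coding sequence is assumed to start in frame.

```python
def cga_codon_indices(cds):
    """Return 0-based codon indices of in-frame CGA codons in a coding sequence."""
    cds = cds.upper()
    return [i // 3 for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == "CGA"]


def deaminate_first_base(codon):
    """Model a C>T transition at the first base of a codon (CGA becomes TGA)."""
    return "T" + codon[1:] if codon.startswith("C") else codon


toy_cds = "ATGCGAAAACGATAA"  # hypothetical sequence, not from any real gene
print(cga_codon_indices(toy_cds))   # [1, 3]
print(deaminate_first_base("CGA"))  # TGA
```

Only in-frame occurrences are counted, since a CGA that straddles two codons does not become a stop codon through the same C>T change.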
Affiliation(s)
- Yongzhong Zhao
- Department of Genetics, Mount Sinai School of Medicine, New York, USA
27
|
Govani FS, Giess A, Mollet IG, Begbie ME, Jones MD, Game L, Shovlin CL. Directional next-generation RNA sequencing and examination of premature termination codon mutations in endoglin/hereditary haemorrhagic telangiectasia. Mol Syndromol 2013; 4:184-96. [PMID: 23801935 DOI: 10.1159/000350208] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Accepted: 12/17/2012] [Indexed: 01/12/2023] Open
Abstract
Hereditary haemorrhagic telangiectasia (HHT) is a disease characterised by abnormal vascular structures, and most commonly caused by mutations in ENG, ACVRL1 or SMAD4 encoding endothelial cell-expressed proteins involved in TGF-β superfamily signalling. The majority of mutations reported on the HHT mutation database are predicted to lead to stop codons, either due to frameshifts or direct nonsense substitutions. The proportion is higher for ENG (67%) and SMAD4 (65%) than for ACVRL1 (42%), p < 0.0001. Here, by focussing on ENG, we report why conventional views of these mutations may need to be revised. Of the 111 stop codon-generating ENG mutations, on ExPASy translation, all except one were premature termination codons (PTCs), sited at least 50-55 bp upstream of the final exon-exon boundary of the main endoglin isoform, L-endoglin. This strongly suggests that the mutated RNA species will undergo nonsense-mediated decay. We provide new in vitro expression data to support dominant negative activity of stable truncated endoglin proteins but suggest these will not generate HHT: the single natural stop codon mutation in L-endoglin (sited within 50-55 nucleotides of the final exon-exon boundary) is unlikely to generate functional protein since it replaces the entire transmembrane domain, as would 8 further natural stop codon mutations, if the minor S-endoglin isoform were implicated in HHT pathogenesis. Finally, next-generation RNA sequencing data of 7 different RNA libraries from primary human endothelial cells demonstrate that multiple intronic regions of ENG are transcribed. The potential consequences of heterozygous deletions or duplications of such regions are discussed. 
These data support the haploinsufficiency model for HHT pathogenesis, explain why final exon mutations have not been detected to date in HHT, emphasise the potential need for functional examination of non-PTC-generating mutations, and lead to proposals for an alternate stratification system of mutational types for HHT genotype-phenotype correlations.
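The positional rule used in this abstract (a stop codon at least 50-55 nucleotides upstream of the final exon-exon boundary is expected to trigger nonsense-mediated decay) can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' analysis: the function name is hypothetical, coordinates are 0-based transcript positions, the distance is measured from the first base of the stop codon, and the default threshold of 55 is the conservative end of the quoted range.

```python
def predicts_nmd(stop_pos, exon_lengths, threshold=55):
    """Return True if a stop codon lies at least `threshold` nt upstream
    of the last exon-exon junction of a spliced transcript.

    stop_pos     -- 0-based transcript coordinate of the stop codon's first base
    exon_lengths -- exon lengths in transcript order
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule does not apply
    last_junction = sum(exon_lengths[:-1])  # coordinate where the final exon starts
    return last_junction - stop_pos >= threshold


exons = [100, 200, 150]  # hypothetical transcript; last junction at position 300
print(predicts_nmd(200, exons))  # True: 100 nt upstream of the junction
print(predicts_nmd(260, exons))  # False: only 40 nt upstream
print(predicts_nmd(350, exons))  # False: stop codon is in the final exon
```

Real NMD prediction depends on further factors that this sketch ignores, so it should be read only as a restatement of the distance criterion.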
Affiliation(s)
- F S Govani
- NHLI Cardiovascular Sciences, Hammersmith Campus, Imperial College London, London, UK
28
|
Shabalina SA, Spiridonov NA, Kashina A. Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res 2013; 41:2073-94. [PMID: 23293005 PMCID: PMC3575835 DOI: 10.1093/nar/gks1205] [Citation(s) in RCA: 187] [Impact Index Per Article: 15.6] [Indexed: 01/10/2023] Open
Abstract
Messenger RNA is a key component of an intricate regulatory network of its own. It accommodates numerous nucleotide signals that overlap protein coding sequences and are responsible for multiple levels of regulation and generation of biological complexity. A wealth of structural and regulatory information, which mRNA carries in addition to the encoded amino acid sequence, raises the question of how these signals and overlapping codes are delineated along non-synonymous and synonymous positions in protein coding regions, especially in eukaryotes. Silent or synonymous codon positions, which do not determine amino acid sequences of the encoded proteins, define mRNA secondary structure and stability and affect the rate of translation, folding and post-translational modifications of nascent polypeptides. The RNA level selection is acting on synonymous sites in both prokaryotes and eukaryotes and is more common than previously thought. Selection pressure on the coding gene regions follows three-nucleotide periodic pattern of nucleotide base-pairing in mRNA, which is imposed by the genetic code. Synonymous positions of the coding regions have a higher level of hybridization potential relative to non-synonymous positions, and are multifunctional in their regulatory and structural roles. Recent experimental evidence and analysis of mRNA structure and interspecies conservation suggest that there is an evolutionary tradeoff between selective pressure acting at the RNA and protein levels. Here we provide a comprehensive overview of the studies that define the role of silent positions in regulating RNA structure and processing that exert downstream effects on proteins and their functions.
Affiliation(s)
- Svetlana A Shabalina
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20984, USA.
29
|
Jin P, Cai R, Zhou X, Li-Ling J, Ma F. Features of missense/nonsense mutations in exonic splicing enhancer sequences from cancer-related human genes. Mutat Res 2012; 740:6-12. [PMID: 23123687 DOI: 10.1016/j.mrfmmm.2012.10.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/16/2011] [Revised: 08/31/2012] [Accepted: 10/19/2012] [Indexed: 11/18/2022]
Affiliation(s)
- Ping Jin
- College of Life Science, Nanjing Normal University, Nanjing, China
30
|
Hoehn KB, McGaugh SE, Noor MAF. Effects of premature termination codon polymorphisms in the Drosophila pseudoobscura subclade. J Mol Evol 2012; 75:141-50. [PMID: 23132097 PMCID: PMC3508312 DOI: 10.1007/s00239-012-9528-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/15/2012] [Accepted: 10/24/2012] [Indexed: 12/15/2022]
Abstract
Premature termination codon (PTC) mutations can have dramatic effects, both adaptive and deleterious, on gene expression and function. Here, we examine the number and selective effects of PTC mutations within the Drosophila pseudoobscura subclade using 18 resequenced genomes aligned to the reference genome. We located and characterized 1,679 PTC mutations in 605 genes across each of these genomes relative to the D. pseudoobscura reference genome, and used RT-PCR to confirm transcription of a subset of these genes containing PTC mutations. We confirm previous findings that genes containing PTC mutations are less selectively constrained and less broadly expressed than non-PTC-containing genes, suggesting that most of these mutations are at least mildly deleterious. Further, we find highly significant codon usage bias in regions downstream of the PTC in 38 of these PTC-containing genes, suggesting that some of these PTC mutations, if not alternatively spliced out of the transcript, have neutral effects. Ultimately, these analyses support the view that PTC mutations are mostly detrimental, but are nonetheless common enough in genomes that a subset could be effectively neutral.
Affiliation(s)
- Kenneth B Hoehn
- Biology Department, Duke University, PO Box 90388, Durham, NC 27708, USA.
31
|
Guo WT, Xu WY, Gu MM. [Nonsense-mediated mRNA decay and human monogenic disease]. YI CHUAN = HEREDITAS 2012; 34:935-42. [PMID: 22917898 DOI: 10.3724/sp.j.1005.2012.00935] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
Abstract
Nonsense-mediated mRNA decay (NMD) is a widespread quality control mechanism in eukaryotic cells. It recognizes and degrades aberrant transcripts harbouring a premature translational termination codon (PTC), and thereby prevents the production of C-terminally truncated proteins that might be deleterious. Approximately 30% of human genetic diseases are caused by transcripts containing PTCs. These transcripts are potential targets of NMD. In monogenic diseases, NMD affects the phenotype or the mode of inheritance. Here, we explain the mechanism of this surveillance pathway and take several neuromuscular disorders as examples to discuss its influence on human monogenic diseases. A deeper understanding of NMD will shed light on the pathogenesis and therapy of monogenic diseases.
Affiliation(s)
- Wen-Ting Guo
- Department of Medical Genetics, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
32
|
Error prevention and mitigation as forces in the evolution of genes and genomes. Nat Rev Genet 2011; 12:875-81. [PMID: 22094950 DOI: 10.1038/nrg3092] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022]
Abstract
Why are short introns rarely a multiple of three nucleotides long? Why do essential genes cluster? Why are genes in operons often lined up in the order in which they are needed in the encoded pathway? In this Opinion article, we argue that these and many other - ostensibly disparate - observations are all pieces of an emerging picture in which multiple aspects of gene anatomy and genome architecture have evolved in response to error-prone gene expression.
33
|
|
34
|
Affiliation(s)
- Claus O Wilke
- Section of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute of Cell and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America.