1
Versoza CJ, Jensen JD, Pfeifer SP. The landscape of structural variation in aye-ayes (Daubentonia madagascariensis). bioRxiv: The Preprint Server for Biology 2024:2024.11.08.622672. [PMID: 39605644 PMCID: PMC11601217 DOI: 10.1101/2024.11.08.622672] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 11/29/2024]
Abstract
Aye-ayes (Daubentonia madagascariensis) are one of the 25 most critically endangered primate species in the world. Endemic to Madagascar, aye-ayes live in small and highly fragmented populations, which makes them particularly vulnerable to both genetic disease and anthropogenic environmental changes. Over the past decade, conservation genomic efforts have largely focused on inferring and monitoring population structure based on single nucleotide variants to identify and protect critical areas of genetic diversity. However, the recent release of a highly contiguous genome assembly allows, for the first time, the study of structural genomic variation (deletions, duplications, insertions, and inversions), which is likely to impact a substantial proportion of the species' genome. Based on whole-genome, short-read sequencing data from 14 individuals, >1,000 high-confidence autosomal structural variants were detected, affecting ~240 kb of the aye-aye genome. The majority of these variants (>85%) were deletions shorter than 200 bp, consistent with the notion that longer structural mutations are often associated with strongly deleterious fitness effects. For example, two deletions longer than 850 bp located within disease-linked genes were predicted to impose substantial fitness deficits owing to a resulting frameshift and gene fusion, respectively, whereas several other major-effect variants outside of coding regions are likely to impact gene regulatory landscapes. Taken together, this first glimpse into the landscape of structural variation in aye-ayes will create opportunities to advance our understanding of the traits affecting the fitness of this endangered species and will allow enhanced evolutionary comparisons across the full primate clade.
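The two summary statistics reported above (the share of calls that are short deletions, and the total number of base pairs affected) could be tallied from a call set along the following lines. This is a minimal sketch, not the authors' pipeline: the `SV` record, the `summarize` helper and the toy calls are hypothetical, and summing lengths assumes the calls do not overlap.

```python
from dataclasses import dataclass


@dataclass
class SV:
    """Hypothetical minimal structural-variant record (not a VCF parser)."""
    svtype: str  # "DEL", "DUP", "INS" or "INV"
    length: int  # event length in bp


def summarize(calls, short_cutoff=200):
    """Return (fraction of calls that are deletions shorter than
    short_cutoff, total bp summed over all calls).

    Assumes non-overlapping calls; overlapping events would be
    double-counted in the bp total.
    """
    if not calls:
        return 0.0, 0
    short_dels = sum(
        1 for c in calls if c.svtype == "DEL" and c.length < short_cutoff
    )
    total_bp = sum(c.length for c in calls)
    return short_dels / len(calls), total_bp


# Toy call set (invented values, not data from the study).
toy_calls = [SV("DEL", 45), SV("DEL", 120), SV("DEL", 880), SV("INS", 60)]
frac_short_del, bp_affected = summarize(toy_calls)
print(frac_short_del, bp_affected)  # 0.5 1105
```

With real data, the same tally would be run on the filtered, high-confidence autosomal calls; overlapping events would need to be merged first for the bp total to be meaningful.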
Affiliation(s)
- Cyril J. Versoza
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Jeffrey D. Jensen
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Susanne P. Pfeifer
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
2
He D, Zhang M, Li Y, Liu F, Ban B. Insights into the ANKRD11 variants and short-stature phenotype through literature review and ClinVar database search. Orphanet J Rare Dis 2024; 19:292. [PMID: 39135054 PMCID: PMC11318275 DOI: 10.1186/s13023-024-03301-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 08/05/2024] [Indexed: 08/16/2024]
Abstract
Ankyrin repeat domain-containing protein 11 (ANKRD11), a transcription factor predominantly localized in the cell nucleus, plays a crucial role in regulating the expression of key genes by recruiting chromatin remodelers and interacting with specific transcriptional repressors or activators during numerous biological processes. Its pathogenic variants are strongly linked to the pathogenesis and progression of a multisystem disorder known as KBG syndrome. With the widespread application of high-throughput DNA sequencing technologies in clinical medicine, numerous pathogenic variants in the ANKRD11 gene have been reported. Patients with KBG syndrome usually exhibit a broad phenotypic spectrum with variable severity, even when they carry identical variants. In addition to distinctive dental, craniofacial and neurodevelopmental abnormalities, patients often present with skeletal anomalies, particularly postnatal short stature. The relationship between ANKRD11 variants and short stature is not well understood, and little is known about its occurrence rate or the underlying biological mechanism. This review aims to provide an updated analysis of the molecular spectrum associated with ANKRD11 variants, investigate the prevalence of short stature among patients harboring these variants, evaluate the efficacy of recombinant human growth hormone (rhGH) in treating children with short stature and ANKRD11 variants, and explore the biological mechanisms underlying short stature from both scientific and clinical perspectives. Our investigation indicated that frameshift and nonsense variants were the most frequent types among the 583 pathogenic or likely pathogenic variants identified in the ANKRD11 gene. Among the 245 patients with KBG syndrome (KBGS) for whom height data were available, approximately 50% displayed short stature. Most patients showed a positive response to rhGH therapy, although the number of patients receiving treatment was limited. ANKRD11 deficiency potentially disrupts longitudinal bone growth by affecting the orderly differentiation of growth plate chondrocytes. Our review offers crucial insights into the association between ANKRD11 variants and short stature and provides valuable guidance for precise clinical diagnosis and treatment of patients with KBG syndrome.
Affiliation(s)
- Dongye He
- Department of Endocrinology, Genetics and Metabolism, Affiliated Hospital of Jining Medical University, Jining, Shandong, 272029, China
- Medical Research Center, Affiliated Hospital of Jining Medical University, Jining, China
- Mei Zhang
- Department of Endocrinology, Genetics and Metabolism, Affiliated Hospital of Jining Medical University, Jining, Shandong, 272029, China
- Chinese Research Center for Behavior Medicine in Growth and Development, Jining, China
- Yanying Li
- Department of Endocrinology, Genetics and Metabolism, Affiliated Hospital of Jining Medical University, Jining, Shandong, 272029, China
- Chinese Research Center for Behavior Medicine in Growth and Development, Jining, China
- Fupeng Liu
- Department of Endocrinology, Genetics and Metabolism, Affiliated Hospital of Jining Medical University, Jining, Shandong, 272029, China
- Medical Research Center, Affiliated Hospital of Jining Medical University, Jining, China
- Bo Ban
- Department of Endocrinology, Genetics and Metabolism, Affiliated Hospital of Jining Medical University, Jining, Shandong, 272029, China
- Medical Research Center, Affiliated Hospital of Jining Medical University, Jining, China
- Chinese Research Center for Behavior Medicine in Growth and Development, Jining, China
3
García Mesa JJ, Zhu Z, Cartwright RA. COATi: Statistical Pairwise Alignment of Protein-Coding Sequences. Mol Biol Evol 2024; 41:msae117. [PMID: 38869090 PMCID: PMC11255384 DOI: 10.1093/molbev/msae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 04/26/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
Sequence alignment is an essential method in bioinformatics and the basis of many analyses, including phylogenetic inference, ancestral sequence reconstruction, and gene annotation. Sequencing artifacts and errors made during genome assembly, such as abiological frameshifts and incorrect early stop codons, can impact downstream analyses leading to erroneous conclusions in comparative and functional genomic studies. More significantly, while indels can occur both within and between codons in natural sequences, most amino-acid- and codon-based aligners assume that indels only occur between codons. This mismatch between biology and alignment algorithms produces suboptimal alignments and errors in downstream analyses. To address these issues, we present COATi, a statistical, codon-aware pairwise aligner that supports complex insertion-deletion models and can handle artifacts present in genomic data. COATi allows users to reduce the amount of discarded data while generating more accurate sequence alignments. COATi can infer indels both within and between codons, leading to improved sequence alignments. We applied COATi to a dataset containing orthologous protein-coding sequences from humans and gorillas and conclude that 41% of indels occurred between codons, agreeing with previous work in other species. We also applied COATi to semiempirical benchmark alignments and find that it outperforms several popular alignment programs on several measures of alignment quality and accuracy.
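The within-codon versus between-codon distinction can be made concrete with codon phase. The sketch below is a simplification, not COATi's statistical model: it assumes the indel's position is already fixed (real alignments can often slide an indel across equivalent positions), and `classify_indel` and the toy positions are hypothetical.

```python
def classify_indel(breakpoint: int, length: int) -> dict:
    """Classify an indel in a protein-coding sequence.

    breakpoint: number of coding bases preceding the indel
                (0 means the indel sits before the first codon).
    length:     number of inserted or deleted bases.
    """
    if breakpoint < 0 or length <= 0:
        raise ValueError("breakpoint must be >= 0 and length must be > 0")
    phase = breakpoint % 3
    return {
        "phase": phase,
        # Phase 0: the indel falls exactly on a codon boundary.
        "between_codons": phase == 0,
        # Lengths that are not multiples of 3 shift the downstream frame.
        "frameshift": length % 3 != 0,
    }


# Toy (breakpoint, length) pairs, invented for illustration.
toy_indels = [(3, 3), (7, 3), (12, 6), (5, 1)]
n_between = sum(classify_indel(b, n)["between_codons"] for b, n in toy_indels)
print(n_between / len(toy_indels))  # 0.5
```

Under this convention, phase 1 and phase 2 indels split a codon. An in-frame indel (length divisible by 3) can still fall within a codon, which is exactly the case an aligner that only places gaps between codons cannot represent.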
Affiliation(s)
- Juan José García Mesa
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Ira A. Fulton Schools of Engineering, Arizona State University, Tempe, AZ, USA
- Ziqi Zhu
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
4
Kim S, Han DJ, Lee SY, Moon Y, Kang SJ, Kim TM. A Subset of Microsatellite Unstable Cancer Genomes Prone to Short Insertions over Deletions Is Associated with Elevated Anticancer Immunity. Genes (Basel) 2024; 15:770. [PMID: 38927706 PMCID: PMC11202581 DOI: 10.3390/genes15060770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Revised: 06/10/2024] [Accepted: 06/10/2024] [Indexed: 06/28/2024]
Abstract
Deficiencies in DNA mismatch repair (MMRd) leave characteristic footprints of microsatellite instability (MSI) in cancer genomes. We used data from the Cancer Genome Atlas and International Cancer Genome Consortium to conduct a comprehensive analysis of MSI-associated cancers, focusing on indel mutational signatures. We classified MSI-high genomes into two subtypes based on their indel profiles: deletion-dominant (MMRd-del) and insertion-dominant (MMRd-ins). Compared with MMRd-del genomes, MMRd-ins genomes exhibit distinct mutational and transcriptomic features, including a higher prevalence of T>C substitutions and related mutation signatures. Short insertions and deletions in MMRd-ins and MMRd-del genomes target different sets of genes, resulting in distinct indel profiles between the two subtypes. In addition, indels in the MMRd-ins genomes are enriched with subclonal alterations that provide clues about a distinct evolutionary relationship between the MMRd-ins and MMRd-del genomes. Notably, the transcriptome analysis indicated that MMRd-ins cancers upregulate immune-related genes, show a high level of immune cell infiltration, and display an elevated neoantigen burden. The genomic and transcriptomic distinctions between the two types of MMRd genomes highlight the heterogeneity of genetic mechanisms and resulting genomic footprints and transcriptomic changes in cancers, which has potential clinical implications.
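As a rough illustration of the insertion-dominant versus deletion-dominant split, the sketch below counts insertions and deletions from VCF-style (REF, ALT) allele pairs and applies a simple majority rule. The study classified genomes from their indel profiles; the majority rule, the `indel_subtype` helper and the toy variants here are hypothetical stand-ins, not the authors' classifier.

```python
def indel_counts(variants):
    """Count insertions and deletions among (ref, alt) allele pairs.

    Assumes VCF-style alleles sharing an anchor base, e.g. ("A", "AT") is a
    1-bp insertion and ("AT", "A") a 1-bp deletion. Pairs of equal length
    (SNVs, MNVs) are ignored.
    """
    insertions = sum(1 for ref, alt in variants if len(alt) > len(ref))
    deletions = sum(1 for ref, alt in variants if len(alt) < len(ref))
    return insertions, deletions


def indel_subtype(variants):
    """Hypothetical majority rule: label a genome by its dominant indel type."""
    insertions, deletions = indel_counts(variants)
    if insertions > deletions:
        return "MMRd-ins"
    if deletions > insertions:
        return "MMRd-del"
    return "undetermined"  # tie, including genomes with no indels


toy_genome = [("A", "AT"), ("C", "CAA"), ("GT", "G"), ("T", "C")]
print(indel_counts(toy_genome), indel_subtype(toy_genome))  # (2, 1) MMRd-ins
```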
Affiliation(s)
- Sunmin Kim
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Dong-Jin Han
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Seo-Young Lee
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Youngbeen Moon
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Su Jung Kang
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Tae-Min Kim
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Cancer Research Institute, College of Medicine, The Catholic University of Korea, Seoul 06591, Republic of Korea
- Department of Biomedicine & Health Sciences, Graduate School, The Catholic University of Korea, Seoul 06591, Republic of Korea
- CMC Institute for Basic Medical Science, The Catholic Medical Center, The Catholic University of Korea, Seoul 06591, Republic of Korea
5
Yang Y, Braga MV, Dean MD. Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure. Genome Biol Evol 2024; 16:evae093. [PMID: 38735759 PMCID: PMC11102076 DOI: 10.1093/gbe/evae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 05/14/2024]
Abstract
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning method AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structure. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
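The "as much as expected" comparison is an observed-to-expected ratio. A minimal sketch, assuming each indel is reduced to a single residue position and that the null model places indels uniformly along the protein (the study's own null model may differ); `overlap_ratio` and the toy data are hypothetical.

```python
def overlap_ratio(indel_positions, in_structure):
    """Observed / expected number of indels overlapping secondary structure.

    indel_positions: 0-based residue indices, one per indel.
    in_structure:    list of booleans, True where a residue is predicted
                     to lie in a secondary-structure element.
    Expected count assumes indels land uniformly at random along the protein.
    """
    if not indel_positions or not in_structure:
        raise ValueError("need at least one indel and one residue")
    observed = sum(1 for p in indel_positions if in_structure[p])
    structured_fraction = sum(in_structure) / len(in_structure)
    expected = len(indel_positions) * structured_fraction
    if expected == 0:
        raise ValueError("no structured residues: ratio undefined")
    return observed / expected


# Toy protein: residues 0-5 structured, 6-9 unstructured (60% structured).
toy_structure = [True] * 6 + [False] * 4
toy_indels = [1, 6, 7, 8, 9]
print(round(overlap_ratio(toy_indels, toy_structure), 3))  # 0.333
```

On this reading, a ratio of 0.54 corresponds to indels overlapping secondary structure 54% as often as uniform placement would predict.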
Affiliation(s)
- Yi Yang
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew V Braga
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew D Dean
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
6
Gao B, Li P, Zhu S. Single Deletion Unmasks Hidden Anti-Gram-Negative Bacterial Activity of an Insect Defensin-Derived Peptide. J Med Chem 2024; 67:2512-2528. [PMID: 38335999 DOI: 10.1021/acs.jmedchem.3c01584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2024]
Abstract
Insect defensins are a large family of antimicrobial peptides primarily active against Gram-positive bacteria. Here, we explore their hidden anti-Gram-negative bacterial potential via a nature-guided strategy inspired by natural deletion variants of Drosophila defensins. Guided by these variants, we deleted the equivalent region of an insect defensin, retaining the N-terminal region containing the first cysteine and the C-terminal region containing the last three cysteines. The resulting 15-mer peptide exhibits low solubility and specifically targets Gram-positive bacteria. Further deletion of alanine-9 markedly improves its solubility, unmasks its hidden anti-Gram-negative bacterial activity, and alters its states in different environments. Intriguingly, compared with the oxidized form, the reduced 14-mer peptide shows increased activity against Gram-positive and Gram-negative bacteria through a membrane-disruptive mechanism. Its broad-spectrum activity and tolerance to high-salt environments and human serum, together with its lack of toxicity to mammalian or human cells, make it a promising candidate for the design of new peptide antibiotics against Gram-negative bacterial infections.
Affiliation(s)
- Bin Gao
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Ping Li
- Key Laboratory for Biomedical Effects of Nanomaterials and Nanosafety (Chinese Academy of Sciences), National Center for Nanoscience and Technology, No.11 ZhongGuanCun BeiYiTiao, Haidian District, Beijing 100190, China
- Shunyi Zhu
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
7
Ramos RM, Petroli RJ, D'Alessandre NDR, Guardia GDA, Afonso ACDF, Nishi MY, Domenice S, Galante PAF, Mendonca BB, Batista RL. Small Indels in the Androgen Receptor Gene: Phenotype Implications and Mechanisms of Mutagenesis. J Clin Endocrinol Metab 2023; 109:68-79. [PMID: 37572362 DOI: 10.1210/clinem/dgad470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Revised: 08/02/2023] [Accepted: 08/07/2023] [Indexed: 08/14/2023]
Abstract
CONTEXT Despite the high abundance of small indels in human genomes, their precise roles and underlying mechanisms of mutagenesis in Mendelian disorders require further investigation. OBJECTIVE To profile the distribution, functional implications, and mechanisms of small indels in the androgen receptor (AR) gene in individuals with androgen insensitivity syndrome (AIS). METHODS We conducted a systematic review of previously reported indels within the coding region of the AR gene and included 3 novel indels. Their distribution throughout the AR coding region was examined and compared with genomic population data. Additionally, we assessed their impact on the AIS phenotype and investigated potential mechanisms driving their occurrence. RESULTS A total of 82 indels in AIS were included. Notably, all frameshift indels were associated with complete AIS. Indels were predominantly located in the N-terminal domain of the AR gene, and most led to frameshift mutations. Small deletions accounted for 59.7%. Most indels occurred in nonrepetitive sequences, with 15.8% situated within triplet regions. Gene burden analysis demonstrated significant enrichment of frameshift indels in AIS compared with controls (P < .00001), and deletions were overrepresented in AIS (P < .00001). CONCLUSION Our findings underscore a robust genotype-phenotype relationship for small indels in the AR gene in AIS, with the vast majority presenting as complete AIS. Triplet regions and homopolymeric runs emerged as loci prone to small indels within the AR. Most were frameshift indels, with polymerase slippage potentially explaining half of AR indel occurrences. Complex frameshift indels were associated with palindromic runs. These findings advance understanding of the genetic basis of AIS and shed light on potential mechanisms underlying pathogenic small indel events.
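Polymerase slippage is typically suspected when an indel adds or removes a copy of an adjacent repeat unit, such as one base of a homopolymeric run or one triplet of a triplet repeat. The sketch below is a simplified heuristic for deletions only, not the authors' analysis; `deletion_in_repeat`, `is_frameshift` and the toy sequences are hypothetical, and positions are 0-based.

```python
def deletion_in_repeat(seq: str, start: int, length: int) -> bool:
    """Return True if deleting seq[start:start + length] removes one copy of
    a tandem repeat unit immediately adjacent to it (a homopolymer when
    length == 1), i.e. a context compatible with polymerase slippage."""
    if length <= 0 or start < 0 or start + length > len(seq):
        raise ValueError("deletion must lie within the sequence")
    unit = seq[start:start + length]
    before = seq[start - length:start] if start >= length else ""
    after = seq[start + length:start + 2 * length]
    return unit == before or unit == after


def is_frameshift(length: int) -> bool:
    """An indel whose length is not a multiple of 3 shifts the reading frame."""
    return length % 3 != 0


# Toy sequences (invented).
print(deletion_in_repeat("GCAAAAT", 2, 1))    # True: one A from an A run
print(deletion_in_repeat("GCAGCAGCT", 3, 3))  # True: one GCA triplet
print(deletion_in_repeat("ATGCTA", 2, 1))     # False: no adjacent G
```

A repeat context is compatible with slippage but does not prove it; the heuristic only flags candidates.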
Affiliation(s)
- Raquel Martinez Ramos
- Developmental Endocrinology Unit, Hormone and Molecular Genetics Laboratory (LIM/42), Endocrinology Division, Internal Medicine Department, Medical School, University of São Paulo (USP), São Paulo, SP, 05403-000, Brazil
- Reginaldo José Petroli
- Faculdade de Medicina da Universidade Federal de Alagoas (UFAL), Programa de Pós-Graduação em Ciências Médicas-UFAL, Maceió, AL, 57072-900, Brazil
- Ana Caroline de Freitas Afonso
- Developmental Endocrinology Unit, Hormone and Molecular Genetics Laboratory (LIM/42), Endocrinology Division, Internal Medicine Department, Medical School, University of São Paulo (USP), São Paulo, SP, 05403-000, Brazil
- Mirian Yumie Nishi
- Developmental Endocrinology Unit, Hormone and Molecular Genetics Laboratory (LIM/42), Endocrinology Division, Internal Medicine Department, Medical School, University of São Paulo (USP), São Paulo, SP, 05403-000, Brazil
- Sorahia Domenice
- Developmental Endocrinology Unit, Hormone and Molecular Genetics Laboratory (LIM/42), Endocrinology Division, Internal Medicine Department, Medical School, University of São Paulo (USP), São Paulo, SP, 05403-000, Brazil
- Berenice Bilharinho Mendonca
- Developmental Endocrinology Unit, Hormone and Molecular Genetics Laboratory (LIM/42), Endocrinology Division, Internal Medicine Department, Medical School, University of São Paulo (USP), São Paulo, SP, 05403-000, Brazil
- Rafael Loch Batista
- Developmental Endocrinology Unit, Hormone and Molecular Genetics Laboratory (LIM/42), Endocrinology Division, Internal Medicine Department, Medical School, University of São Paulo (USP), São Paulo, SP, 05403-000, Brazil
- Instituto do Câncer do Estado de São Paulo da Faculdade de Medicina da Universidade de São Paulo (ICESP), São Paulo, SP, 01246-000, Brazil
8
Sykes J, Holland BR, Charleston MA. A review of visualisations of protein fold networks and their relationship with sequence and function. Biol Rev Camb Philos Soc 2023; 98:243-262. [PMID: 36210328 PMCID: PMC10092621 DOI: 10.1111/brv.12905] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Revised: 09/08/2022] [Accepted: 09/09/2022] [Indexed: 01/12/2023]
Abstract
Proteins form arguably the most significant link between genotype and phenotype. Understanding the relationship between protein sequence and structure, and applying this knowledge to predict function, is difficult. One way to investigate these relationships is by considering the space of protein folds and how one might move from fold to fold through similarity or potential evolutionary relationships. The many individual characterisations of fold space presented in the literature can tell us a lot about how well the current Protein Data Bank represents protein fold space, how convergence and divergence may affect protein evolution, how proteins affect the whole of which they are part, and how proteins themselves function. A synthesis of these different approaches and viewpoints seems the most likely way to further our knowledge of protein structure evolution and thus facilitate improved protein structure design and prediction.
Affiliation(s)
- Janan Sykes
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Barbara R Holland
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
- Michael A Charleston
- School of Natural Sciences, University of Tasmania, Private Bag 37, Hobart, Tasmania, 7001, Australia
9
Yao Y, Sun K, Yang Q, Zhou Z, Qian J, Li Z, Shao C, Qian X, Tang Q, Xie J. Development of a multiplex panel with 31 multi-allelic InDels for forensic DNA typing. Int J Legal Med 2023; 137:1-12. [PMID: 36326889 DOI: 10.1007/s00414-022-02907-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 10/20/2022] [Indexed: 11/06/2022]
Abstract
Insertion/Deletion (InDel) polymorphic genetic markers are abundant in human genomes. Diallelic InDel markers have been widely studied for forensic purposes, yet their low polymorphic information content limits their application, and current InDel panels remain to be improved. In this study, multi-allelic InDels located outside low-complexity sequence regions were selected from datasets of East Asian populations, and a multiplex amplification system containing 31 multi-allelic InDel markers and the Amelogenin marker (FA-HID32plex) was constructed and optimized. Preliminary studies of sensitivity, species specificity, inhibitor tolerance, mixture resolution, and the detection of degraded samples demonstrate that the FA-HID32plex is highly sensitive, specific, and robust for trace and degraded samples. The combined power of discrimination (CPD) of the 31 multi-allelic InDel markers was 0.999 999 999 999 999 999 85, and the cumulative probability of exclusion (CPE) was 0.999 920 in a Chinese Han population, indicating high discrimination power. Altogether, the FA-HID32plex panel could provide reliable supplementary or stand-alone information for individual identification and paternity testing, especially for challenging samples.
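Combined statistics such as CPD and CPE are conventionally obtained by multiplying per-locus complements under the assumption that loci are independent: CPD = 1 - prod(1 - PD_i) and CPE = 1 - prod(1 - PE_i), with a per-locus PD = 1 - sum(g_j^2) over genotype frequencies g_j. The sketch below applies these standard formulas; the helper names and genotype frequencies are hypothetical, not values from the FA-HID32plex study.

```python
from math import prod


def power_of_discrimination(genotype_freqs):
    """Per-locus PD = 1 - sum of squared genotype frequencies."""
    return 1.0 - sum(f * f for f in genotype_freqs)


def combine(per_locus):
    """Combine per-locus PD (or PE) values assuming independent loci.

    Returns (combined value, complement). The complement prod(1 - p_i) is
    returned separately because, once it is smaller than about 5.6e-17,
    1 - complement evaluates to exactly 1.0 in double precision.
    """
    if any(not 0.0 <= p <= 1.0 for p in per_locus):
        raise ValueError("probabilities must lie in [0, 1]")
    complement = prod(1.0 - p for p in per_locus)
    return 1.0 - complement, complement


# Hypothetical locus with genotype frequencies 0.25 / 0.50 / 0.25.
pd_locus = power_of_discrimination([0.25, 0.5, 0.25])
cpd, mismatch = combine([pd_locus, pd_locus])
print(pd_locus, cpd, mismatch)  # 0.625 0.859375 0.140625
```

Keeping the complement matters for panels like this one: a CPD with about twenty leading nines carries more significant digits than a double can hold, so the match probability is the quantity worth storing.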
Affiliation(s)
- Yining Yao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Kuan Sun
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Department of Fetal Medicine and Prenatal Diagnosis Center, Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine, 2699 West Gaoke Rd, 201204, Shanghai, China
- Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, 200092, China
- Qinrui Yang
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhihan Zhou
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Jinglei Qian
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Zhimin Li
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Chengchen Shao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Xiaoqin Qian
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
- Qiqun Tang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
- Jianhui Xie
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, 138 Yixueyuan Road, Shanghai, 200032, China
10
Seo TK, Redelings BD, Thorne JL. Correlations between alignment gaps and nucleotide substitution or amino acid replacement. Proc Natl Acad Sci U S A 2022; 119:e2204435119. [PMID: 35972964 PMCID: PMC9407537 DOI: 10.1073/pnas.2204435119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 07/11/2022] [Indexed: 11/18/2022]
Abstract
To assess the conventional treatment of alignment gaps as missing data in evolutionary inference, we propose a simple nonparametric test of the null hypothesis that the locations of alignment gaps are independent of the nucleotide substitution or amino acid replacement process. When we apply the test to 1,390 protein alignments that are informed by protein tertiary structure and use a 5% significance level, the null hypothesis of independence between amino acid replacement and gap location is rejected for ∼65% of datasets. Via simulations that include substitution and insertion-deletion, we show that the test performs well with true alignments. When we simulate according to the null hypothesis and then apply the test to optimal alignments inferred by each of four widely used software packages, the null hypothesis is rejected too frequently. Via further simulations and analyses, we show that the overly frequent rejections of the null hypothesis are not solely due to weaknesses of widely used software for finding optimal alignments. Instead, our evidence suggests that optimal alignments are unrepresentative of true alignments and that biased evolutionary inferences may result from relying upon individual optimal alignments.
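The idea of testing whether gap locations are independent of the replacement process can be illustrated with a generic column-permutation test. This is not the test proposed by the authors: the statistic (difference in the fraction of variable columns between gapped and ungapped columns), the helper names and the toy alignment are hypothetical, and shuffling columns treats them as exchangeable, ignoring phylogeny and positional effects.

```python
import random


def column_stats(alignment):
    """Per-column (has_gap, is_variable) for equal-length aligned strings.

    '-' marks a gap; a column is variable when its ungapped characters are
    not all identical.
    """
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must have equal length")
    stats = []
    for j in range(len(alignment[0])):
        column = [s[j] for s in alignment]
        residues = {c for c in column if c != "-"}
        stats.append(("-" in column, len(residues) > 1))
    return stats


def mean_difference(gap_flags, variable):
    """Fraction variable among gapped columns minus among ungapped columns."""
    gapped = [v for g, v in zip(gap_flags, variable) if g]
    ungapped = [v for g, v in zip(gap_flags, variable) if not g]
    if not gapped or not ungapped:
        return 0.0
    return sum(gapped) / len(gapped) - sum(ungapped) / len(ungapped)


def permutation_test(alignment, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for association of gaps and variability."""
    stats = column_stats(alignment)
    gap_flags = [g for g, _ in stats]
    variable = [v for _, v in stats]
    observed = mean_difference(gap_flags, variable)
    rng = random.Random(seed)
    shuffled = gap_flags[:]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(shuffled, variable)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


toy_alignment = ["AC-GT", "AT-GT", "ACAGA"]
print(permutation_test(toy_alignment, n_perm=1000))  # (-0.5, 1.0)
```

With a single gapped column every permutation is at least as extreme as the observed split, so the toy example has no power; real tests need many alignment columns and a model that respects the dependence structure the authors discuss.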
Affiliation(s)
- Tae-Kun Seo
- Division of Life Sciences, Korea Polar Research Institute, Yeonsu-gu, Incheon 21990, Republic of Korea
- Benjamin D. Redelings
- Biology Department, Duke University, Durham, NC 27708
- Ronin Institute, Durham, NC 27705
- Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045
- Jeffrey L. Thorne
- Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695
- Department of Statistics, North Carolina State University, Raleigh, NC 27695
11
Jowkar G, Pečerska J, Maiolo M, Gil M, Anisimova M. ARPIP: Ancestral sequence Reconstruction with insertions and deletions under the Poisson Indel Process. Syst Biol 2022:6648472. [PMID: 35866991 DOI: 10.1093/sysbio/syac050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Accepted: 07/06/2022] [Indexed: 11/12/2022]
Abstract
Modern phylogenetic methods allow inference of ancestral molecular sequences given an alignment and a phylogeny relating present-day sequences. This provides insight into the evolutionary history of molecules, helping to understand gene function and to study biological processes such as adaptation and convergent evolution across a variety of applications. Here we propose a dynamic programming algorithm for fast joint likelihood-based reconstruction of ancestral sequences under the Poisson Indel Process (PIP). Unlike previous approaches, our method, named ARPIP, enables reconstruction with insertions and deletions based on an explicit indel model. Consequently, inferred indel events have an explicit biological interpretation. Likelihood computation is achieved in linear time with respect to the number of sequences. Our method consists of two steps: finding the most probable indel points and reconstructing ancestral sequences. First, we find the most likely indel points and prune the phylogeny to reflect the insertion and deletion events per site. Second, we infer the ancestral states on the pruned subtree in a manner similar to FastML. We applied ARPIP to simulated datasets and to real data from the Betacoronavirus genus. ARPIP reconstructs both indel events and substitutions with a high degree of accuracy. Our method fares well when compared with established state-of-the-art methods such as FastML and PAML. Moreover, the method can be extended to explore both optimal and suboptimal reconstructions, to include rate heterogeneity through time, and more. We believe it will expand the range of novel applications of ancestral sequence reconstruction.
Affiliation(s)
- Gholamhossein Jowkar
- Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland; University of Neuchâtel, Institute of Biology, CH-2000 Neuchâtel, Switzerland
- Jūlija Pečerska
- Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Massimo Maiolo
- Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland; University of Bern, Institute of Pathology, CH-3008 Bern, Switzerland
- Manuel Gil
- Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Maria Anisimova
- Zurich University of Applied Sciences, School of Life Sciences and Facility Management, CH-8820, Wädenswil, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
12
Savino S, Desmet T, Franceus J. Insertions and deletions in protein evolution and engineering. Biotechnol Adv 2022; 60:108010. [PMID: 35738511 DOI: 10.1016/j.biotechadv.2022.108010] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/05/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022]
Abstract
Protein evolution or engineering studies are traditionally focused on amino acid substitutions and the way these contribute to fitness. Meanwhile, the insertion and deletion of amino acids is often overlooked, despite being one of the most common sources of genetic variation. Recent methodological advances and successful engineering stories have demonstrated that the time is ripe for greater emphasis on these mutations and their understudied effects. This review highlights the evolutionary importance and biotechnological relevance of insertions and deletions (indels). We provide a comprehensive overview of approaches that can be employed to include indels in random, (semi)-rational or computational protein engineering pipelines. Furthermore, we discuss the tolerance to indels at the structural level, address how domain indels can link the function of unrelated proteins, and feature studies that illustrate the surprising and intriguing potential of frameshift mutations.
Affiliation(s)
- Simone Savino
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Tom Desmet
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Jorick Franceus
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
13
Insertion-and-Deletion Mutations between the Genomes of SARS-CoV, SARS-CoV-2, and Bat Coronavirus RaTG13. Microbiol Spectr 2022; 10:e0071622. [PMID: 35658573 PMCID: PMC9241832 DOI: 10.1128/spectrum.00716-22] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/19/2022]
Abstract
The evolutionary process that gave rise to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remains unclear. This study compared the genome sequences of severe acute respiratory syndrome coronavirus (SARS-CoV), bat coronavirus RaTG13, and SARS-CoV-2. Overall, the genomes of SARS-CoV-2 and RaTG13 were 77.9% and 77.7% identical to the genome of SARS-CoV, respectively. Relative to SARS-CoV, 3.6% (1,068 bases) of the SARS-CoV-2 genome was derived from insertion and/or deletion (indel) mutations, and 18.6% (5,548 bases) from point mutations. At least 35 indel sites were confirmed in the SARS-CoV-2 genome, 17 of which were ≥10 consecutive bases long. Of these relatively long indels, ten were located in the spike (S) gene, five in the nonstructural protein 3 (Nsp3) gene of open reading frame (ORF) 1a, and one each in ORF8 and a noncoding region. Seventeen (48.6%) of the 35 indels were insertion-and-deletion mutations in which 7–325 consecutive bases were exchanged. Almost the entire ORF8 gene was replaced by a single 325-base indel. The distribution of these indels roughly matched the distribution of the point mutation rate around them. The SARS-CoV-2 genome was 96.0% identical to that of RaTG13, and no long insertion-and-deletion mutations were found between the two genomes. The uneven distribution of multiple indels, and the presence of multiple long insertion-and-deletion mutations with exchanged consecutive base sequences in the viral genome, may provide insights into the development of SARS-CoV-2.
IMPORTANCE The developmental mechanism of SARS-CoV-2 remains unclear. This study compared SARS-CoV-2 base by base with SARS-CoV and with bat coronavirus RaTG13. The genomes of SARS-CoV-2 and RaTG13 were 77.9% and 77.7% identical to the genome of SARS-CoV, respectively. Seventeen of the 35 sites with insertion and/or deletion mutations between SARS-CoV-2 and SARS-CoV were insertion-and-deletion mutations replacing 7–325 consecutive bases. Most of these long insertion-and-deletion sites were concentrated in the Nsp3 gene of ORF1a, the S1 domain of the spike protein, and the ORF8 gene. No such long insertion-and-deletion mutations were observed between the genomes of RaTG13 and SARS-CoV-2. The presence of multiple long insertion-and-deletion mutations in the SARS-CoV-2 genome and their uneven distribution may provide further insights into the development of the virus.
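A base-by-base comparison of the kind described above can be sketched as follows, assuming the two genomes have already been aligned; the alignment step and the authors' exact counting rules are not given in the abstract. The function `classify_columns` is hypothetical: it tallies identical, point-mutation and indel columns, and reports the lengths of runs of consecutive indel columns so that long (for example, ≥10-base) indels can be flagged. The toy sequences are invented.

```python
from typing import Dict, List, Tuple


def classify_columns(query: str, ref: str) -> Tuple[Dict[str, int], List[int]]:
    """Tally identical, point-mutation and indel columns of a pairwise alignment.

    Also returns the lengths of runs of consecutive indel columns.
    """
    if len(query) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    counts = {"identical": 0, "point": 0, "indel": 0}
    runs: List[int] = []
    current = 0  # length of the indel run currently being extended
    for q, r in zip(query.upper(), ref.upper()):
        if q == "-" and r == "-":
            continue  # column empty in both sequences; ignore it
        if q == "-" or r == "-":
            counts["indel"] += 1
            current += 1
            continue
        if current:
            runs.append(current)  # an indel run just ended
            current = 0
        counts["identical" if q == r else "point"] += 1
    if current:
        runs.append(current)  # alignment ended inside an indel run
    return counts, runs


if __name__ == "__main__":
    counts, runs = classify_columns("ACGT--ACGTTA", "ACCTGGAC--TA")
    print(counts, runs)  # {'identical': 7, 'point': 1, 'indel': 4} [2, 2]
    print([r for r in runs if r >= 10])  # long indels only: []
```

Adjacent gaps on both sides (as in the "exchanged" sequences above) fall into a single run here, which is one simple way to treat them; other conventions are possible.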
14
DeWeerd RA, Németh E, Póti Á, Petryk N, Chen CL, Hyrien O, Szüts D, Green AM. Prospectively defined patterns of APOBEC3A mutagenesis are prevalent in human cancers. Cell Rep 2022; 38:110555. [PMID: 35320711 PMCID: PMC9283007 DOI: 10.1016/j.celrep.2022.110555] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 07/11/2021] [Revised: 12/15/2021] [Accepted: 03/02/2022] [Indexed: 12/14/2022]
Abstract
Mutational signatures defined by single base substitution (SBS) patterns in cancer have elucidated potential mutagenic processes that contribute to malignancy. Two prevalent mutational patterns in human cancers are attributed to the APOBEC3 cytidine deaminase enzymes. Among the seven human APOBEC3 proteins, APOBEC3A is a potent deaminase and proposed driver of cancer mutagenesis. In this study, we prospectively examine genome-wide aberrations by expressing human APOBEC3A in avian DT40 cells. From whole-genome sequencing, we detect hundreds to thousands of base substitutions per genome. The APOBEC3A signature includes widespread cytidine mutations and a unique insertion-deletion (indel) signature consisting largely of cytidine deletions. This multi-dimensional APOBEC3A signature is prevalent in human cancer genomes. Our data further reveal replication-associated mutations, the rate of stem-loop and clustered mutations, and deamination of methylated cytidines. This comprehensive signature of APOBEC3A mutagenesis is a tool for future studies and a potential biomarker for APOBEC3 activity in cancer.
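APOBEC3-type substitution signatures are conventionally summarized as C>T and C>G changes at TCW motifs (W = A or T), which appear as G>A and G>C at WGA motifs on the opposite strand. The sketch below counts substitutions in that context; it illustrates the motif logic only and is not the authors' signature-extraction pipeline. The function names and the toy sequence are hypothetical.

```python
from typing import Iterable, Tuple

# Substitutions typical of APOBEC3 deamination, keyed by the reference base.
APOBEC_ALTS = {"C": {"T", "G"}, "G": {"A", "C"}}


def in_tcw_context(seq: str, pos: int) -> bool:
    """True if `pos` is the C of a TCW motif on either strand.

    A TCW motif on the reverse strand reads WGA on the forward strand.
    """
    if pos <= 0 or pos >= len(seq) - 1:
        return False  # need one base of context on each side
    prev, base, nxt = seq[pos - 1], seq[pos], seq[pos + 1]
    if base == "C":
        return prev == "T" and nxt in "AT"
    if base == "G":
        return prev in "AT" and nxt == "A"
    return False


def count_apobec_like(seq: str, mutations: Iterable[Tuple[int, str]]) -> int:
    """Count (0-based position, alt base) substitutions in a TCW/WGA context."""
    seq = seq.upper()
    total = 0
    for pos, alt in mutations:
        allowed = APOBEC_ALTS.get(seq[pos], set())
        if alt.upper() in allowed and in_tcw_context(seq, pos):
            total += 1
    return total


if __name__ == "__main__":
    seq = "ATCAGGTCTTGAC"
    muts = [(2, "T"), (7, "G"), (12, "T"), (10, "A"), (4, "A")]
    print(count_apobec_like(seq, muts))  # 3
```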
Affiliation(s)
- Rachel A DeWeerd
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA
- Eszter Németh
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Ádám Póti
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Nataliya Petryk
- Epigenetics & Cell Fate UMR7216, CNRS, University of Paris, 35 rue Hélène Brion, 75013 Paris, France
- Chun-Long Chen
- Institut Curie, Université PSL, Sorbonne Université, CNRS UMR3244, Dynamics of Genetic Information, Paris, France
- Olivier Hyrien
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, 46 rue d'Ulm, 75005 Paris, France
- Dávid Szüts
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
- Abby M Green
- Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA; Center for Genome Integrity, Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
15
Yao Y, Sun K, Yang Q, Zhou Z, Shao C, Qian X, Tang Q, Xie J. Assessing Autosomal InDel Loci With Multiple Insertions or Deletions of Random DNA Sequences in Human Genome. Front Genet 2022; 12:809815. [PMID: 35178073 PMCID: PMC8844376 DOI: 10.3389/fgene.2021.809815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2021] [Accepted: 12/27/2021] [Indexed: 11/13/2022]
Abstract
Multiple insertion/deletion mutational events occurring at or around InDel sites can form multi-allelic InDels and multi-InDels (together abbreviated as MM-InDels), whereas InDels with random DNA sequences may imply a unique mutation event at these loci. In this study, a preliminary investigation of MM-InDels with random sequences was conducted using high-throughput phased data from the 1000 Genomes Project. After filtering for multiple alleles, a total of 3,599 multi-allelic InDels and 6,375 multi-InDels were obtained. The vast majority of these MM-InDels (85.59%) presented 3 alleles, implying that only one secondary insertion or deletion event occurred at these loci. Two adjacent InDel loci were more frequently observed within 20 bp of each other. MM-InDels with random sequences were unevenly distributed across the genome and correlated with InDels, SNPs, recombination rate, and GC content. The average allele frequencies and prevalence of multi-allelic InDels and multi-InDels showed similar distribution patterns across populations. Altogether, MM-InDels with random sequences can provide useful information for population resolution.
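The reasoning that three alleles imply a single secondary event, and the search for neighbouring InDels within 20 bp, can be sketched as below. The sketch assumes every allele arose from one distinct event with no recurrence or back mutation, which gives a minimum rather than an exact count; the function names are hypothetical and the positions are invented.

```python
from typing import List, Tuple


def min_secondary_events(n_alleles: int) -> int:
    """Minimum number of mutation events beyond the first indel at a locus.

    A biallelic InDel needs one event; each additional allele needs at
    least one more, assuming no recurrent or reversed mutations.
    """
    if n_alleles < 2:
        raise ValueError("a polymorphic locus needs at least two alleles")
    return n_alleles - 2


def adjacent_pairs(positions: List[int], max_gap: int = 20) -> List[Tuple[int, int]]:
    """Neighbouring InDel positions that lie no more than `max_gap` bp apart."""
    ordered = sorted(positions)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a <= max_gap]


if __name__ == "__main__":
    print(min_secondary_events(3))  # 1
    print(adjacent_pairs([100, 115, 300, 140, 305]))  # [(100, 115), (300, 305)]
```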
Affiliation(s)
- Yining Yao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Kuan Sun
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Qinrui Yang
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Zhihan Zhou
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Chengchen Shao
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Xiaoqin Qian
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Qiqun Tang
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fudan University, Shanghai, China
- Jianhui Xie
- Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China
16
Chen J, Guo JT. Structural and functional analysis of somatic coding and UTR indels in breast and lung cancer genomes. Sci Rep 2021; 11:21178. [PMID: 34707120 PMCID: PMC8551294 DOI: 10.1038/s41598-021-00583-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 10/14/2021] [Indexed: 11/24/2022]
Abstract
Insertions and deletions (indels) represent one of the major types of variation in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated indels from large samples of two cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from cancer whole-exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), key elements for regulating gene expression that have been found in both coding and non-coding sequences. Compared with germline indels in healthy genomes, somatic indels in cancer genomes include more coding indels, with a higher-than-expected proportion of frame-shift (FS) indels. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, somatic indels in cancer genomes tend to be located in functionally important sequences, where they can affect the core secondary structures of proteins, and they overlap more with predicted TFBSs in coding regions than germline indels do. The somatic CDS indels are also enriched in highly conserved nucleotides compared with germline CDS indels.
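The frame-shift (FS) category follows from simple arithmetic: an indel in a coding sequence (CDS) shifts the downstream reading frame unless its net length change is a multiple of three. A minimal sketch, assuming VCF-style (ref, alt) allele pairs already known to fall inside a CDS; the function names are hypothetical and this is not the authors' annotation pipeline.

```python
from typing import Iterable, Tuple


def indel_frame_effect(ref_allele: str, alt_allele: str) -> str:
    """Classify a coding indel as 'in-frame' or 'frame-shift'.

    Only the net length change matters: a change that is a multiple of 3
    keeps the downstream reading frame intact.
    """
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        raise ValueError("not an indel: alleles have equal length")
    return "in-frame" if delta % 3 == 0 else "frame-shift"


def frameshift_fraction(indels: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (ref, alt) indels that shift the reading frame."""
    effects = [indel_frame_effect(ref, alt) for ref, alt in indels]
    if not effects:
        raise ValueError("no indels given")
    return effects.count("frame-shift") / len(effects)


if __name__ == "__main__":
    calls = [("A", "ATTT"), ("ACG", "A"), ("AC", "A"), ("ACGTACG", "A")]
    print([indel_frame_effect(r, a) for r, a in calls])
    print(frameshift_fraction(calls))  # 0.5
```

Python's `%` returns a non-negative remainder for a positive divisor, so deletions (negative `delta`) are classified with the same test as insertions.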
Affiliation(s)
- Jing Chen
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
17
Irving K, Wellenreuther M, Ritchie PA. Description of the growth hormone gene of the Australasian snapper, Chrysophrys auratus, and associated intra- and interspecific genetic variation. J Fish Biol 2021; 99:1060-1070. [PMID: 34036582 DOI: 10.1111/jfb.14810] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Revised: 04/23/2021] [Accepted: 05/23/2021] [Indexed: 06/12/2023]
Abstract
The growth hormone (GH) gene of the marine teleost Australasian snapper (Chrysophrys auratus) was identified and characterized from the reference genome; it is approximately 5,577 bp long and consists of six exons and five introns. Large polymorphic repeat regions were found in the first and third introns, and putative transcription factor binding sites were identified. Phylogenetic analysis of the GH genes of perciform fish showed largely conserved coding regions and highly variable noncoding regions among species. Despite some exon sequence variation and an amino acid deletion identified between C. auratus and its sister species Chrysophrys/Pagrus major, the amino acid sequences and putative secondary structures were largely conserved across the Sparidae. A population-level assessment of 99 samples caught at five separate coastal locations in New Zealand revealed six variable alleles at the intron 1 site of the C. auratus GH gene. Population genetic analysis suggested that C. auratus from the five sampling locations were largely panmictic, with no evidence of departure from Hardy-Weinberg equilibrium, and had a high level of heterozygosity. Overall, these results suggest that the GH gene is largely conserved across its coding regions, although some variability was detected.
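The heterozygosity summary mentioned above can be made concrete with a small sketch comparing observed heterozygosity with the value expected under Hardy-Weinberg proportions, He = 1 - Σ p_i², for a multi-allelic site such as the six-allele intron 1 locus. The allele labels and genotypes are invented, and the function is hypothetical; a formal test of departure from equilibrium would additionally need an exact or chi-square test.

```python
from collections import Counter
from typing import List, Tuple


def heterozygosity(genotypes: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (observed, expected) heterozygosity at one multi-allelic locus.

    Expected heterozygosity uses Hardy-Weinberg proportions: He = 1 - sum(p_i^2).
    """
    if not genotypes:
        raise ValueError("no genotypes")
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # number of gene copies (2 per diploid fish)
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    observed = sum(a != b for a, b in genotypes) / len(genotypes)
    return observed, expected


if __name__ == "__main__":
    sample = [("a1", "a2"), ("a1", "a1"), ("a2", "a3"), ("a3", "a3")]
    print(heterozygosity(sample))  # (0.5, 0.65625)
```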
Affiliation(s)
- Kate Irving
- School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
- Maren Wellenreuther
- The New Zealand Institute for Plant and Food Research Limited, Nelson, New Zealand
- School of Biological Sciences, University of Auckland, Auckland, New Zealand
- Peter A Ritchie
- School of Biological Sciences, Victoria University of Wellington, Wellington, New Zealand
18
The landscape and driver potential of site-specific hotspots across cancer genomes. NPJ Genom Med 2021; 6:33. [PMID: 33986299 PMCID: PMC8119706 DOI: 10.1038/s41525-021-00197-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/18/2020] [Accepted: 04/15/2021] [Indexed: 11/09/2022]
Abstract
Large sets of whole cancer genomes make it possible to study mutation hotspots genome-wide. Here we detect, categorize, and characterize site-specific hotspots using 2279 whole cancer genomes from the Pan-Cancer Analysis of Whole Genomes project and provide a resource of annotated hotspots genome-wide. We investigate the excess of hotspots in both protein-coding and gene regulatory regions and develop measures of positive selection and functional impact for individual hotspots. Using cancer allele fractions, expression aberrations, mutational signatures, and a variety of genomic features, such as potential gain or loss of transcription factor binding sites, we annotate and prioritize all highly mutated hotspots. Genome-wide we find more high-frequency SNV and indel hotspots than expected given mutational background models. Protein-coding regions are generally enriched for SNV hotspots compared to other regions. Gene regulatory hotspots show enrichment of potential same-patient second-hit missense mutations, consistent with enrichment of hotspot driver mutations compared to singletons. For protein-coding regions, splice-sites, promoters, and enhancers, we see an excess of hotspots associated with cancer genes. Interestingly, missense hotspot mutations in tumor suppressors are associated with elevated expression, suggesting localized amino-acid changes with functional impact. For individual non-coding hotspots, only a small number show clear signs of positive selection, including known sites in the TERT promoter and the 5' UTR of TP53. Most of the new candidates have few mutations and limited driver evidence. However, a hotspot in an enhancer of the oncogene POU2AF1, which may create a transcription factor binding site, presents multiple lines of driver-consistent evidence.
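The comparison of observed hotspots against a mutational background model can be illustrated with the simplest possible background: a uniform Poisson rate per site. The study itself used more elaborate background models, so the uniform rate, the threshold `k`, and the function names here are assumptions for illustration only. Under this model the expected number of sites with at least `k` mutations is `n_sites * P(X >= k)`.

```python
import math


def poisson_tail(lam: float, k: int) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def expected_hotspots(n_sites: int, lam: float, k: int) -> float:
    """Expected number of sites with >= k mutations under a uniform Poisson background."""
    return n_sites * poisson_tail(lam, k)


if __name__ == "__main__":
    # Hypothetical numbers: one million sites, 0.01 mutations per site on average.
    print(round(expected_hotspots(1_000_000, 0.01, 2), 1))  # 49.7
```

An observed count of recurrently mutated sites far above this expectation is the kind of excess the abstract describes; real analyses must also model rate variation along the genome, which this sketch ignores.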
19
Characterization of intermediate-sized insertions using whole-genome sequencing data and analysis of their functional impact on gene expression. Hum Genet 2021; 140:1201-1216. [PMID: 33978893 DOI: 10.1007/s00439-021-02291-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Accepted: 05/04/2021] [Indexed: 10/21/2022]
Abstract
Intermediate-sized insertions are one type of structural variant contributing to genome diversity. However, because they are technically difficult to identify, their importance in disease pathogenicity and gene expression regulation remains unclear. We used whole-genome sequencing data from 174 Japanese samples to characterize intermediate-sized insertions using a highly accurate insertion calling method (IMSindel software and joint-call recovery) and obtained a catalogue of 4254 insertions. We constructed an imputation panel comprising insertions and SNVs from all samples and imputed intermediate-sized insertions for 82 publicly available Japanese samples. The positive predictive value of imputation, evaluated using Nanopore long-read sequencing data, was 97%. Subsequent eQTL analysis predicted 128 (~3.0%) insertions as causative for changes in gene expression level. Enrichment analysis of causal insertions for genome regulatory elements showed significant associations with CTCF-binding sites, super-enhancers, and promoters. Among 17 causal insertions found in the same causal set as GWAS hits, some were associated with changes in the expression of cancer-related genes such as BRCA1, ZNF222, and ABCB10. Analysis of insertion sequences revealed that 461 insertions were short tandem duplications, frequently found in early-replicating regions of the genome. Furthermore, comparing the functional importance of intermediate-sized insertions with that of intermediate-sized deletions detected in the same sample set in our previous study showed that insertions were more frequent in genic regions and that the proportion of functional candidates was smaller among insertions. Here, we characterize a high-confidence set of intermediate-sized insertions and indicate their importance in gene expression regulation. Our results emphasize the importance of considering intermediate-sized insertions in trait association studies.
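Two quantities in this abstract can be sketched in a few lines: whether an inserted sequence is a short tandem duplication of its flanking reference sequence, and the positive predictive value, PPV = TP / (TP + FP). The duplication check below is a naive illustration that only compares the inserted bases with the immediately adjacent bases; it does not handle left-normalized or rotated repeat units, and it is not the IMSindel method. Function names and sequences are hypothetical.

```python
def is_tandem_duplication(ref: str, pos: int, inserted: str) -> bool:
    """True if `inserted` (placed before ref[pos]) copies the adjacent sequence.

    Compares the insertion with the equally long flank on each side.
    """
    n = len(inserted)
    if n == 0:
        return False
    left = ref[max(0, pos - n):pos]
    right = ref[pos:pos + n]
    return inserted.upper() in (left.upper(), right.upper())


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = TP / (TP + FP)."""
    called = true_pos + false_pos
    if called == 0:
        raise ValueError("no calls to evaluate")
    return true_pos / called


if __name__ == "__main__":
    ref = "GGACTGACTTT"
    print(is_tandem_duplication(ref, 6, "ACTG"))  # True (copies the left flank)
    print(is_tandem_duplication(ref, 6, "CCC"))  # False
    print(positive_predictive_value(97, 3))  # 0.97
```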
20
Mohapatra SB, Manoj N. A conserved π-helix plays a key role in thermoadaptation of catalysis in the glycoside hydrolase family 4. Biochim Biophys Acta Proteins Proteom 2020; 1869:140523. [PMID: 32853774 DOI: 10.1016/j.bbapap.2020.140523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/11/2020] [Revised: 07/22/2020] [Accepted: 07/29/2020] [Indexed: 01/21/2023]
Abstract
Here, we characterize the role of a π-helix in the molecular mechanisms underlying thermoadaptation in the glycoside hydrolase family 4 (GH4). The interspersed π-helix present in a subgroup is evolutionarily related to a conserved α-helix in other orthologs by a single residue insertion/deletion event. The insertional residue, Phe407, in a hyperthermophilic α-glucuronidase, makes specific interactions across the inter-subunit interface. In order to establish the sequence-structure-stability implications of the π-helix, the wild-type and the deletion variant (Δ407) were characterized. The variant showed a significant lowering of melting temperature and optimum temperature for the highest activity. Crystal structures of the proteins show a transformation of the π-helix to a continuous α-helix in the variant, identical to that in orthologs lacking this insertion. Thermodynamic parameters were determined from stability curves representing the temperature dependence of unfolding free energy. Though the proteins display maximum stabilities at similar temperatures, a higher melting temperature in the wild-type is achieved by a combination of higher enthalpy and lower heat capacity of unfolding. Comparisons of the structural changes, and the activity and thermodynamic profiles allow us to infer that specific non-covalent interactions, and the existence of residual structure in the unfolded state, are crucial determinants of its thermostability. These features permit the enzyme to balance the preservation of structure at a higher temperature with the thermodynamic stability required for optimum catalysis.
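The stability curves mentioned above are commonly described by the modified Gibbs-Helmholtz equation, ΔG(T) = ΔH_m(1 - T/T_m) + ΔC_p(T - T_m - T ln(T/T_m)), whose maximum lies at the temperature T_s where ΔS(T) = 0, that is T_s = T_m exp(-ΔH_m / (T_m ΔC_p)). The sketch assumes this standard two-state form; the parameter values are hypothetical and are not the values reported for the wild-type or Δ407 proteins.

```python
import math


def delta_g_unfold(t: float, t_m: float, dh_m: float, dcp: float) -> float:
    """Gibbs-Helmholtz stability curve: unfolding free energy at temperature t.

    dG(T) = dH_m (1 - T/T_m) + dCp (T - T_m - T ln(T/T_m))
    Temperatures in kelvin, dh_m in kJ/mol, dcp in kJ/(mol K).
    """
    return dh_m * (1 - t / t_m) + dcp * (t - t_m - t * math.log(t / t_m))


def max_stability_temperature(t_m: float, dh_m: float, dcp: float) -> float:
    """Temperature T_s at which dG(T) peaks (where dS(T) = 0)."""
    return t_m * math.exp(-dh_m / (t_m * dcp))


if __name__ == "__main__":
    # Hypothetical parameters: T_m = 360 K, dH_m = 600 kJ/mol, dCp = 10 kJ/(mol K).
    t_s = max_stability_temperature(360.0, 600.0, 10.0)
    print(round(t_s, 1))  # 304.7
    print(round(delta_g_unfold(t_s, 360.0, 600.0, 10.0), 2))  # peak stability
```

Within this model, raising ΔH_m or lowering ΔC_p lifts and flattens the curve so that it crosses zero at a higher temperature, which is the kind of trade-off the abstract describes for the wild-type enzyme.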
Affiliation(s)
- Samar Bhallabha Mohapatra
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- Narayanan Manoj
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
21
Emond S, Petek M, Kay EJ, Heames B, Devenish SRA, Tokuriki N, Hollfelder F. Accessing unexplored regions of sequence space in directed enzyme evolution via insertion/deletion mutagenesis. Nat Commun 2020; 11:3469. [PMID: 32651386 PMCID: PMC7351745 DOI: 10.1038/s41467-020-17061-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 10/09/2019] [Accepted: 06/01/2020] [Indexed: 11/22/2022]
Abstract
Insertions and deletions (InDels) are frequently observed in natural protein evolution, yet their potential remains untapped in laboratory evolution. Here we introduce a transposon-based mutagenesis approach (TRIAD) to generate libraries of random variants with short in-frame InDels, and screen TRIAD libraries to evolve a promiscuous arylesterase activity in a phosphotriesterase. The evolution exhibits features that differ from previous point mutagenesis campaigns: while the average activity of TRIAD variants is more compromised, a larger proportion has successfully adapted for the activity. Different functional profiles emerge: (i) both strong and weak trade-offs between activities are observed; (ii) the trade-off is more severe (20- to 35-fold increases in arylesterase kcat/KM accompanied by 60- to 400-fold decreases in phosphotriesterase activity); and (iii) improvements are present in kcat rather than just in KM, suggesting adaptive solutions. These distinct features make TRIAD an alternative to widely used point mutagenesis, accessing functional innovations and traversing unexplored regions of the fitness landscape.
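To make "short in-frame InDels" concrete, the sketch below enumerates, in silico, every variant of a coding sequence that lacks a run of whole codons. This is a hypothetical illustration of the deletion half of an in-frame library, not the transposon-based TRIAD protocol itself; it assumes the sequence starts with a start codon and ends with a stop codon, both of which are kept.

```python
from typing import List


def codon_deletion_variants(cds: str, n_codons: int) -> List[str]:
    """All variants lacking `n_codons` consecutive internal codons.

    Deleting whole codons keeps the reading frame; the first (start) and
    last (stop) codons are never removed.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if n_codons < 1:
        raise ValueError("delete at least one codon")
    width = 3 * n_codons
    last_start = len(cds) - 3 - width  # deletion must end before the stop codon
    return [cds[:s] + cds[s + width:] for s in range(3, last_start + 1, 3)]


if __name__ == "__main__":
    cds = "ATGAAACCCGGGTAA"  # invented 5-codon ORF
    print(codon_deletion_variants(cds, 1))
    # ['ATGCCCGGGTAA', 'ATGAAAGGGTAA', 'ATGAAACCCTAA']
    print(codon_deletion_variants(cds, 2))  # ['ATGGGGTAA', 'ATGAAATAA']
```

For m internal codons, deleting k consecutive codons yields m - k + 1 variants, so such libraries stay small compared with all possible point-mutation combinations.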
Affiliation(s)
- Stephane Emond
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Evonetix Ltd, Coldhams Business Park, Norman Way, Cambridge, CB1 3LH, UK
- Maya Petek
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Emily J Kay
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Cancer Research UK Beatson Institute, Glasgow, G61 1BD, UK
- Brennen Heames
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Institute for Evolution and Biodiversity, Westfälische Wilhelms-Universität, Hüfferstrasse 1, 48149, Münster, Germany
- Sean R A Devenish
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
- Fluidic Analytics, The Paddocks Business Centre, Cherry Hinton Road, Cambridge, CB1 8DH, UK
- Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Florian Hollfelder
- Department of Biochemistry, University of Cambridge, Cambridge, CB2 1GA, UK
22
Casimiro-Soriguer CS, Rubio A, Jimenez J, Pérez-Pulido AJ. Ancient evolutionary signals of protein-coding sequences allow the discovery of new genes in the Drosophila melanogaster genome. BMC Genomics 2020; 21:210. [PMID: 32138644 PMCID: PMC7059364 DOI: 10.1186/s12864-020-6632-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/14/2020] [Accepted: 02/28/2020] [Indexed: 12/20/2022]
Abstract
Background: The current growth in DNA sequencing techniques makes genome annotation a crucial task in the genomic era. Traditional gene finders focus on protein-coding sequences, but they are far from exhaustive. The number of such genes continuously increases owing to new experimental data and the development of improved bioinformatics algorithms. Results: In this context, AnABlast represents a novel in silico strategy based on the accumulation of short evolutionary signals identified by low-scoring protein sequence alignments. This strategy can highlight protein-coding regions in genomic sequences regardless of traditional homology or translation signatures. Here, we analyze the evolutionary information contained in the accumulation of these short signals. Using the Drosophila melanogaster genome, we establish optimal parameters for accurate gene prediction with AnABlast and show that this new strategy contributes significantly to adding genes, exons and pseudogene regions yet to be discovered, in both already annotated and new genomes. Conclusions: AnABlast can be freely used to analyze genomic regions of whole genomes, where it helps complete the previous annotation.
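The idea of accumulating many weak alignment signals can be illustrated with a coverage profile: each low-scoring hit, once mapped to genomic coordinates, adds one to every position it covers, and regions where the summed signal reaches a threshold become candidate coding regions. This is a simplified sketch of the accumulation principle under that assumption, not AnABlast's actual scoring; the functions, threshold and hit coordinates are hypothetical.

```python
from typing import List, Optional, Tuple


def accumulate_signals(length: int, hits: List[Tuple[int, int]]) -> List[int]:
    """Per-position count of overlapping hits (half-open [start, end) intervals)."""
    diff = [0] * (length + 1)  # difference array: +1 at start, -1 at end
    for start, end in hits:
        diff[start] += 1
        diff[end] -= 1
    coverage, running = [], 0
    for d in diff[:length]:
        running += d
        coverage.append(running)
    return coverage


def peaks(coverage: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Half-open regions where the accumulated signal reaches `threshold`."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, c in enumerate(coverage):
        if c >= threshold and start is None:
            start = i
        elif c < threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(coverage)))
    return regions


if __name__ == "__main__":
    cov = accumulate_signals(10, [(1, 4), (2, 6), (3, 5), (8, 10)])
    print(cov)  # [0, 1, 2, 3, 2, 1, 0, 0, 1, 1]
    print(peaks(cov, 2))  # [(2, 5)]
```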
Affiliation(s)
- Carlos S Casimiro-Soriguer
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
- Alejandro Rubio
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
- Juan Jimenez
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
- Antonio J Pérez-Pulido
- Centro Andaluz de Biologia del Desarrollo (CABD, UPO-CSIC-JA). Facultad de Ciencias Experimentales (Área de Genética), Universidad Pablo de Olavide, 41013, Sevilla, Spain
23
Barton HJ, Zeng K. The Impact of Natural Selection on Short Insertion and Deletion Variation in the Great Tit Genome. Genome Biol Evol 2019; 11:1514-1524. [PMID: 30924871 PMCID: PMC6543879 DOI: 10.1093/gbe/evz068] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Accepted: 03/27/2019] [Indexed: 12/11/2022]
Abstract
Insertions and deletions (INDELs) remain understudied, despite being the most common form of genetic variation after single nucleotide polymorphisms. This stems partly from the challenge of correctly identifying the ancestral state of an INDEL and thus identifying it as an insertion or a deletion. Erroneously assigned ancestral states can skew the site frequency spectrum, leading to artificial signals of selection. Consequently, the selective pressures acting on INDELs are, at present, poorly resolved. To tackle this issue, we have recently published a maximum likelihood approach to estimate the mutation rate and the distribution of fitness effects for INDELs. Our approach estimates and controls for the rate of ancestral state misidentification, overcoming issues plaguing previous INDEL studies. Here, we apply the method to INDEL polymorphism data from ten high-coverage (∼44×) European great tit (Parus major) genomes. We demonstrate that coding INDELs are under strong purifying selection, with only a small proportion (∼4%) making it into the population. However, among fixed coding INDELs, 71% of insertions and 86% of deletions are fixed by positive selection. In noncoding regions, we estimate that ∼80% of insertions and ∼52% of deletions are effectively neutral; the remainder show signatures of purifying selection. Additionally, we see evidence of linked selection reducing INDEL diversity below background levels, both in proximity to exons and in areas of low recombination.
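How ancestral-state misidentification skews the site frequency spectrum (SFS) can be shown with the standard polarization-error model: a fraction `error` of sites whose true derived count is n - i are mis-polarized and counted at i. For INDELs the same error also turns an insertion into an apparent deletion and vice versa. The sketch only applies this forward effect for a fixed, assumed error rate; Barton and Zeng's likelihood method estimates the error jointly with other parameters, and the function name and toy counts here are hypothetical. With ten diploid genomes, n = 20 chromosomes and the unfolded SFS has 19 entries.

```python
from typing import List


def misorient_sfs(sfs: List[float], error: float) -> List[float]:
    """Expected unfolded SFS after ancestral-state misidentification.

    sfs[i - 1] holds the number of sites with i derived copies out of
    n = len(sfs) + 1 chromosomes; a fraction `error` of sites at frequency
    n - i is mis-polarized and counted at frequency i instead.
    """
    if not 0.0 <= error <= 1.0:
        raise ValueError("error must be a proportion")
    m = len(sfs)
    return [(1 - error) * sfs[j] + error * sfs[m - 1 - j] for j in range(m)]


if __name__ == "__main__":
    true_sfs = [10.0, 4.0, 2.0]  # toy spectrum for n = 4 chromosomes
    print(misorient_sfs(true_sfs, 0.1))  # approximately [9.2, 4.0, 2.8]
```

Even a 10% error inflates the high-frequency class here from 2 to about 2.8 sites, the kind of excess that could otherwise be mistaken for a signal of positive selection.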
Affiliation(s)
- Henry J Barton
- Department of Animal and Plant Sciences, University of Sheffield, United Kingdom
- Kai Zeng
- Department of Animal and Plant Sciences, University of Sheffield, United Kingdom
24
Mahajan S, Ramya TNC. Nature-inspired engineering of an F-type lectin for increased binding strength. Glycobiology 2019; 28:933-948. [PMID: 30202877 DOI: 10.1093/glycob/cwy082] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/14/2017] [Accepted: 09/07/2018] [Indexed: 11/13/2022]
Abstract
Individual lectin-carbohydrate interactions are usually of low affinity. However, high avidity is frequently attained by the multivalent presentation of glycans on biological surfaces coupled with the occurrence of high order lectin oligomers or tandem repeats of lectin domains in the polypeptide. F-type lectins are l-fucose binding lectins with a typical sequence motif, HX(26)RXDX(4)R/K, whose residues participate in l-fucose binding. We previously reported the presence of a few eukaryotic F-type lectin domains with partial sequence duplication that results in the presence of two l-fucose-binding sequence motifs. We hypothesized that such partial sequence duplication would result in greater avidity of lectin-ligand interactions. Inspired by this example from Nature, we attempted to engineer a bacterial F-type lectin domain from Streptosporangium roseum to attain avid binding by mimicking partial duplication. The engineered lectin demonstrated 12-fold greater binding strength than the wild-type lectin to multivalent fucosylated glycoconjugates. However, the affinity to the monosaccharide l-fucose in solution was similar and partial sequence duplication did not result in an additional functional l-fucose binding site. We also cloned, expressed and purified a Branchiostoma floridae F-type lectin domain with naturally occurring partial sequence duplication and confirmed that the duplicated region with the F-type lectin sequence motif did not participate in l-fucose binding. We found that the greater binding strength of the engineered lectin from S. roseum was instead due to increased oligomerization. We believe that this Nature-inspired strategy might be useful for engineering lectins to improve binding strength in various applications.
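The F-type lectin motif HX(26)RXDX(4)R/K translates directly into a regular expression, reading X as any residue, X(n) as exactly n residues and R/K as either R or K. The lookahead allows overlapping matches, so a partially duplicated domain would show two hits. A match only indicates the sequence motif; as the abstract notes, a duplicated motif need not form a functional l-fucose binding site. The pattern name, function and test sequences are hypothetical.

```python
import re
from typing import List

# HX(26)RXDX(4)R/K: H, 26 residues, R, 1 residue, D, 4 residues, then R or K.
FTYPE_MOTIF = re.compile(r"(?=(H.{26}R.D.{4}[RK]))")


def find_ftype_motifs(protein: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) motif match."""
    return [m.start() for m in FTYPE_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    motif = "H" + "A" * 26 + "R" + "G" + "D" + "S" * 4 + "K"  # invented 35-mer
    print(find_ftype_motifs("M" + motif + "LL"))  # [1]
    print(find_ftype_motifs("M" + motif + "GG" + motif))  # [1, 38]
```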
Affiliation(s)
- Sonal Mahajan
- Institute of Microbial Technology, Sector 39-A, Chandigarh, India
- T N C Ramya
- Institute of Microbial Technology, Sector 39-A, Chandigarh, India
25
Saga S, Sasaki N, Arai T. Molecular identification, characterization, and structure analysis of house musk shrew (Suncus murinus) leptin. J Adv Vet Anim Res 2018; 6:1-8. [PMID: 31453164 PMCID: PMC6702923 DOI: 10.5455/javar.2019.f305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2018] [Revised: 11/28/2018] [Accepted: 12/01/2018] [Indexed: 11/24/2022]
Abstract
Objective: The house musk shrew (Suncus murinus), a small experimental animal with low body fat, may be a model for human lipodystrophy. Leptin is an adipocyte-derived hormone thought to have an important role in the pathophysiology of lipodystrophy. The objectives of this study were to clarify the structure and distribution of suncus leptin. Materials and methods: To determine the primary structure of suncus leptin, we cloned the suncus Lep cDNA using the rapid amplification of cDNA ends method. The obtained amino acid (aa) sequence was compared with those of other mammals, and the protein structure was predicted. Results: The suncus Lep cDNA encodes 170 aa. The putative suncus leptin precursor has a predicted signal peptide of 21 aa, and the mature leptin comprises 149 aa. The mature leptin is 75%–82% homologous to that of other species. An insertion of three aa (VPQ) not seen in other mammals was found. This VPQ insertion is thought to result from a nine-base nucleotide insertion by slippage-like microindels. The predicted 3D structure of suncus leptin exhibited a typical four α-helix structure; however, the VPQ region protruded compared with human leptin. Lep mRNA expression was observed only in white and brown adipose tissues. Conclusion: This study revealed the structure and distribution of suncus leptin. Because suncus leptin carries a VPQ insertion not found in other mammals, its physiological action merits attention, as does the possibility of using the suncus as a model of human lipodystrophy.
Affiliation(s)
- Sayaka Saga
- Laboratory of Veterinary Biochemistry, School of Veterinary Medicine, Nippon Veterinary and Life Science University, Tokyo, Japan
- Noriyasu Sasaki
- Laboratory of Veterinary Biochemistry, School of Veterinary Medicine, Nippon Veterinary and Life Science University, Tokyo, Japan
- Toshiro Arai
- Laboratory of Veterinary Biochemistry, School of Veterinary Medicine, Nippon Veterinary and Life Science University, Tokyo, Japan
26
HPV16 E2 variants correlated with radiotherapy treatment and biological significance in cervical cell carcinoma. Infect Genet Evol 2018; 65:238-243. [DOI: 10.1016/j.meegid.2018.08.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/26/2018] [Revised: 07/05/2018] [Accepted: 08/01/2018] [Indexed: 11/21/2022]
27
Correlated Selection on Amino Acid Deletion and Replacement in Mammalian Protein Sequences. J Mol Evol 2018; 86:365-378. [PMID: 29955898 DOI: 10.1007/s00239-018-9853-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/20/2017] [Accepted: 06/21/2018] [Indexed: 10/28/2022]
Abstract
A low ratio of nonsynonymous to synonymous substitution rates (dN/dS) at a codon indicates functional constraint caused by purifying selection. Intuitively, the functional constraint would also be expected to prevent such a codon from being deleted. However, to the best of our knowledge, the correlation between the rates of deletion and substitution has never actually been estimated. Here, we use 8595 protein-coding region sequences from nine mammalian species to examine the relationship between deletion rate and dN/dS. We find significant positive correlations at the levels of both sites and genes. We compared our data against controls consisting of simulated coding sequences evolving along identical phylogenetic trees, where deletions occur independently of substitutions. A much weaker correlation was found in the corresponding simulated sequences, probably caused by alignment errors. In the real data, the correlations cannot be explained by alignment errors. Separate investigations of nonsynonymous (dN) and synonymous (dS) substitution rates indicate that the correlation is most likely due to a similarity in patterns of selection rather than in mutation rates.
28
Bogusz M, Whelan S. Phylogenetic Tree Estimation With and Without Alignment: New Distance Methods and Benchmarking. Syst Biol 2018; 66:218-231. [PMID: 27633353 DOI: 10.1093/sysbio/syw074] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/03/2016] [Accepted: 08/23/2016] [Indexed: 12/20/2022]
Abstract
Phylogenetic tree inference is a critical component of many systematic and evolutionary studies. The majority of these studies are based on the two-step process of multiple sequence alignment followed by tree inference, despite persistent evidence that the alignment step can lead to biased results. Here we present a two-part study that first presents PaHMM-Tree, a novel neighbor-joining-based method that estimates pairwise distances without assuming a single alignment. We then use simulations to benchmark its performance against a wide range of other phylogenetic tree inference methods, including the first comparison of alignment-free distance-based methods against more conventional tree estimation methods. Our new method for calculating pairwise distances based on statistical alignment provides distance estimates that are as accurate as those obtained using standard methods based on the true alignment. Pairwise distance estimates based on the two-step process tend to be substantially less accurate. This improved performance carries through to tree inference, where PaHMM-Tree provides more accurate tree estimates than all of the pairwise distance methods assessed. For sequence data with close to moderate divergence, we find that the two-step methods using statistical inference, where information from all sequences is included in the estimation procedure, tend to perform better than PaHMM-Tree, particularly full statistical alignment, which simultaneously estimates both the tree and the alignment. For deep divergences, we find that the alignment step becomes so prone to error that our distance-based PaHMM-Tree outperforms all other methods of tree inference. Finally, we find that the accuracy of alignment-free methods tends to decline faster than that of standard two-step methods in the presence of alignment uncertainty, and we identify no conditions where alignment-free methods are equal to or more accurate than standard phylogenetic methods, even in the presence of substantial alignment error. [Alignment-free; distance-based phylogenetics; pair Hidden Markov Models; phylogenetic inference; statistical alignment.]
Affiliation(s)
- Marcin Bogusz
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- Simon Whelan
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
29
Sahm A, Bens M, Platzer M, Szafranski K. PosiGene: automated and easy-to-use pipeline for genome-wide detection of positively selected genes. Nucleic Acids Res 2017; 45:e100. [PMID: 28334822 PMCID: PMC5499814 DOI: 10.1093/nar/gkx179] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 11/29/2016] [Accepted: 03/09/2017] [Indexed: 11/12/2022]
Abstract
Many comparative genomics studies aim to find the genetic basis of species-specific phenotypic traits. A prevailing strategy is to search genome-wide for genes that evolved under positive selection based on the non-synonymous to synonymous substitution ratio. However, incongruent results, largely due to high false positive rates, indicate the need for standardized quality criteria and software tools. The main challenges are ortholog and isoform assignment, the high sensitivity of the statistical models to alignment errors, and the need to parallelize large parts of the software. We developed the software tool PosiGene that (i) detects positively selected genes (PSGs) on a genome scale, (ii) allows analysis of specific evolutionary branches, (iii) can be used in arbitrary species contexts and (iv) offers visualization of the results for further manual validation and biological interpretation. We exemplify PosiGene's performance using simulated and real data. In the simulated data approach, we determined a false positive rate <1%. With real data, we found that 68.4% of the PSGs detected by PosiGene were shared by at least one previous study that used the same set of species. PosiGene is a user-friendly, reliable tool for reproducible genome-wide identification of PSGs and is freely available at https://github.com/gengit/PosiGene.
Affiliation(s)
- Arne Sahm
- Leibniz Institute on Aging, Fritz Lipmann Institute, 07745 Jena, Germany
- Martin Bens
- Leibniz Institute on Aging, Fritz Lipmann Institute, 07745 Jena, Germany
- Matthias Platzer
- Leibniz Institute on Aging, Fritz Lipmann Institute, 07745 Jena, Germany
- Karol Szafranski
- Leibniz Institute on Aging, Fritz Lipmann Institute, 07745 Jena, Germany
30
Halliwell LM, Jathoul AP, Bate JP, Worthy HL, Anderson JC, Jones DD, Murray JAH. ΔFlucs: Brighter Photinus pyralis firefly luciferases identified by surveying consecutive single amino acid deletion mutations in a thermostable variant. Biotechnol Bioeng 2017; 115:50-59. [PMID: 28921549 DOI: 10.1002/bit.26451] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/06/2017] [Revised: 09/08/2017] [Accepted: 09/11/2017] [Indexed: 11/05/2022]
Abstract
The bright bioluminescence catalyzed by Photinus pyralis firefly luciferase (Fluc) enables a vast array of life science research, such as bioimaging in live animals and sensitive in vitro diagnostics. The effectiveness of such applications is improved using engineered enzymes that to date have been constructed using amino acid substitutions. We describe ΔFlucs: consecutive single amino acid deletion mutants within six loop structures of the bright and thermostable ×11 Fluc. Deletion mutations are a promising avenue to explore new sequence and functional space and isolate novel mutant phenotypes. However, this method is often overlooked, and to date there have been no surveys of the effects of consecutive single amino acid deletions in Fluc. We constructed a large semi-rational ΔFluc library and isolated significantly brighter enzymes after finding that ×11 Fluc activity was largely tolerant to deletions. Targeting an "omega-loop" motif (T352-G360) significantly enhanced activity, altered kinetics, reduced Km for D-luciferin, altered emission colors, and altered substrate specificity for the redshifted analog DL-infraluciferin. Experimental and in silico analyses suggested that remodeling of the Ω-loop affects active-site hydrophobicity to increase light yields. This work demonstrates the further potential of deletion mutations, which can generate useful Fluc mutants and broaden the palette of the biomedical and biotechnological bioluminescence enzyme toolbox.
Affiliation(s)
- Amit P Jathoul
- School of Biosciences, University of Cardiff, Cardiff, UK
- Jack P Bate
- School of Biosciences, University of Cardiff, Cardiff, UK
- D Dafydd Jones
- School of Biosciences, University of Cardiff, Cardiff, UK
31
Lin M, Whitmire S, Chen J, Farrel A, Shi X, Guo JT. Effects of short indels on protein structure and function in human genomes. Sci Rep 2017; 7:9313. [PMID: 28839204 PMCID: PMC5570956 DOI: 10.1038/s41598-017-09287-x] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 05/18/2017] [Accepted: 07/24/2017] [Indexed: 01/20/2023]
Abstract
Insertions and deletions (indels) represent the second most common type of genetic variation in human genomes. Indels can be deleterious and contribute to disease susceptibility; recent genome sequencing projects have revealed a large number of indels in various cancer types. In this study, we investigated the possible effects of small coding indels on protein structure and function, and the baseline characteristics of indels in 2504 individuals of 26 populations from the 1000 Genomes Project. We found that each population has a distinct pattern in genes with small indels. Frameshift (FS) indels are enriched in olfactory receptor activity, while non-frameshift (NFS) indels are enriched in transcription-related proteins. Structural analysis of NFS indels revealed that they predominantly adopt coil or disordered conformations, especially in proteins with transcription-related NFS indels. These results suggest that the annotated coding indels from the 1000 Genomes Project, while contributing to genetic variation and phenotypic diversity, generally do not affect core protein structures and have no deleterious effect on essential biological processes. In addition, we found that a number of reference genome annotations might need to be updated due to the high prevalence of annotated homozygous indels in the general population.
Affiliation(s)
- Maoxuan Lin
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Sarah Whitmire
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jing Chen
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Alvin Farrel
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Xinghua Shi
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
32
Viability and genetic stability of potato spindle tuber viroid mutants with indels in specific loops of the rod-like secondary structure. Virus Res 2017; 240:94-100. [PMID: 28778395 DOI: 10.1016/j.virusres.2017.07.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/30/2017] [Revised: 07/27/2017] [Accepted: 07/31/2017] [Indexed: 01/17/2023]
Abstract
Maintenance of the rod-like structure of potato spindle tuber viroid (PSTVd), which contains over 20 loops and bulges between double-stranded helices, is important for viroid biology. To study tolerance to modifications of the stem-loop structures and the capacity of PSTVd for mutation repair, we created 6 mutants carrying 3- or 4-nucleotide deletions or insertions at three unique restriction sites, EagI, StyI and AvaII. Differences in the infectivity of these in vitro generated PSTVd mutants can result from where the mutations map, as well as from the extent to which the secondary structure of the molecule is affected. Deletion or insertion of 4 nucleotides at the EagI and StyI sites led to loss of infectivity. However, mutants with a deletion (PSTVd-Ava-del) or insertion (PSTVd-Ava-in) of 3 nucleotides (221GAC223) at the AvaII site (loop 20) were viable but not genetically stable. In all analyzed plants, reversion to the wild-type PSTVd-S23 sequence was observed for the PSTVd-Ava-in mutant a few weeks after agroinfiltration. Analysis of PSTVd-Ava-del progeny allowed the identification of 10 new sequence variants carrying various modifications, some of which retained the original three-nucleotide deletion at the AvaII site. Interestingly, other variants gained three nucleotides at the deletion site but did not revert to the original wild-type sequence. The genetic stability of the progeny PSTVd-Ava-del sequence variants was evaluated in tomato leaves (early infection) and in both leaves and roots (late infection).
33
Jackson EL, Spielman SJ, Wilke CO. Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein. PLoS One 2017; 12:e0164905. [PMID: 28369116 PMCID: PMC5378326 DOI: 10.1371/journal.pone.0164905] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 10/07/2016] [Accepted: 03/21/2017] [Indexed: 01/29/2023]
Abstract
Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein's amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein's three-dimensional structure, provides the best single predictor for whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions results in improved predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work to examine how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.
Affiliation(s)
- Eleisha L. Jackson
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Stephanie J. Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Claus O. Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
34
Liu SS, Wei X, Ji Q, Xin X, Jiang B, Liu J. A facile and efficient transposon mutagenesis method for generation of multi-codon deletions in protein sequences. J Biotechnol 2016; 227:27-34. [DOI: 10.1016/j.jbiotec.2016.03.038] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/30/2015] [Revised: 03/17/2016] [Accepted: 03/21/2016] [Indexed: 12/17/2022]
35
Young RS. Lineage-specific genomics: Frequent birth and death in the human genome: The human genome contains many lineage-specific elements created by both sequence and functional turnover. Bioessays 2016; 38:654-63. [PMID: 27231054 PMCID: PMC4949557 DOI: 10.1002/bies.201500192] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
Abstract
Frequent evolutionary birth and death events have created a large quantity of biologically important, lineage-specific DNA within mammalian genomes. The birth and death of DNA sequences is so frequent that the total number of these insertions and deletions in the human population remains unknown, although the two classes differ: transposable elements, for example, contribute predominantly to sequence insertion. Functional turnover, where the activity of a locus is specific to one lineage but the underlying DNA remains conserved, can also drive birth and death. However, this does not appear to be a major driver of divergent transcriptional regulation. Both sequence and functional turnover have contributed to the birth and death of thousands of functional promoters in the human and mouse genomes. These findings reveal the pervasive nature of evolutionary birth and death and suggest that lineage-specific regions may play an important but previously underappreciated role in human biology and disease.
Affiliation(s)
- Robert S Young
- MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Edinburgh, UK
36
Banaganapalli B, Mohammed K, Khan IA, Al-Aama JY, Elango R, Shaik NA. A Computational Protein Phenotype Prediction Approach to Analyze the Deleterious Mutations of Human MED12 Gene. J Cell Biochem 2016; 117:2023-35. [DOI: 10.1002/jcb.25499] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/03/2015] [Accepted: 01/25/2016] [Indexed: 01/01/2023]
Affiliation(s)
- Babajan Banaganapalli
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
- Department of Genetic Medicine; Faculty of Medicine; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
- Kaleemuddin Mohammed
- Department of Biochemistry; Faculty of Science; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
- Imran Ali Khan
- Department of Clinical Laboratory Sciences; College of Applied Medical Sciences; King Saud University; Riyadh, Kingdom of Saudi Arabia
- Jumana Y. Al-Aama
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
- Department of Genetic Medicine; Faculty of Medicine; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
- Ramu Elango
- Princess Al-Jawhara Al-Brahim Center of Excellence in Research of Hereditary Disorders; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
- Department of Genetic Medicine; Faculty of Medicine; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
- Noor Ahmad Shaik
- Department of Genetic Medicine; Faculty of Medicine; King Abdulaziz University; Jeddah, Kingdom of Saudi Arabia
37
Tuğrul M, Paixão T, Barton NH, Tkačik G. Dynamics of Transcription Factor Binding Site Evolution. PLoS Genet 2015; 11:e1005639. [PMID: 26545200 PMCID: PMC4636380 DOI: 10.1371/journal.pgen.1005639] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 06/17/2015] [Accepted: 10/09/2015] [Indexed: 11/19/2022]
Abstract
Evolution of gene regulation is crucial for our understanding of the phenotypic differences between species, populations and individuals. Sequence-specific binding of transcription factors to the regulatory regions on the DNA is a key regulatory mechanism that determines gene expression and hence heritable phenotypic variation. We use a biophysical model for directional selection on gene expression to estimate the rates of gain and loss of transcription factor binding sites (TFBS) in finite populations under both point and insertion/deletion mutations. Our results show that these rates are typically slow for a single TFBS in an isolated DNA region, unless the selection is extremely strong. These rates decrease drastically with increasing TFBS length or increasingly specific protein-DNA interactions, making the evolution of sites longer than ∼10 bp unlikely on typical eukaryotic speciation timescales. Similarly, evolution converges to the stationary distribution of binding sequences very slowly, making the equilibrium assumption questionable. The availability of longer regulatory sequences in which multiple binding sites can evolve simultaneously, the presence of “pre-sites” or partially decayed old sites in the initial sequence, and biophysical cooperativity between transcription factors can all facilitate gain of TFBS and reconcile theoretical calculations with timescales inferred from comparative genomics.
Evolution has produced a remarkable diversity of living forms that manifests in qualitative differences as well as quantitative traits. An essential factor that underlies this variability is transcription factor binding sites, short pieces of DNA that control gene expression levels. Nevertheless, we lack a thorough theoretical understanding of the evolutionary times required for the appearance and disappearance of these sites. By combining a biophysically realistic model for how cells read out information in transcription factor binding sites with a model for DNA sequence evolution, we explore these timescales and ask what factors crucially affect them. We find that the emergence of binding sites from a random sequence is generically slow under point and insertion/deletion mutational mechanisms. Strong selection, sufficient genomic sequence in which the sites can evolve, the existence of partially decayed old binding sites in the sequence, as well as certain biophysical mechanisms such as cooperativity, can shorten binding site gain times and make them consistent with the timescales suggested by comparative analyses of genomic data.
Affiliation(s)
- Murat Tuğrul
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- Tiago Paixão
- Institute of Science and Technology Austria, Klosterneuburg, Austria
- Gašper Tkačik
- Institute of Science and Technology Austria, Klosterneuburg, Austria
38
Surkont J, Diekmann Y, Ryder PV, Pereira-Leal JB. Coiled-coil length: Size does matter. Proteins 2015; 83:2162-9. [PMID: 26387794 DOI: 10.1002/prot.24932] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/09/2015] [Revised: 08/23/2015] [Accepted: 09/14/2015] [Indexed: 11/09/2022]
Abstract
Protein evolution is governed by processes that alter not only primary sequence but also protein length. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life that frequently plays a structural role in the cell. We observed that, despite their repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein, including globular domains, change. Length conservation is independent of primary amino acid sequence variation and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints.
Affiliation(s)
- Yoan Diekmann
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal
- Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
- Pearl V Ryder
- Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
- Emory University School of Medicine, Atlanta, Georgia, 30322
- Jose B Pereira-Leal
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal
- Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
39
Young RS, Hayashizaki Y, Andersson R, Sandelin A, Kawaji H, Itoh M, Lassmann T, Carninci P, Bickmore WA, Forrest AR, Taylor MS. The frequent evolutionary birth and death of functional promoters in mouse and human. Genome Res 2015; 25:1546-57. [PMID: 26228054 PMCID: PMC4579340 DOI: 10.1101/gr.190546.115] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 02/04/2015] [Accepted: 07/28/2015] [Indexed: 12/04/2022]
Abstract
Promoters are central to the regulation of gene expression. Changes in gene regulation are thought to underlie much of the adaptive diversification between species and phenotypic variation within populations. In contrast to earlier work emphasizing the importance of enhancer evolution and subtle sequence changes at promoters, we show that dramatic changes such as the complete gain and loss (collectively, turnover) of functional promoters are common. Using quantitative measures of transcription initiation in both humans and mice across 52 matched tissues, we discriminate promoter sequence gains from losses and resolve the lineage of changes. We also identify expression divergence and functional turnover between orthologous promoters, finding that only the latter is associated with local sequence changes. Promoter turnover has occurred at the majority (>56%) of protein-coding genes since humans and mice diverged. Tissue-restricted promoters are the most evolutionarily volatile; for these, retrotransposition is an important, but not the sole, source of innovation. There is considerable heterogeneity of turnover rates between promoters in different tissues, but the consistency of these rates in both lineages suggests that the same biological systems are similarly inclined to transcriptional rewiring. The genes affected by promoter turnover show evidence of adaptive evolution. In mice, promoters are primarily lost through deletion of the promoter-containing sequence, whereas in humans, many promoters appear to be gradually decaying, with weak transcriptional output and relaxed selective constraint. Our results suggest that promoter gain and loss are important processes in the evolutionary rewiring of gene regulation and may be a significant source of phenotypic diversification.
Affiliation(s)
- Robert S Young
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
- Yoshihide Hayashizaki
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan
- Robin Andersson
- Department of Biology and Biotech Research and Innovation Centre, Copenhagen University, 2200 Copenhagen N, Denmark
- Albin Sandelin
- Department of Biology and Biotech Research and Innovation Centre, Copenhagen University, 2200 Copenhagen N, Denmark
- Hideya Kawaji
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Masayoshi Itoh
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama, 351-0198, Japan; RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Timo Lassmann
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Piero Carninci
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan
- Wendy A Bickmore
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
- Alistair R Forrest
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Tsurumi-ku, Yokohama, 230-0045, Japan; Systems Biology and Genomics, Harry Perkins Institute of Medical Research, QEII Medical Centre, Nedlands, Western Australia 6009, Australia
- Martin S Taylor
- MRC Human Genetics Unit, MRC Institute for Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, United Kingdom
40
Prakash A, Bateman A. Domain atrophy creates rare cases of functional partial protein domains. Genome Biol 2015; 16:88. [PMID: 25924720 PMCID: PMC4432964 DOI: 10.1186/s13059-015-0655-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 01/12/2023]
Abstract
BACKGROUND Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, in which protein domains lose a significant number of core structural elements. RESULTS Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or unannotated sequence fragments. We identify 75 cases of domain atrophy, of which eight are found in a three-dimensional protein structure and 67 have been inferred by mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains show structural loss that does not affect their functional sites. CONCLUSION Our analysis has significantly increased the number of known cases of domain atrophy. We discuss specific instances of domain atrophy and find that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains can, under certain circumstances, tolerate extreme mutations, giving rise to partial but functional domains.
Affiliation(s)
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
41
Kwan R, Looi KS, Omary MB. Absence of keratins 8 and 18 in rodent epithelial cell lines associates with keratin gene mutation and DNA methylation: Cell line selective effects on cell invasion. Exp Cell Res 2015; 335:12-22. [PMID: 25882495 DOI: 10.1016/j.yexcr.2015.04.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/04/2014] [Revised: 03/02/2015] [Accepted: 04/05/2015] [Indexed: 02/07/2023]
Abstract
Epithelial-mesenchymal transition (EMT) in carcinoma is associated with dramatic up-regulation of vimentin and down-regulation of the simple-type keratins 8 and 18 (K8/K18), but the mechanisms of these changes are poorly understood. We demonstrate that two commonly studied murine (CT26) and rat (IEC-6) intestinal cell lines have negligible K8/K18 but high vimentin protein expression. Proteasome inhibition led to a limited increase in K18, but not K8, stabilization, indicating that K8/K18 absence is not due, in large part, to increased protein turnover. CT26 and IEC-6 cells had <10% of normal K8/K18 mRNA and exhibited decreased mRNA stability, with K8 mRNA levels higher in IEC-6 than in CT26 cells and K18 mRNA levels higher in CT26 than in IEC-6 cells. Keratin gene sequencing showed that KRT8 in CT26 cells had a 21-nucleotide deletion, while KRT18 in IEC-6 cells had a 9-amino-acid in-frame insertion. Furthermore, the KRT8 promoter in CT26 cells and the KRT18 promoter in IEC-6 cells are hypermethylated. Inhibition of DNA methylation using 5-azacytidine increased K8 or K18 in some, but not all, of the tested rodent epithelial cell lines. Restoring K8 and K18 by lentiviral transduction reduced Matrigel invasion by CT26 but not IEC-6 cells. K8/K18 re-introduction also decreased E-cadherin expression in IEC-6 but not CT26 cells, suggesting that the effect of keratin expression on epithelial-to-mesenchymal transition is cell-line dependent. Therefore, some commonly utilized rodent epithelial cell lines unexpectedly manifest barely detectable keratin expression but have high levels of vimentin. In the CT26 and IEC-6 intestinal cell lines, keratin expression correlates with keratin gene insertion or deletion and with promoter methylation, which likely suppress keratin transcription and mRNA or protein stability.
Affiliation(s)
- Raymond Kwan
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, 7744 Medical Science Building II, 1301 E. Catherine, Ann Arbor, MI 48109
- Kok Sun Looi
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, 7744 Medical Science Building II, 1301 E. Catherine, Ann Arbor, MI 48109
- M Bishr Omary
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, 7744 Medical Science Building II, 1301 E. Catherine, Ann Arbor, MI 48109; Ann Arbor Health System VA Medical Center
42
Peloso PL, Frost DR, Richards SJ, Rodrigues MT, Donnellan S, Matsui M, Raxworthy CJ, Biju S, Lemmon EM, Lemmon AR, Wheeler WC. The impact of anchored phylogenomics and taxon sampling on phylogenetic inference in narrow-mouthed frogs (Anura, Microhylidae). Cladistics 2015; 32:113-140. [DOI: 10.1111/cla.12118] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Accepted: 02/04/2015] [Indexed: 02/02/2023]
Affiliation(s)
- Pedro L.V. Peloso
- Division of Vertebrate Zoology (Herpetology) American Museum of Natural History Central Park West at 79th Street New York NY 10024 USA
- Richard Gilder Graduate School American Museum of Natural History Central Park West at 79th Street New York NY 10024 USA
- Darrel R. Frost
- Division of Vertebrate Zoology (Herpetology) American Museum of Natural History Central Park West at 79th Street New York NY 10024 USA
- Stephen J. Richards
- Herpetology Department South Australian Museum North Terrace Adelaide SA 5000 Australia
- Miguel T. Rodrigues
- Departamento de Zoologia Instituto de Biociências Universidade de São Paulo, Rua do Matão Trav. 14, n 321, Cidade Universitária, Caixa Postal 11461 CEP 05422‐970 São Paulo São Paulo Brazil
- Stephen Donnellan
- Centre for Evolutionary Biology and Biodiversity The University of Adelaide Adelaide SA 5005 Australia
- Masafumi Matsui
- Graduate School of Human and Environmental Studies Kyoto University Sakyo‐ku Kyoto 606‐8501 Japan
- Cristopher J. Raxworthy
- Division of Vertebrate Zoology (Herpetology) American Museum of Natural History Central Park West at 79th Street New York NY 10024 USA
- S.D. Biju
- Systematics Lab Department of Environmental Studies University of Delhi Delhi 110 007 India
- Alan R. Lemmon
- Department of Scientific Computing Florida State University Dirac Science Library Tallahassee FL 32306‐4120 USA
- Ward C. Wheeler
- Division of Invertebrate Zoology American Museum of Natural History Central Park West at 79th Street New York NY 10024 USA
43
Challis D, Antunes L, Garrison E, Banks E, Evani US, Muzny D, Poplin R, Gibbs RA, Marth G, Yu F. The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes. BMC Genomics 2015; 16:143. [PMID: 25765891 PMCID: PMC4352271 DOI: 10.1186/s12864-015-1333-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/12/2014] [Accepted: 02/09/2015] [Indexed: 12/30/2022]
Abstract
Background Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls. Results This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%. Conclusions In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.
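The abstract does not describe the machine-learning combiner itself. As a much simpler stand-in (an assumption, not the authors' method), the Python sketch below merges three call sets by majority vote and computes the false discovery rate as FP / (FP + TP). The `(chrom, pos, ref, alt)` key, the helper names, and the toy call sets are all hypothetical.

```python
from collections import Counter

# Hypothetical INDEL key: (chromosome, position, REF allele, ALT allele).
Indel = tuple[str, int, str, str]

def majority_consensus(call_sets: list[set[Indel]], min_votes: int = 2) -> set[Indel]:
    """Keep INDELs reported by at least `min_votes` of the input call sets.

    Plain voting for illustration only; not the machine-learning combiner
    used by Challis et al.
    """
    votes = Counter(indel for calls in call_sets for indel in calls)
    return {indel for indel, n in votes.items() if n >= min_votes}

def false_discovery_rate(false_positives: int, true_positives: int) -> float:
    """FDR = FP / (FP + TP): the fraction of reported calls that are wrong."""
    total = false_positives + true_positives
    return false_positives / total if total else 0.0

# Hypothetical toy call sets from three independent pipelines.
pipeline_a = {("1", 100, "AT", "A"), ("1", 250, "G", "GC"), ("2", 40, "CTT", "C")}
pipeline_b = {("1", 100, "AT", "A"), ("2", 40, "CTT", "C")}
pipeline_c = {("1", 100, "AT", "A"), ("1", 250, "G", "GC"), ("3", 7, "A", "AG")}

consensus = majority_consensus([pipeline_a, pipeline_b, pipeline_c])
print(sorted(consensus))
```

Voting weights every pipeline equally; a learned combiner can instead weight each pipeline by how reliable it tends to be for a given kind of site, which is one way of "leveraging the strengths of each individual pipeline".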
Affiliation(s)
- Danny Challis
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA; Present address: Monsanto Company, Ankeny, IA, 50021, USA.
- Lilian Antunes
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA; Present address: Washington University School of Medicine, Saint Louis, MO, 63110, USA.
- Erik Garrison
- Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA.
- Eric Banks
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA.
- Uday S Evani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA; Present address: New York Genome Center, New York, NY, 10013, USA.
- Donna Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Ryan Poplin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA.
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Gabor Marth
- Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA; Present address: Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, 84112, USA.
- Fuli Yu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA; Institute of Neurology, Tianjin Medical University General Hospital, Tianjin, 300052, China.
44
Ashkenazy H, Cohen O, Pupko T, Huchon D. Indel reliability in indel-based phylogenetic inference. Genome Biol Evol 2014; 6:3199-209. [PMID: 25409663 PMCID: PMC4986452 DOI: 10.1093/gbe/evu252] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
It is often assumed that it is unlikely that the same insertion or deletion (indel) event occurred at the same position in two independent evolutionary lineages, and thus, indel-based inference of phylogeny should be less subject to homoplasy compared with standard inference which is based on substitution events. Indeed, indels were successfully used to solve debated evolutionary relationships among various taxonomical groups. However, indels are never directly observed but rather inferred from the alignment and thus indel-based inference may be sensitive to alignment errors. It is hypothesized that phylogenetic reconstruction would be more accurate if it relied only on a subset of reliable indels instead of the entire indel data. Here, we developed a method to quantify the reliability of indel characters by measuring how often they appear in a set of alternative multiple sequence alignments. Our approach is based on the assumption that indels that are consistently present in most alternative alignments are more reliable compared with indels that appear only in a small subset of these alignments. Using simulated and empirical data, we studied the impact of filtering and weighting indels by their reliability scores on the accuracy of indel-based phylogenetic reconstruction. The new method is available as a web-server at http://guidance.tau.ac.il/RELINDEL/.
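The core reliability idea, scoring each indel by the fraction of alternative alignments in which it appears, can be sketched in a few lines of Python. This is not the RELINDEL code. Gaps are coded per sequence as (sequence name, residues preceding the gap, gap length), a simplification assumed here so the same gap can be matched across alignments whose column numbering differs. The helper names and example alignments are hypothetical.

```python
from collections import Counter

# Hypothetical gap character: (sequence name, residues before the gap, gap length).
GapChar = tuple[str, int, int]

def gap_characters(alignment: dict[str, str]) -> set[GapChar]:
    """Extract every run of '-' from each aligned row, anchored to ungapped coordinates."""
    chars: set[GapChar] = set()
    for name, row in alignment.items():
        residues_seen = 0  # residues of this sequence consumed so far
        run = 0            # length of the current gap run
        for symbol in row:
            if symbol == "-":
                run += 1
                continue
            if run:
                chars.add((name, residues_seen, run))
                run = 0
            residues_seen += 1
        if run:  # gap running to the end of the row
            chars.add((name, residues_seen, run))
    return chars

def indel_reliability(alternatives: list[dict[str, str]]) -> dict[GapChar, float]:
    """Score each gap character by the fraction of alternative alignments containing it."""
    counts = Counter(c for aln in alternatives for c in gap_characters(aln))
    return {c: n / len(alternatives) for c, n in counts.items()}

# Hypothetical alternative alignments of the same two sequences (s1 = ACGTA,
# s2 = ACGTTA); the gap can sit on either side of the ambiguous T.
alternatives = [
    {"s1": "ACGT-A", "s2": "ACGTTA"},
    {"s1": "ACG-TA", "s2": "ACGTTA"},
    {"s1": "ACGT-A", "s2": "ACGTTA"},
]
for char, score in sorted(indel_reliability(alternatives).items()):
    print(char, round(score, 2))
```

Filtering or weighting would then use these scores, for example keeping only characters above a chosen cutoff; the cutoff itself is an assumption here, not a value from the paper.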
Affiliation(s)
- Haim Ashkenazy
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
- Ofir Cohen
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel; Present address: Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Tal Pupko
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
- Dorothée Huchon
- Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv, Israel
45
Arpino JAJ, Rizkallah PJ, Jones DD. Structural and dynamic changes associated with beneficial engineered single-amino-acid deletion mutations in enhanced green fluorescent protein. Acta Crystallogr D Biol Crystallogr 2014; 70:2152-62. [PMID: 25084334 PMCID: PMC4118826 DOI: 10.1107/s139900471401267x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 05/02/2014] [Accepted: 05/31/2014] [Indexed: 01/23/2023]
Abstract
Single-amino-acid deletions are a common part of the natural evolutionary landscape but are rarely sampled during protein engineering owing to limited and prejudiced molecular understanding of mutations that shorten the protein backbone. Single-amino-acid deletion variants of enhanced green fluorescent protein (EGFP) have been identified by directed evolution with the beneficial effect of imparting increased cellular fluorescence. Biophysical characterization revealed that increased functional protein production, rather than changes to the fluorescence parameters, was the likely mechanism. The structure of EGFP(D190Δ), containing a deletion within a loop, revealed propagated changes only after the deleted residue. The structure of EGFP(A227Δ) revealed that a 'flipping' mechanism was used to adjust for residue deletion at the end of a β-strand, with amino acids C-terminal to the deletion site repositioning to take the place of the deleted amino acid. In both variants, new networks of short-range and long-range interactions are generated while the integrity of the hydrophobic core is maintained. Both deletion variants also displayed significant local and long-range changes in dynamics, as evident from changes in B factors compared with EGFP. Rather than being detrimental, deletion mutations can introduce beneficial structural effects by altering core protein properties, folding and dynamics, as well as function.
Affiliation(s)
- James A. J. Arpino
- School of Biosciences, Cardiff University, Park Place, Cardiff CF10 3AT, Wales
- D. Dafydd Jones
- School of Biosciences, Cardiff University, Park Place, Cardiff CF10 3AT, Wales
46
Arpino JAJ, Reddington SC, Halliwell LM, Rizkallah PJ, Jones DD. Random single amino acid deletion sampling unveils structural tolerance and the benefits of helical registry shift on GFP folding and structure. Structure 2014; 22:889-98. [PMID: 24856363 PMCID: PMC4058518 DOI: 10.1016/j.str.2014.03.014] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 11/28/2013] [Revised: 03/08/2014] [Accepted: 03/10/2014] [Indexed: 12/01/2022]
Abstract
Altering a protein’s backbone through amino acid deletion is a common evolutionary mutational mechanism, but is generally ignored during protein engineering primarily because its effect on the folding-structure-function relationship is difficult to predict. Using directed evolution, enhanced green fluorescent protein (EGFP) was observed to tolerate residue deletion across the breadth of the protein, particularly within short and long loops, helical elements, and at the termini of strands. A variant with G4 removed from a helix (EGFPG4Δ) conferred significantly higher cellular fluorescence. Folding analysis revealed that EGFPG4Δ retained more structure upon unfolding and refolded with almost 100% efficiency but at the expense of thermodynamic stability. The EGFPG4Δ structure revealed that G4 deletion caused a beneficial helical registry shift resulting in a new polar interaction network, which potentially stabilizes a cis proline peptide bond and links secondary structure elements. Thus, deletion mutations and registry shifts can enhance proteins through structural rearrangements not possible by substitution mutations alone.
Highlights:
- Using directed evolution, the impact of amino acid deletion on EGFP is explored
- Loops, helices, and strand termini are especially tolerant to amino acid deletion
- A deletion mutant that enhances cellular production and fluorescence is identified
- Structure reveals that a helical registry shift creates a new polar network
Affiliation(s)
- James A J Arpino
- School of Biosciences, Main Building, Park Place, Cardiff University, Cardiff CF10 3AT, UK
- Samuel C Reddington
- School of Biosciences, Main Building, Park Place, Cardiff University, Cardiff CF10 3AT, UK
- Lisa M Halliwell
- School of Biosciences, Main Building, Park Place, Cardiff University, Cardiff CF10 3AT, UK
- Pierre J Rizkallah
- School of Medicine, Cardiff University, WHRI, Main Building, Heath Park, Cardiff CF14 4XN, UK
- D Dafydd Jones
- School of Biosciences, Main Building, Park Place, Cardiff University, Cardiff CF10 3AT, UK.
47
Jones DD, Arpino JAJ, Baldwin AJ, Edmundson MC. Transposon-based approaches for generating novel molecular diversity during directed evolution. Methods Mol Biol 2014; 1179:159-172. [PMID: 25055777 DOI: 10.1007/978-1-4939-1053-3_11] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/03/2023]
Abstract
This chapter introduces a set of transposon-based methods that were developed to sample trinucleotide deletion, trinucleotide replacement, and domain insertion. Each approach has a common initial step that utilizes an engineered version of the Mu transposon called MuDel. The inherent low sequence specificity of MuDel results in its random insertion into target DNA during in vitro transposition. Removal of the transposon using a type IIS restriction endonuclease generates blunt-end random breaks at a frequency of one per target gene and the concomitant loss of 3 bp. Self-ligation or insertion of another DNA cassette results in the sampling of trinucleotide deletion or trinucleotide substitution/domain insertion, respectively.
Affiliation(s)
- D Dafydd Jones
- School of Biosciences, Cardiff University, Museum Avenue, Cardiff, CF10 3AT, UK
48
Huang S, Li J, Xu A, Huang G, You L. Small Insertions Are More Deleterious than Small Deletions in Human Genomes. Hum Mutat 2013; 34:1642-9. [DOI: 10.1002/humu.22435] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/29/2013] [Accepted: 08/22/2013] [Indexed: 11/09/2022]
Affiliation(s)
- Shengfeng Huang
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Jie Li
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Anlong Xu
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Beijing University of Chinese Medicine, Chao-yang District; Beijing 100029 People's Republic of China
- Guangrui Huang
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
- Leiming You
- State Key Laboratory of Biocontrol; Guangdong Key Laboratory of Pharmaceutical Functional Genes; College of Life Sciences, Sun Yat-sen University; Guangzhou 510275 People's Republic of China
49
SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins. PLoS One 2013; 8:e77940. [PMID: 24194902 PMCID: PMC3806772 DOI: 10.1371/journal.pone.0077940] [Citation(s) in RCA: 103] [Impact Index Per Article: 8.6] [Received: 07/02/2013] [Accepted: 09/05/2013] [Indexed: 12/02/2022]
Abstract
Indels in the coding regions of a gene can cause either frameshifts or amino acid insertions/deletions. Frameshifting indels are indels whose length is not divisible by 3 and which therefore cause frameshifts. Indels whose length is divisible by 3 cause amino acid insertions/deletions or block substitutions; we call these 3n indels. The new amino acid changes resulting from 3n indels could potentially affect protein function. Therefore, we construct a SIFT Indel prediction algorithm for 3n indels which achieves 82% accuracy, 81% sensitivity, 82% specificity, 82% precision, 0.63 MCC, and 0.87 AUC by 10-fold cross-validation. We have previously published a prediction algorithm for frameshifting indels. The rules for the prediction of 3n indels are different from the rules for the prediction of frameshifting indels and reflect the biological differences between these two types of variation. SIFT Indel was applied to human 3n indels from the 1000 Genomes Project and the Exome Sequencing Project. We found that common variants are less likely to be deleterious than rare variants. The SIFT Indel prediction algorithm for 3n indels is available at http://sift-dna.org/
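The 3n/frameshift split described in this abstract reduces to the net length change modulo 3. A minimal Python sketch, assuming VCF-style REF/ALT allele strings and that the variant is already known to lie within a coding sequence (transcript mapping is out of scope); `classify_coding_indel` is a hypothetical helper, not part of SIFT Indel.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Return 'frameshift' or '3n' for a coding indel given REF and ALT alleles."""
    net_change = len(alt) - len(ref)  # > 0 insertion, < 0 deletion
    if net_change == 0:
        raise ValueError("REF and ALT have equal length; not an indel")
    # Python's % yields a non-negative result for a positive divisor,
    # so deletions (negative net_change) are classified correctly too.
    return "3n" if net_change % 3 == 0 else "frameshift"

examples = [("ACGT", "A"), ("AT", "A"), ("G", "GCCC"), ("G", "GC")]
for ref, alt in examples:
    print(ref, alt, classify_coding_indel(ref, alt))
```

This rule only decides which kind of prediction applies; judging whether a given 3n change is deleterious, the task SIFT Indel actually addresses, needs much more than allele length.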
50
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free.
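To make the "uniform indel mutation rate" null concrete, consider a simple case. If each of N loci independently receives a Poisson-distributed number of indel events with mean lam (summed over the tree), then the expected number of loci with two or more events is N * (1 - exp(-lam) * (1 + lam)), and fold enrichment is the observed count divided by that expectation. The Python sketch below assumes this simple Poisson model. The abstract does not specify the authors' exact null, and the function names and parameter values are hypothetical.

```python
import math

def expected_multi_hit_loci(n_loci: int, lam: float) -> float:
    """Expected number of loci with >= 2 events when events per locus ~ Poisson(lam)."""
    p_zero = math.exp(-lam)       # P(no event at a locus)
    p_one = lam * math.exp(-lam)  # P(exactly one event at a locus)
    return n_loci * (1.0 - p_zero - p_one)

def fold_enrichment(observed: float, expected: float) -> float:
    """Observed multi-hit count divided by the null expectation."""
    return observed / expected

# Hypothetical values: one million loci, 0.001 expected events per locus.
expected = expected_multi_hit_loci(1_000_000, 0.001)
print(round(expected, 3))  # close to n_loci * lam**2 / 2 when lam is small
```

Because the expectation grows roughly with lam squared, rate heterogeneity (a few loci with much higher lam) inflates multi-hit counts far above this uniform-rate null. That is the kind of excess the abstract attributes to known and cryptic indel hotspots.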
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France