1
Yang Y, Braga MV, Dean MD. Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure. Genome Biol Evol 2024; 16:evae093. [PMID: 38735759; PMCID: PMC11102076; DOI: 10.1093/gbe/evae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 05/14/2024] Open
Abstract
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four different types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning methods of AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structures. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
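The "as much as expected" comparison in this abstract can be illustrated with a minimal sketch. It assumes a simple null model in which indels land uniformly at random along a protein, so the expected overlap is the structured fraction of residues times the number of indels. The abstract does not say how the authors computed their expectation, and the function name `overlap_ratio`, the boolean mask, and the toy positions are all hypothetical.

```python
from typing import Sequence


def overlap_ratio(structured: Sequence[bool], indel_sites: Sequence[int]) -> float:
    """Return observed / expected number of indels on structured residues.

    structured  -- one flag per residue, True if the residue lies in a
                   predicted secondary-structure element (hypothetical input)
    indel_sites -- 0-based residue indices where indels were observed
    """
    if not structured or not indel_sites:
        raise ValueError("need at least one residue and one indel")
    # Observed: indels that fall on a structured residue.
    observed = sum(1 for pos in indel_sites if structured[pos])
    # Expected under uniform placement: structured fraction * number of indels.
    structured_fraction = sum(structured) / len(structured)
    expected = structured_fraction * len(indel_sites)
    if expected == 0:
        raise ValueError("no structured residues, ratio is undefined")
    return observed / expected


# Toy protein: 10 residues, the first 6 structured; 4 indels, 1 on structure.
mask = [True] * 6 + [False] * 4
ratio = overlap_ratio(mask, [2, 7, 8, 9])
print(f"observed/expected overlap: {ratio:.2f}")
```

A ratio below 1 means indels overlap structured residues less often than the uniform null predicts, which is the direction of the 54% figure reported above.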
Affiliation(s)
- Yi Yang
  - Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew V Braga
  - Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew D Dean
  - Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
2
Li J, Jiang L, Wu CI, Lu X, Fang S, Ting CT. Small Segmental Duplications in Drosophila: High Rate of Emergence and Elimination. Genome Biol Evol 2019; 11:486-496. [PMID: 30689862; PMCID: PMC6380325; DOI: 10.1093/gbe/evz011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 01/19/2019] [Indexed: 12/12/2022] Open
Abstract
Segmental duplications are an important class of mutations. Because a large proportion of segmental duplications may be strongly deleterious, high-frequency or fixed segmental duplications may represent only a tiny fraction of the mutational input. To understand the emergence and elimination of segmental duplications, we survey polymorphic duplications, including tandem and interspersed duplications, in natural populations of Drosophila using haploid embryo genomes. As haploid embryos are not expected to be heterozygous in the genome, sites of heterozygosity (referred to as pseudoheterozygous sites [PHS]) likely represent recent duplications that have acquired new mutations. Among the 29 genomes of Drosophila melanogaster, we identify 2,282 polymorphic PHS duplications (linked PHS regions) in total or 154 PHS duplications per genome. Most PHS duplications are small (83.4% < 500 bp), Drosophila melanogaster lineage specific, and strain specific (72.6% singletons). The excess of observed singleton PHS duplications deviates significantly from the neutral expectation, suggesting that most PHS duplications are strongly deleterious. In addition, these small segmental duplications are not evenly distributed across genomic regions and are less common in noncoding functional element regions. The underrepresentation in RNA polymerase II binding sites and regions with active histone modifications is correlated with the ages of duplications. In conclusion, small segmental duplications occur frequently in Drosophila but are rapidly eliminated by natural selection.
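To give a sense of what a neutral expectation for singletons looks like, here is a minimal sketch based on the standard neutral site frequency spectrum, in which the expected number of derived variants present in i of n sampled genomes is proportional to 1/i. This is a textbook result used here for illustration only; the abstract does not state which test the authors applied, and the function name `neutral_singleton_fraction` is hypothetical.

```python
def neutral_singleton_fraction(n: int) -> float:
    """Expected share of derived variants seen in exactly one of n genomes.

    Assumes the standard neutral model with constant population size, where
    the expected count of variants at frequency i/n is proportional to 1/i,
    so the singleton share is 1 / (1 + 1/2 + ... + 1/(n-1)).
    """
    if n < 2:
        raise ValueError("need at least two sampled genomes")
    harmonic = sum(1.0 / i for i in range(1, n))
    return 1.0 / harmonic


# 29 genomes, as in the abstract; 0.726 is the reported singleton share.
expected = neutral_singleton_fraction(29)
print(f"neutral expectation: {expected:.3f}, reported: 0.726")
```

Under these assumptions roughly a quarter of variants would be singletons in a sample of 29, so an observed share of 72.6% is a large excess in the direction expected for deleterious variants.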
Affiliation(s)
- Juan Li
  - Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China; Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan
- Lan Jiang
  - Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, Beijing, China
- Chung-I Wu
  - Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China; Department of Ecology and Evolution, University of Chicago; School of Life Science, Sun Yat-Sen University, Guangzhou, China
- Xuemei Lu
  - Key Laboratory of Genomics and Precision Medicine, Beijing Institute of Genomics, Beijing; CAS Center for Excellence in Animal Evolution and Genetics, Kunming Institute of Zoology, Kunming, Chinese Academy of Sciences, China
- Shu Fang
  - Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Chau-Ti Ting
  - Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan; Department of Life Science, Center for Biotechnology, Center for Developmental Biology and Regenerative Medicine, National Taiwan University; Genome and Systems Biology Degree Program, National Taiwan University and Academia Sinica, Taipei, Taiwan
3
Yuan X, Zhang J, Yang L. IntSIM: An Integrated Simulator of Next-Generation Sequencing Data. IEEE Trans Biomed Eng 2016; 64:441-451. [PMID: 27164567; DOI: 10.1109/tbme.2016.2560939] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 11/08/2022]
Abstract
OBJECTIVE: Next-generation sequencing data have been widely used for DNA variant discovery and tumor studies through computational tools. Effective simulation of such data with many realistic features is necessary for testing existing tools and guiding the development of new ones. METHODS: We present an integrated simulation system, IntSIM, to simulate common DNA variants and to generate sequencing reads for mixture genomes. IntSIM has three novel features in comparison with other simulation programs: 1) it is able to simulate both germline and somatic variants in the same sequence, 2) it deals with tumor purity so as to generate reads corresponding to heterogeneous genomes and also produce tumor-normal matched samples, and 3) it simulates correlations among SNPs and among CNVs/CNAs based on HMMs trained on real sequenced genomes, and can simulate broad and focal CNV/CNA events. RESULTS: The simulation data of IntSIM reflect characteristics observed in real data and are consistent with input parameters. The IntSIM software package is freely available at http://intsim.sourceforge.net/. CONCLUSION: Based on a large number of experiments, IntSIM performs better than other programs in some scenarios, such as simulation of heterozygous SNPs and CNVs/CNAs, and provides some functions that other programs do not. SIGNIFICANCE: Simulation with IntSIM can be used to evaluate the performance of methods for detecting various types of variants and analyzing tumor samples, and especially to provide a realistic assessment of the effect of tumor purity on the identification of somatic mutations.
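The effect of tumor purity on somatic mutation calls can be illustrated with a minimal sketch. This is not IntSIM's code or interface; it assumes a heterozygous somatic SNV in a copy-neutral diploid region, so the expected variant allele fraction is purity / 2, and it draws reads independently. The names `expected_vaf` and `simulate_alt_reads` are hypothetical.

```python
import random


def expected_vaf(purity: float) -> float:
    """Expected variant allele fraction of a heterozygous somatic SNV.

    Assumes a copy-neutral diploid locus: only tumor cells carry the
    variant, and each tumor cell carries it on one of its two copies.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return purity / 2.0


def simulate_alt_reads(purity: float, depth: int, rng: random.Random) -> int:
    """Draw the number of variant-supporting reads at a given depth."""
    p = expected_vaf(purity)
    # Each read independently samples the variant allele with probability p.
    return sum(1 for _ in range(depth) if rng.random() < p)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
for purity in (1.0, 0.6, 0.2):
    alt = simulate_alt_reads(purity, depth=100, rng=rng)
    print(f"purity={purity:.1f} expected VAF={expected_vaf(purity):.2f} alt reads={alt}/100")
```

As purity drops, the variant is supported by fewer reads at the same depth, which is why a simulator that models purity gives a more realistic test of somatic variant callers.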
4
de Souza Freitas MT, Ríos-Velasquez CM, da Silva LG, Costa CRL, Marcelino A, Leal-Balbino TC, Balbino VDQ, Pessoa FAC. Analysis of the genetic structure of allopatric populations of Lutzomyia umbratilis using the period clock gene. Acta Trop 2016; 154:149-54. [PMID: 26655040; DOI: 10.1016/j.actatropica.2015.11.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/28/2015] [Revised: 11/23/2015] [Accepted: 11/27/2015] [Indexed: 11/19/2022]
Abstract
In South America, Lutzomyia umbratilis is the main vector of Leishmania guyanensis, one of the species involved in the transmission of American tegumentary leishmaniasis. In Brazil, L. umbratilis has been recorded in the Amazon region, and an isolated population has been identified in the state of Pernambuco, Northeastern region. This study assessed the phylogeographic structure of three allopatric Brazilian populations of L. umbratilis. Samples of L. umbratilis were collected from Rio Preto da Eva (north of the Amazon River, Amazonas), from Manacapuru (south of the Amazon River), and from the isolated population in Recife, Pernambuco state. These samples were processed to obtain sequences of the period gene. Phylogenetic analysis revealed the presence of two distinct monophyletic clades: one clade comprised of the Recife and Rio Preto da Eva samples, and one clade comprised of the Manacapuru samples. Comparing the Manacapuru population with the Recife and Rio Preto da Eva populations revealed high indices of interpopulational divergence. Phylogenetic analysis indicated that geographical distance and environmental differences have not modified the ancestral relationship shared by the Recife and Rio Preto da Eva populations. Genetic similarities suggest that, in evolutionary terms, these populations are more closely related to each other than to the Manacapuru population. These results confirm the existence of an L. umbratilis species complex composed of at least two incipient species.
Affiliation(s)
- Moises Thiago de Souza Freitas
  - Department of Genetics, Federal University of Pernambuco, Avenida Professor Moraes Rego S/N, Cidade Universitária, 50732-970 Recife, Pernambuco, Brazil
- Claudia Maria Ríos-Velasquez
  - Laboratory of Infectious Disease Ecology in the Amazon, Instituto Leônidas e Maria Deane-Fiocruz Amazônia, Rua Terezina, 476, Adrianópolis, 69.057-070 Manaus, Amazonas, Brazil
- Lidiane Gomes da Silva
  - Department of Genetics, Federal University of Pernambuco, Avenida Professor Moraes Rego S/N, Cidade Universitária, 50732-970 Recife, Pernambuco, Brazil
- César Raimundo Lima Costa
  - Department of Genetics, Federal University of Pernambuco, Avenida Professor Moraes Rego S/N, Cidade Universitária, 50732-970 Recife, Pernambuco, Brazil
- Abigail Marcelino
  - Department of Genetics, Federal University of Pernambuco, Avenida Professor Moraes Rego S/N, Cidade Universitária, 50732-970 Recife, Pernambuco, Brazil
- Tereza Cristina Leal-Balbino
  - Department of Microbiology, Research Center Aggeu Magalhaes, Avenida Professor Moraes Rego S/N, Cidade Universitária, 50732-970 Recife, Pernambuco, Brazil
- Valdir de Queiroz Balbino
  - Department of Genetics, Federal University of Pernambuco, Avenida Professor Moraes Rego S/N, Cidade Universitária, 50732-970 Recife, Pernambuco, Brazil
- Felipe Arley Costa Pessoa
  - Laboratory of Infectious Disease Ecology in the Amazon, Instituto Leônidas e Maria Deane-Fiocruz Amazônia, Rua Terezina, 476, Adrianópolis, 69.057-070 Manaus, Amazonas, Brazil
5
Khan T, Douglas GM, Patel P, Nguyen Ba AN, Moses AM. Polymorphism Analysis Reveals Reduced Negative Selection and Elevated Rate of Insertions and Deletions in Intrinsically Disordered Protein Regions. Genome Biol Evol 2015; 7:1815-26. [PMID: 26047845; PMCID: PMC4494057; DOI: 10.1093/gbe/evv105] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/14/2022] Open
Abstract
Intrinsically disordered protein regions are abundant in eukaryotic proteins and lack stable tertiary structures and enzymatic functions. Previous studies of disordered region evolution based on interspecific alignments have revealed an increased propensity for indels and rapid rates of amino acid substitution. How disordered regions are maintained at high abundance in the proteome and across taxa, despite apparently weak evolutionary constraints, remains unclear. Here, we use single nucleotide and indel polymorphism data in yeast and human populations to survey the population variation within disordered regions. First, we show that single nucleotide polymorphisms in disordered regions are under weaker negative selection compared with more structured protein regions and have a higher proportion of neutral non-synonymous sites. We also confirm previous findings that nonframeshifting indels are much more abundant in disordered regions relative to structured regions. We find that the rate of nonframeshifting indel polymorphism in intrinsically disordered regions resembles that of noncoding DNA and pseudogenes, and that large indels segregate in disordered regions in the human population. Our survey of polymorphism confirms patterns of evolution in disordered regions inferred based on longer evolutionary comparisons.
Affiliation(s)
- Tahsin Khan
  - Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Gavin M Douglas
  - Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada
- Priyenbhai Patel
  - Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Alex N Nguyen Ba
  - Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Alan M Moses
  - Department of Cell & Systems Biology, University of Toronto, Ontario, Canada; Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Ontario, Canada
6
Park L. Ancestral alleles in the human genome based on population sequencing data. PLoS One 2015; 10:e0128186. [PMID: 26020928; PMCID: PMC4447449; DOI: 10.1371/journal.pone.0128186] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 01/14/2015] [Accepted: 04/23/2015] [Indexed: 12/03/2022] Open
Abstract
Ancestral allele information is useful for genetics studies. Previously, the identification of ancestral alleles was primarily based on sequence alignments between species. Alternative ways to identify ancestral alleles were proposed in this study based on population sequencing data. The methods described here utilized the diversity between haplotypes harboring ancestral and newly emerged alleles. Simulations showed that these methods were reliable for identifying ancestral alleles when the variants had not aged too greatly. Application to the human genome sequencing data suggested the role of indels in maintaining the GC content in the human genome. The deletion-to-insertion ratios and GC proportions were correlated depending on the sizes of insertions and deletions in the direction of increasing GC content. There were GC-biased fixations in single base-pair insertions and AT-biased fixations in single base-pair deletions in the results based on the proposed methods. In the current study, GC-biased gene conversions in nucleotide substitutions were very slight or insignificant. In the variants of several quantitative trait loci (QTLs), slight GC-biased gene conversion was observed in nucleotide substitutions. For the QTL indels, insertions were observed more often than deletions, and deletion-biased fixation was observed, providing new insights into the evolution of functional genes.
Affiliation(s)
- Leeyoung Park
  - Natural Science Research Institute, Yonsei University, Seoul, Korea