1
Vaglietti S, Boggio Bozzo S, Ghirardi M, Fiumara F. Divergent evolution of low-complexity regions in the vertebrate CPEB protein family. Frontiers in Bioinformatics 2025; 5:1491735. [PMID: 40182702 PMCID: PMC11965684 DOI: 10.3389/fbinf.2025.1491735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2024] [Accepted: 01/28/2025] [Indexed: 04/05/2025] Open
Abstract
The cytoplasmic polyadenylation element-binding proteins (CPEBs) are a family of translational regulators involved in multiple biological processes, including memory-related synaptic plasticity. In vertebrates, four paralogous genes (CPEB1-4) encode proteins with phylogenetically conserved C-terminal RNA-binding domains and variable N-terminal regions (NTRs). The CPEB NTRs are characterized by low-complexity regions (LCRs), including homopolymeric amino acid repeats (AARs), and have been identified as mediators of liquid-liquid phase separation (LLPS) and prion-like aggregation. After their appearance following gene duplication, the four paralogous CPEB proteins functionally diverged in terms of activation mechanisms and modes of mRNA binding. The paralog-specific NTRs may have contributed substantially to such functional diversification but their evolutionary history remains largely unexplored. Here, we traced the evolution of vertebrate CPEBs and their LCRs/AARs focusing on primary sequence composition, complexity, repetitiveness, and their possible functional impact on LLPS propensity and prion-likeness. We initially defined these composition- and function-related quantitative parameters for the four human CPEB paralogs and then systematically analyzed their evolutionary variation across more than 500 species belonging to nine major clades of different stem age, from Chondrichthyes to Euarchontoglires, along the vertebrate lineage. We found that the four CPEB proteins display highly divergent, paralog-specific evolutionary trends in composition- and function-related parameters, primarily driven by variation in their LCRs/AARs and largely related to clade stem ages. These findings shed new light on the molecular and functional evolution of LCRs in the CPEB protein family, in both quantitative and qualitative terms, highlighting the emergence of CPEB2 as a proline-rich prion-like protein in younger vertebrate clades, including Primates.
Affiliation(s)
- Ferdinando Fiumara
  - “Rita Levi-Montalcini” Department of Neuroscience, University of Turin, Turin, Italy
2
Honorato-Mauer J, Shah NN, Maihofer AX, Zai CC, Belangero S, Nievergelt CM, Psychiatric Genomics Consortium for PTSD Ancestry Working Group, Santoro M, Atkinson EG. Characterizing features affecting local ancestry inference performance in admixed populations. Am J Hum Genet 2025; 112:224-234. [PMID: 39753130 PMCID: PMC11866949 DOI: 10.1016/j.ajhg.2024.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2024] [Revised: 12/06/2024] [Accepted: 12/06/2024] [Indexed: 02/09/2025] Open
Abstract
In recent years, significant efforts have been made to improve methods for genomic studies of admixed populations using local ancestry inference (LAI). Accurate LAI is crucial to ensure that downstream analyses faithfully reflect the genetic ancestry of research participants. Here, we test analytic strategies for LAI to provide guidelines for optimal accuracy, focusing on admixed populations reflective of Latin America's primary continental ancestries: African (AFR), Amerindigenous (AMR), and European (EUR). Simulating linkage-disequilibrium-informed admixed haplotypes under a variety of 2- and 3-way admixture models, we implemented a standard LAI pipeline, testing the impact of reference panel composition, DNA data type, demography, and software parameters to quantify ancestry-specific LAI accuracy. We observe that across all models, AMR tracts have notably reduced LAI accuracy as compared to EUR and AFR tracts, with true positive rate means for AMR ranging from 88% to 94%, EUR from 96% to 99%, and AFR from 98% to 99%. When LAI miscalls occurred, they most frequently erroneously called EUR ancestry in true AMR sites. Concerning reference panel curation, we find that using a reference panel well matched to the target population, even with a smaller sample size, was accurate and the most computationally efficient. Imputation did not harm LAI performance in our tests; rather, we observed that higher variant density improved accuracy. While directly responsive to admixed Latin American cohort compositions, these trends are broadly useful for informing best practices for LAI across admixed populations. Our findings reinforce the need for the inclusion of more underrepresented populations in sequencing efforts to improve reference panels.
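The ancestry-specific true positive rate and miscall counts mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes per-site truth and called ancestry labels are available as two equal-length sequences; the function names `per_ancestry_tpr` and `miscall_counts` are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter


def per_ancestry_tpr(truth, called):
    """Fraction of sites of each true ancestry that were called correctly.

    Assumption: `truth` and `called` are equal-length sequences of
    ancestry labels such as "AFR", "AMR", "EUR", one label per site.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    totals = Counter(truth)
    correct = Counter(t for t, c in zip(truth, called) if t == c)
    return {ancestry: correct[ancestry] / totals[ancestry] for ancestry in totals}


def miscall_counts(truth, called):
    """Count (true ancestry, called ancestry) pairs for miscalled sites."""
    return Counter((t, c) for t, c in zip(truth, called) if t != c)


# Toy data: one true AMR site is miscalled as EUR.
truth = ["AMR"] * 4 + ["EUR"] * 3 + ["AFR"] * 3
called = ["AMR", "AMR", "EUR", "AMR"] + ["EUR"] * 3 + ["AFR"] * 3
tpr = per_ancestry_tpr(truth, called)
miscalls = miscall_counts(truth, called)
```

In this toy example the AMR rate is 0.75 and the only miscall pair is (AMR, EUR), mirroring the direction of error the abstract describes; the numbers themselves are invented for illustration.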
Affiliation(s)
- Jessica Honorato-Mauer
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Nirav N Shah
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Adam X Maihofer
  - Department of Psychiatry, School of Medicine, University of California at San Diego, La Jolla, CA 92093, USA
- Clement C Zai
  - Department of Psychiatry, Institute of Medical Science, Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
- Sintia Belangero
  - Department of Morphology and Genetics, Universidade Federal de São Paulo, São Paulo 04023-062, Brazil
- Caroline M Nievergelt
  - Department of Psychiatry, School of Medicine, University of California at San Diego, La Jolla, CA 92093, USA
- Marcos Santoro
  - Department of Biochemistry, Molecular Biology Division, Universidade Federal de São Paulo, São Paulo 04023-062, Brazil
- Elizabeth G Atkinson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; The Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
3
Honorato-Mauer J, Shah NN, Maihofer AX, Zai CC, Belangero S, Nievergelt CM, Santoro M, Atkinson E. Characterizing features affecting local ancestry inference performance in admixed populations. bioRxiv: the preprint server for biology 2024:2024.08.26.609770. [PMID: 39253486 PMCID: PMC11383044 DOI: 10.1101/2024.08.26.609770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
In recent years, significant efforts have been made to improve methods for genomic studies of admixed populations using Local Ancestry Inference (LAI). Accurate LAI is crucial to ensure downstream analyses reflect the genetic ancestry of research participants accurately. Here, we test analytic strategies for LAI to provide guidelines for optimal accuracy, focusing on admixed populations reflective of Latin America's primary continental ancestries: African (AFR), Amerindigenous (AMR), and European (EUR). Simulating LD-informed admixed haplotypes under a variety of 2- and 3-way admixture models, we implemented a standard LAI pipeline, testing three reference panel compositions to quantify their overall and ancestry-specific accuracy. We examined LAI miscall frequencies and true positive rates (TPR) across simulation models and continental ancestries. AMR tracts have notably reduced LAI accuracy as compared to EUR and AFR tracts in all comparisons, with TPR means for AMR ranging from 88-94%, EUR from 96-99%, and AFR from 98-99%. When LAI miscalls occurred, they most frequently erroneously called European ancestry in true Amerindigenous sites. Using a reference panel well-matched to the target population, even with a lower sample size, LAI produced true-positive estimates that were not statistically different from those obtained with a high sample size but mismatched reference, while being more computationally efficient. While directly responsive to admixed Latin American cohort compositions, these trends are broadly useful for informing best practices for LAI across other admixed populations. Our findings reinforce the need for inclusion of more underrepresented populations in sequencing efforts to improve reference panels.
Affiliation(s)
- Jessica Honorato-Mauer
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Nirav N Shah
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Adam X Maihofer
  - Department of Psychiatry, School of Medicine, University of California at San Diego, La Jolla, CA 92093, USA
- Clement C Zai
  - Department of Psychiatry, Institute of Medical Science, Laboratory Medicine and Pathobiology, University of Toronto
- Sintia Belangero
  - Department of Morphology and Genetics, Universidade Federal de São Paulo, São Paulo, 04023-062, Brazil
- Caroline M Nievergelt
  - Department of Psychiatry, School of Medicine, University of California at San Diego, La Jolla, CA 92093, USA
- Marcos Santoro
  - Department of Biochemistry, Molecular Biology Division, Universidade Federal de São Paulo, São Paulo, 04023-062, Brazil
- Elizabeth Atkinson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
  - The Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX 77030, USA
4
Teekas L, Sharma S, Vijay N. Terminal regions of a protein are a hotspot for low complexity regions and selection. Open Biol 2024; 14:230439. [PMID: 38862022 DOI: 10.1098/rsob.230439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/13/2024] [Indexed: 06/13/2024] Open
Abstract
Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
Affiliation(s)
- Lokdeep Teekas
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Sandhya Sharma
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
- Nagarjun Vijay
  - Computational Evolutionary Genomics Lab, Department of Biological Sciences, IISER Bhopal, Bhauri, Madhya Pradesh, India
5
Dickson ZW, Golding GB. Evolution of Transcript Abundance is Influenced by Indels in Protein Low Complexity Regions. J Mol Evol 2024; 92:153-168. [PMID: 38485789 DOI: 10.1007/s00239-024-10158-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 01/24/2024] [Indexed: 04/02/2024]
Abstract
Protein low complexity regions (LCRs) are compositionally biased amino acid sequences, many of which have significant evolutionary impacts on the proteins which contain them. They are mutationally unstable, experiencing higher rates of indels and substitutions than higher complexity regions. LCRs also impact the expression of their proteins, likely through multiple effects along the path from gene transcription, through translation, and eventual protein degradation. It has been observed that proteins which contain LCRs are associated with elevated transcript abundance (TAb), despite having lower protein abundance. We have gathered and integrated human data to investigate the co-evolution of TAb and LCRs through ancestral reconstructions and model inference using an approximate Bayesian calculation based method. We observe that on short evolutionary timescales TAb evolution is significantly impacted by changes in LCR length, with insertions driving TAb down. In contrast, the observed data are best explained by indel rates in LCRs which are unaffected by shifts in TAb. Our work demonstrates a coupling between LCR and TAb evolution, and the utility of incorporating multiple responses into evolutionary analyses.
Affiliation(s)
- G Brian Golding
  - Department of Biology, McMaster University, Hamilton, ON, Canada
6
Rich KD, Srivastava S, Muthye VR, Wasmuth JD. Identification of potential molecular mimicry in pathogen-host interactions. PeerJ 2023; 11:e16339. [PMID: 37953771 PMCID: PMC10637249 DOI: 10.7717/peerj.16339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 10/02/2023] [Indexed: 11/14/2023] Open
Abstract
Pathogens have evolved sophisticated strategies to manipulate host signaling pathways, including the phenomenon of molecular mimicry, where pathogen-derived biomolecules imitate host biomolecules. In this study, we resurrected, updated, and optimized a sequence-based bioinformatics pipeline to identify potential molecular mimicry candidates between humans and 32 pathogenic species whose proteomes' 3D structure predictions were available at the start of this study. We observed considerable variation in the number of mimicry candidates across pathogenic species, with pathogenic bacteria exhibiting fewer candidates compared to fungi and protozoans. Further analysis revealed that the candidate mimicry regions were enriched in solvent-accessible regions, highlighting their potential functional relevance. We identified a total of 1,878 mimicked regions in 1,439 human proteins, and clustering analysis indicated diverse target proteins across pathogen species. The human proteins containing mimicked regions revealed significant associations between these proteins and various biological processes, with an emphasis on host extracellular matrix organization and cytoskeletal processes. However, immune-related proteins were underrepresented as targets of mimicry. Our findings provide insights into the broad range of host-pathogen interactions mediated by molecular mimicry and highlight potential targets for further investigation. This comprehensive analysis contributes to our understanding of the complex mechanisms employed by pathogens to subvert host defenses and we provide a resource to assist researchers in the development of novel therapeutic strategies.
Affiliation(s)
- Kaylee D. Rich
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
- Shruti Srivastava
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
- Viraj R. Muthye
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
- James D. Wasmuth
  - Faculty of Veterinary Medicine, University of Calgary, Calgary, Alberta, Canada
  - Host-Parasite Interactions Research Training Network, University of Calgary, Calgary, Alberta, Canada
7
Singh AK, Amar I, Ramadasan H, Kappagantula KS, Chavali S. Proteins with amino acid repeats constitute a rapidly evolvable and human-specific essentialome. Cell Rep 2023; 42:112811. [PMID: 37453061 DOI: 10.1016/j.celrep.2023.112811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 05/30/2023] [Accepted: 06/29/2023] [Indexed: 07/18/2023] Open
Abstract
Protein products of essential genes, indispensable for organismal survival, are highly conserved and bring about fundamental functions. Interestingly, proteins that contain amino acid homorepeats that tend to evolve rapidly are enriched in eukaryotic essentialomes. Why are proteins with hypermutable homorepeats enriched in conserved and functionally vital essential proteins? We solve this functional versus evolutionary paradox by demonstrating that human essential proteins with homorepeats bring about crosstalk across biological processes through high interactability and have distinct regulatory functions affecting expansive global regulation. Importantly, essential proteins with homorepeats rapidly diverge with the amino acid substitutions frequently affecting functional sites, likely facilitating rapid adaptability. Strikingly, essential proteins with homorepeats influence human-specific embryonic and brain development, implying that the presence of homorepeats could contribute to the emergence of human-specific processes. Thus, we propose that homorepeat-containing essential proteins affecting species-specific traits can be potential intervention targets across pathologies, including cancers and neurological disorders.
Affiliation(s)
- Anjali K Singh
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Ishita Amar
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Harikrishnan Ramadasan
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Keertana S Kappagantula
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
- Sreenivas Chavali
  - Department of Biology, Indian Institute of Science Education and Research (IISER) Tirupati, Tirupati 517507, Andhra Pradesh, India
8
Mier P, Andrade-Navarro MA. Regions with two amino acids in protein sequences: a step forward from homorepeats into the low complexity landscape. Comput Struct Biotechnol J 2022; 20:5516-5523. [PMID: 36249567 PMCID: PMC9550522 DOI: 10.1016/j.csbj.2022.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 09/07/2022] [Accepted: 09/07/2022] [Indexed: 11/17/2022] Open
Abstract
Low complexity regions (LCRs) differ in amino acid composition from the background provided by the corresponding proteomes. The simplest LCRs are homorepeats (or polyX), regions composed of mostly-one amino acid type. Extensive research has been done to characterize homorepeats, and their taxonomic, functional and structural features depend on the amino acid type and sequence context. From them, the next step towards the study of LCRs are the regions composed of two types of amino acids, which we call polyXY. We classify polyXY in three categories based on the arrangement of the two amino acid types ‘X’ and ‘Y’: direpeats (e.g. ‘XYXYXY’), joined (e.g. ‘XXXYYY’) and shuffled (e.g. ‘XYYXXY’). We developed a script to search for polyXY, and located them in a comprehensive set of 20,340 reference proteomes. These results are available in a dedicated web server called XYs, in which the user can also submit their own protein datasets to detect polyXY. We studied the distribution of polyXY types by amino acid pair XY and category, and show that polyXY in Eukaryota are mainly located within intrinsically disordered regions. Our study provides a first step towards the characterization of polyXY as protein motifs.
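The three polyXY categories named in this abstract can be illustrated with a minimal Python sketch. It assumes the input is already a region made of exactly two amino acid types and uses a simple transition count as the criterion; this is an illustrative simplification, not the authors' search script, and the function name `classify_polyxy` is hypothetical.

```python
def classify_polyxy(region: str) -> str:
    """Label a two-residue region as 'direpeat', 'joined' or 'shuffled'.

    Assumed criteria (a simplification for illustration):
      - joined:   exactly one change of residue type (e.g. XXXYYY)
      - direpeat: the residue type changes at every position (e.g. XYXYXY)
      - shuffled: anything else (e.g. XYYXXY)
    """
    if len(set(region)) != 2:
        raise ValueError("expected a region made of exactly two amino acid types")
    # Count adjacent positions where the residue type changes.
    transitions = sum(1 for a, b in zip(region, region[1:]) if a != b)
    if transitions == 1:
        return "joined"
    if transitions == len(region) - 1:
        return "direpeat"
    return "shuffled"


labels = [classify_polyxy(r) for r in ("QAQAQA", "QQQAAA", "QAAQQA")]
```

A two-residue input such as "QA" satisfies both the joined and direpeat criteria; this sketch reports it as joined, so a real search would need a minimum length before classifying.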
Affiliation(s)
- Pablo Mier
  - Corresponding author at: Hanns-Dieter-Hüsch-Weg 15 55118 Mainz (Germany).
9
Dickson ZW, Golding GB. Low complexity regions in mammalian proteins are associated with low protein abundance and high transcript abundance. Mol Biol Evol 2022; 39:6575407. [PMID: 35482425 PMCID: PMC9070799 DOI: 10.1093/molbev/msac087] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022] Open
Abstract
Low Complexity Regions (LCRs) are present in a surprisingly large number of eukaryotic proteins. These highly repetitive and compositionally biased sequences are often structurally disordered, bind promiscuously, and evolve rapidly. Frequently studied in terms of evolutionary dynamics, little is known about how LCRs affect the expression of the proteins which contain them. It would be expected that rapidly evolving LCRs are unlikely to be tolerated in strongly conserved, highly abundant proteins, leading to lower overall abundance in proteins which contain LCRs. To test this hypothesis and examine the associations of protein abundance and transcript abundance with the presence of LCRs, we have integrated high-throughput data from across mammals. We have found that LCRs are indeed associated with reduced protein abundance, but are also associated with elevated transcript abundance. These associations are qualitatively consistent across 12 human tissues and nine mammalian species. The differential impacts of LCRs on abundance at the protein and transcript level are not explained by differences in either protein degradation rates or the inefficiency of translation for LCR containing proteins. We suggest that rapidly evolving LCRs are a source of selective pressure on the regulatory mechanisms which maintain steady-state protein abundance levels.
Affiliation(s)
- Zachery W Dickson
  - Department of Biology, McMaster University, Hamilton, Ontario, Canada
- G Brian Golding
  - Department of Biology, McMaster University, Hamilton, Ontario, Canada
10
Homopeptide and homocodon levels across fungi are coupled to GC/AT-bias and intrinsic disorder, with unique behaviours for some amino acids. Sci Rep 2021; 11:10025. [PMID: 33976321 PMCID: PMC8113271 DOI: 10.1038/s41598-021-89650-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/30/2020] [Accepted: 04/22/2021] [Indexed: 11/09/2022] Open
Abstract
Homopeptides (runs of one amino-acid type) are evolutionarily important since they are prone to expand/contract during DNA replication, recombination and repair. To gain insight into the genomic/proteomic traits driving their variation, we analyzed how homopeptides and homocodons (which are pure codon repeats) vary across 405 Dikarya, and probed their linkage to genome GC/AT bias and other factors. We find that amino-acid homopeptide frequencies vary diversely between clades, with the AT-rich Saccharomycotina trending distinctly. As organisms evolve, homocodon and homopeptide numbers are majorly coupled to GC/AT-bias, exhibiting a bi-furcated correlation with degree of AT- or GC-bias. Mid-GC/AT genomes tend to have markedly fewer simply because they are mid-GC/AT. Despite these trends, homopeptides tend to be GC-biased relative to other parts of coding sequences, even in AT-rich organisms, indicating they absorb AT bias less or are inherently more GC-rich. The most frequent and most variable homopeptide amino acids favour intrinsic disorder, and there are an opposing correlation and anti-correlation versus homopeptide levels for intrinsic disorder and structured-domain content respectively. Specific homopeptides show unique behaviours that we suggest are linked to inherent slippage probabilities during DNA replication and recombination, such as poly-glutamine, which is an evolutionarily very variable homopeptide with a codon repertoire unbiased for GC/AT, and poly-lysine whose homocodons are overwhelmingly made from the codon AAG.
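The two definitions given in this abstract (a homopeptide is a run of one amino-acid type; a homocodon is a pure codon repeat) can be illustrated with a minimal Python sketch. The minimum run length of 5 is an assumed threshold for illustration only, and the function names are hypothetical rather than taken from the study.

```python
import re


def find_homopeptides(protein: str, min_len: int = 5):
    """Return (residue, start, length) for each run of one amino-acid type.

    `min_len` is an assumed cutoff, not a value from the study.
    """
    # A residue followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(protein)]


def is_homocodon(cds_segment: str) -> bool:
    """True if the coding segment consists of a single codon repeated."""
    if len(cds_segment) % 3 != 0:
        return False
    codons = {cds_segment[i:i + 3] for i in range(0, len(cds_segment), 3)}
    return len(codons) == 1


runs = find_homopeptides("MKQQQQQQAAPLKKKKKR")
pure = is_homocodon("AAGAAGAAG")   # poly-lysine from the codon AAG only
mixed = is_homocodon("AAGAAAAAG")  # still poly-lysine, but mixed codons
```

The second call shows why the distinction matters: both segments encode a lysine homopeptide, but only the first is a homocodon.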
11
Gruca A, Ziemska-Legiecka J, Jarnot P, Sarnowska E, Sarnowski TJ, Grynberg M. Common low complexity regions for SARS-CoV-2 and human proteomes as potential multidirectional risk factor in vaccine development. BMC Bioinformatics 2021; 22:182. [PMID: 33832440 PMCID: PMC8027979 DOI: 10.1186/s12859-021-04017-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/06/2020] [Accepted: 02/01/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: The rapid spread of COVID-19 demands an immediate response from the scientific communities. Appropriate countermeasures require a thoughtful and educated choice of viral targets (epitopes). Several articles discuss such choices in the SARS-CoV-2 proteome; others focus on phylogenetic traits and the history of the Coronaviridae genome/proteome. However, none consider viral protein low complexity regions (LCRs). Recently we created the first methods that are able to compare such fragments. RESULTS: We show that five LCRs in three proteins (nsp3, S and N) encoded by the SARS-CoV-2 genome are highly similar to regions from the human proteome. As many as 21 predicted T-cell epitopes and 27 predicted B-cell epitopes overlap with the five SARS-CoV-2 LCRs similar to human proteins. Interestingly, replication proteins encoded in the central part of viral RNA are devoid of LCRs. CONCLUSIONS: Similarity of SARS-CoV-2 LCRs to human proteins may have implications for the ability of the virus to counteract immune defenses. Vaccines targeting LCRs may be ineffective or, alternatively, lead to the development of autoimmune diseases. These findings are crucial to the selection of new epitopes for drugs or vaccines, which should omit such regions.
Affiliation(s)
- Aleksandra Gruca
  - Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Patryk Jarnot
  - Department of Computer Networks and Systems, Silesian University of Technology, Gliwice, Poland
- Elzbieta Sarnowska
  - Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
- Tomasz J Sarnowski
  - Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
12
The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein. Genes (Basel) 2021; 12:genes12030451. [PMID: 33809982 PMCID: PMC8004648 DOI: 10.3390/genes12030451] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/26/2021] [Revised: 03/10/2021] [Accepted: 03/17/2021] [Indexed: 12/23/2022] Open
Abstract
Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are not common in bacteria, as compared to their prevalence in eukaryotes. Studying their conservation could help provide hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability to the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of two strains per species. We calculated all orthologous pairs for each of the 20 strain pairs. Per orthologous pair, we computed the conservation of two types of LCRs: compositionally biased regions (CBRs) and homorepeats (polyX). Our results show that, in bacteria, Q-rich CBRs are the most conserved, while A-rich CBRs and polyA are the most variable. LCRs have generally higher conservation when comparing pathogenic strains. However, this result depends on protein subcellular location: LCRs accumulate in extracellular and outer membrane proteins, with conservation increased in the extracellular proteins of pathogens, and decreased for polyX in the outer membrane proteins of pathogens. We conclude that these dependencies support the functional importance of LCRs in host–pathogen interactions.
13
Adelson RP, Renton AE, Li W, Barzilai N, Atzmon G, Goate AM, Davies P, Freudenberg-Hua Y. Empirical design of a variant quality control pipeline for whole genome sequencing data using replicate discordance. Sci Rep 2019; 9:16156. [PMID: 31695094 PMCID: PMC6834861 DOI: 10.1038/s41598-019-52614-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/15/2019] [Accepted: 10/18/2019] [Indexed: 12/29/2022] Open
Abstract
The success of next-generation sequencing depends on the accuracy of variant calls. Few objective protocols exist for QC following variant calling from whole genome sequencing (WGS) data. After applying QC filtering based on Genome Analysis Tool Kit (GATK) best practices, we used genotype discordance of eight samples that were sequenced twice each to evaluate the proportion of potentially inaccurate variant calls. We designed a QC pipeline involving hard filters to improve replicate genotype concordance, which indicates improved accuracy of genotype calls. Our pipeline analyzes the efficacy of each filtering step. We initially applied this strategy to well-characterized variants from the ClinVar database, and subsequently to the full WGS dataset. The genome-wide biallelic pipeline removed 82.11% of discordant and 14.89% of concordant genotypes, and improved the concordance rate from 98.53% to 99.69%. The variant-level read depth filter most improved the genome-wide biallelic concordance rate. We also adapted this pipeline for triallelic sites, given the increasing proportion of multiallelic sites as sample sizes increase. For triallelic sites containing only SNVs, the concordance rate improved from 97.68% to 99.80%. Our QC pipeline removes many potentially false positive calls that pass in GATK, and may inform future WGS studies prior to variant effect analysis.
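The replicate concordance rate used in this abstract as an accuracy proxy can be illustrated with a minimal Python sketch. It assumes each replicate is a mapping from a site identifier to a genotype string such as "0/1", and that concordance is computed over sites called in both replicates; the abstract does not state the exact definition, so both points are assumptions, and the function names are hypothetical.

```python
def normalize_genotype(genotype: str) -> tuple:
    """Treat phased and unphased calls alike and ignore allele order."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def concordance_rate(replicate_a: dict, replicate_b: dict) -> float:
    """Fraction of shared sites whose genotypes agree between replicates."""
    shared_sites = replicate_a.keys() & replicate_b.keys()
    if not shared_sites:
        raise ValueError("the replicates share no called sites")
    matches = sum(
        1
        for site in shared_sites
        if normalize_genotype(replicate_a[site]) == normalize_genotype(replicate_b[site])
    )
    return matches / len(shared_sites)


# Toy replicate calls keyed by "chrom:pos" (invented values).
rep_a = {"chr1:100": "0/1", "chr1:200": "1/1", "chr1:300": "0/0", "chr1:400": "0/1"}
rep_b = {"chr1:100": "1/0", "chr1:200": "1/1", "chr1:300": "0/1", "chr1:500": "0/0"}
rate = concordance_rate(rep_a, rep_b)
```

Here three sites are shared and two agree, so the rate is 2/3. Recomputing the same quantity after each hard filter is one way to measure how much a given filtering step helps.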
Affiliation(s)
- Robert P Adelson
  - Litwin-Zucker Center for Alzheimer's Disease, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Alan E Renton
  - Ronald M. Loeb Center for Alzheimer's Disease and Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Wentian Li
  - Robert S. Boas Center for Genomics & Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Nir Barzilai
  - Robert S. Boas Center for Genomics & Human Genetics, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Gil Atzmon
  - Institute for Aging Research, Albert Einstein College of Medicine, Bronx, New York, 10461, USA
  - Faculty of Natural Sciences, University of Haifa, Haifa, 31905, Israel
- Alison M Goate
  - Ronald M. Loeb Center for Alzheimer's Disease and Departments of Neuroscience, Genetics and Genomic Sciences, and Neurology, Icahn School of Medicine at Mount Sinai, New York, New York, 10029, USA
- Peter Davies
  - Litwin-Zucker Center for Alzheimer's Disease, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
- Yun Freudenberg-Hua
  - Litwin-Zucker Center for Alzheimer's Disease, The Feinstein Institute for Medical Research, Northwell Health, Manhasset, New York, 11030, USA
  - Division of Geriatric Psychiatry, Zucker Hillside Hospital, Northwell Health, Glen Oaks, New York, 11004, USA
|
14
|
Redmond SN, MacInnis BM, Bopp S, Bei AK, Ndiaye D, Hartl DL, Wirth DF, Volkman SK, Neafsey DE. De Novo Mutations Resolve Disease Transmission Pathways in Clonal Malaria. Mol Biol Evol 2018; 35:1678-1689. [PMID: 29722884 PMCID: PMC5995194 DOI: 10.1093/molbev/msy059] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022] Open
Abstract
Detecting de novo mutations in viral and bacterial pathogens enables researchers to reconstruct detailed networks of disease transmission and is a key technique in genomic epidemiology. However, these techniques have not yet been applied to the malaria parasite, Plasmodium falciparum, in which a larger genome, slower generation times, and a complex life cycle make them difficult to implement. Here, we demonstrate the viability of de novo mutation studies in P. falciparum for the first time. Using a combination of sequencing, library preparation, and genotyping methods that have been optimized for accuracy in low-complexity genomic regions, we have detected de novo mutations that distinguish nominally identical parasites from clonal lineages. Despite its slower evolutionary rate compared with bacterial or viral species, de novo mutation can be detected in P. falciparum across timescales of just 1–2 years and evolutionary rates in low-complexity regions of the genome can be up to twice that detected in the rest of the genome. The increased mutation rate allows the identification of separate clade expansions that cannot be found using previous genomic epidemiology approaches and could be a crucial tool for mapping residual transmission patterns in disease elimination campaigns and reintroduction scenarios.
Affiliation(s)
- Seth N Redmond
  - Broad Institute of MIT and Harvard, Cambridge, MA
  - Harvard T.H. Chan School of Public Health, Boston, MA
- Bronwyn M MacInnis
  - Broad Institute of MIT and Harvard, Cambridge, MA
  - Harvard T.H. Chan School of Public Health, Boston, MA
- Selina Bopp
  - Broad Institute of MIT and Harvard, Cambridge, MA
  - Harvard T.H. Chan School of Public Health, Boston, MA
- Amy K Bei
  - Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Parasitology, Faculty of Medicine and Pharmacy, Cheikh Anta Diop University, Dakar, Senegal
- Daouda Ndiaye
  - Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Parasitology, Faculty of Medicine and Pharmacy, Cheikh Anta Diop University, Dakar, Senegal
- Daniel L Hartl
  - Broad Institute of MIT and Harvard, Cambridge, MA
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA
- Dyann F Wirth
  - Broad Institute of MIT and Harvard, Cambridge, MA
  - Harvard T.H. Chan School of Public Health, Boston, MA
- Sarah K Volkman
  - Broad Institute of MIT and Harvard, Cambridge, MA
  - Harvard T.H. Chan School of Public Health, Boston, MA
  - Department of Nursing, School of Nursing and Health Sciences, Simmons College, Boston, MA, 02115
- Daniel E Neafsey
  - Broad Institute of MIT and Harvard, Cambridge, MA
  - Harvard T.H. Chan School of Public Health, Boston, MA
|
15
|
Owati A, Agindotan B, Burrows M. First microsatellite markers developed and applied for the genetic diversity study and population structure of Didymella pisi associated with ascochyta blight of dry pea in Montana. Fungal Biol 2019; 123:384-392. [PMID: 31053327 DOI: 10.1016/j.funbio.2019.02.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/01/2018] [Revised: 01/30/2019] [Accepted: 02/14/2019] [Indexed: 11/17/2022]
Abstract
Didymella pisi is the predominant causal pathogen of ascochyta blight of dry pea, causing yield losses in Montana, where 415 000 acres were planted to dry pea in 2018. Thirty-three microsatellite markers were developed for D. pisi and used to analyze the genetic diversity and population structure of 205 isolates from four geographical regions of Montana. These loci produced a total of 216 alleles with an average of 1.63 alleles per microsatellite marker. The polymorphic information content values ranged from 0.020 to 0.990 with an average of 0.323. The average observed heterozygosity across all loci varied from 0.000 to 0.018. The gene diversity among the loci ranged from 0.003 to 0.461. Unweighted neighbor-joining and population structure analyses grouped these 205 isolates into two major sub-groups. The clusters did not match the geographic origin of the isolates. Analysis of molecular variance showed 85% of the total variation within populations and only 15% among populations. There was moderate genetic variation in the total populations (PhiPT = 0.153). Information obtained from this study could serve as a basis for designing improved management strategies, such as breeding for resistance to ascochyta blight of dry pea in Montana.
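The per-locus statistics named in this abstract have standard textbook definitions. The sketch below is a minimal illustration under assumptions: the abstract does not say which software or estimators the authors used, the function names are hypothetical, and gene diversity is given in its simple form without a sample-size correction.

```python
def gene_diversity(freqs):
    """Expected heterozygosity at one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphic_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return gene_diversity(freqs) - cross


# Hypothetical allele frequencies at a single microsatellite locus.
example = [0.5, 0.5]
print(gene_diversity(example))                   # 0.5
print(polymorphic_information_content(example))  # 0.375
```

A locus with a single fixed allele scores 0 on both measures, and both values rise as alleles become more numerous and more evenly distributed.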
Affiliation(s)
- Ayodeji Owati
  - Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT, 59717, USA
- Bright Agindotan
  - Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT, 59717, USA
- Mary Burrows
  - Department of Plant Sciences and Plant Pathology, Montana State University, Bozeman, MT, 59717, USA
|
16
|
Constraints and consequences of the emergence of amino acid repeats in eukaryotic proteins. Nat Struct Mol Biol 2017; 24:765-777. [PMID: 28805808 DOI: 10.1038/nsmb.3441] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 02/14/2017] [Accepted: 06/23/2017] [Indexed: 12/21/2022]
Abstract
Proteins with amino acid homorepeats have the potential to be detrimental to cells and are often associated with human diseases. Why, then, are homorepeats prevalent in eukaryotic proteomes? In yeast, homorepeats are enriched in proteins that are essential and pleiotropic and that buffer environmental insults. The presence of homorepeats increases the functional versatility of proteins by mediating protein interactions and facilitating spatial organization in a repeat-dependent manner. During evolution, homorepeats are preferentially retained in proteins with stringent proteostasis, which might minimize repeat-associated detrimental effects such as unregulated phase separation and protein aggregation. Their presence facilitates rapid protein divergence through accumulation of amino acid substitutions, which often affect linear motifs and post-translational-modification sites. These substitutions may result in rewiring protein interaction and signaling networks. Thus, homorepeats are distinct modules that are often retained in stringently regulated proteins. Their presence facilitates rapid exploration of the genotype-phenotype landscape of a population, thereby contributing to adaptation and fitness.
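As a rough illustration of what a homorepeat is, the sketch below scans a protein sequence for uninterrupted runs of a single residue. It is a simplified assumption-laden example, not the method of the cited study: the `min_len=5` threshold, the function name, and the toy sequence are all hypothetical, and published definitions often tolerate interruptions within a repeat.

```python
import re


def find_homorepeats(seq, min_len=5):
    """Return (residue, start, length) for each run of >= min_len identical residues.

    start is a 0-based index into seq; runs must be uninterrupted.
    """
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end() - m.start())
            for m in pattern.finditer(seq)]


# Toy sequence with a six-residue Q run and a five-residue P run.
toy = "MAQQQQQQPSPPPPPGA"
print(find_homorepeats(toy))  # [('Q', 2, 6), ('P', 10, 5)]
```

Raising `min_len` makes the scan stricter; lowering it quickly picks up short runs that occur by chance.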
|
17
|
Ponte I, Romero D, Yero D, Suau P, Roque A. Complex Evolutionary History of the Mammalian Histone H1.1-H1.5 Gene Family. Mol Biol Evol 2017; 34:545-558. [PMID: 28100789 PMCID: PMC5400378 DOI: 10.1093/molbev/msw241] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/06/2023] Open
Abstract
H1 is involved in chromatin higher-order structure and gene regulation. H1 has a tripartite structure. The central domain is stably folded in solution, while the N- and C-terminal domains are intrinsically disordered. The terminal domains are encoded by DNA of low sequence complexity, and are thus prone to short insertions/deletions (indels). We have examined the evolution of the H1.1-H1.5 gene family from 27 mammalian species. Multiple sequence alignment has revealed a strong preferential conservation of the number and position of basic residues among paralogs, suggesting that overall H1 basicity is under strong purifying selection. The presence of a conserved pattern of indels, ancestral to the splitting of mammalian orders, in the N- and C-terminal domains of the paralogs, suggests that slippage may have favored the rapid divergence of the subtypes and that purifying selection has maintained this pattern because it is associated with function. Evolutionary analyses have found evidence of positive selection events in H1.1, both before and after the radiation of mammalian orders. Positive selection ancestral to mammalian radiation involved changes at specific sites that may have contributed to the low relative affinity of H1.1 for chromatin. More recent episodes of positive selection were detected at codon positions encoding amino acids of the C-terminal domain (CTD) of H1.1, which may modulate the folding of the CTD. The detection of putative recombination points in H1.1-H1.5 subtypes suggests that this process may have been involved in the acquisition of the tripartite H1 structure.
Affiliation(s)
- Inma Ponte
  - Departamento de Bioquímica y Biología Molecular, Facultad de Biociencias, Universidad Autónoma de Barcelona, Barcelona, Spain
- Devani Romero
  - Departamento de Bioquímica y Biología Molecular, Facultad de Biociencias, Universidad Autónoma de Barcelona, Barcelona, Spain
- Daniel Yero
  - Instituto de Biotecnología y de Biomedicina (IBB) y Departamento de Genética y Microbiología, Universidad Autónoma de Barcelona, Barcelona, Spain
- Pedro Suau
  - Departamento de Bioquímica y Biología Molecular, Facultad de Biociencias, Universidad Autónoma de Barcelona, Barcelona, Spain
- Alicia Roque
  - Departamento de Bioquímica y Biología Molecular, Facultad de Biociencias, Universidad Autónoma de Barcelona, Barcelona, Spain
|
18
|
Keyfi F, Abbaszadegan MR, Rolfs A, Orolicki S, Moghaddassian M, Varasteh A. Identification of a novel deletion in the MMAA gene in two Iranian siblings with vitamin B12-responsive methylmalonic acidemia. Cell Mol Biol Lett 2016; 21:4. [PMID: 28536607 PMCID: PMC5415723 DOI: 10.1186/s11658-016-0005-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/28/2015] [Accepted: 12/23/2015] [Indexed: 11/25/2022] Open
Abstract
Background: Adenosylcobalamin (vitamin B12) is a coenzyme required for the activity of methylmalonyl-CoA mutase. Defects in this enzyme are a cause of methylmalonic acidemia (MMA). Methylmalonic acidemia, cblA type, is an inborn error of vitamin B12 metabolism that occurs due to mutations in the MMAA gene. MMAA encodes the enzyme which is involved in translocation of cobalamin into the mitochondria. Methods: One family with two MMA-affected children, one unaffected child, and their parents were studied. The two affected children were diagnosed by urine organic acid analysis using gas chromatography-mass spectrometry. MMAA was analyzed by PCR and sequencing of its coding region. Results: A homozygous deletion in exon 4 of MMAA, c.674delA, was found in both affected children. This deletion causes a frameshift resulting in a change from asparagine to methionine at amino acid 225 (p.N225M) and a truncated protein which loses the ArgK conserved domain site. mRNA expression analysis of MMAA confirmed these results. Conclusion: We demonstrate that the deletion in exon 4 of the MMAA gene (c.674delA) is a pathogenic allele via a frameshift resulting in a stop codon and termination of protein synthesis 38 nucleotides (12 amino acids) downstream of the deletion.
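The relationship between the nucleotide position and the affected residue in this abstract follows from simple codon arithmetic. The sketch below is a hypothetical helper, assuming the usual convention that coding position 1 is the A of the ATG start codon; it uses no sequence data from the study.

```python
def codon_position(cdna_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    if cdna_pos < 1:
        raise ValueError("coding positions are 1-based")
    codon_number = (cdna_pos - 1) // 3 + 1
    offset_in_codon = (cdna_pos - 1) % 3 + 1
    return codon_number, offset_in_codon


# c.674 falls in codon 225, consistent with the reported p.N225 change.
print(codon_position(674))  # (225, 2)

# 38 nucleotides downstream correspond to 12 complete codons, with 2 bases left over.
print(divmod(38, 3))        # (12, 2)
```

Because the deleted base lies inside codon 225, every codon from that point onward is read in a shifted frame until a stop codon is reached.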
Affiliation(s)
- Fatemeh Keyfi
  - Immunology Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - Pardis Clinical and Genetic Laboratory, Mashhad, Iran
- Mohammad Reza Abbaszadegan
  - Division of Human Genetics, Immunology Research Center, Avicenna Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
  - Pardis Clinical and Genetic Laboratory, Mashhad, Iran
- Arndt Rolfs
  - Director of the Albrecht Kossel Institute for Neuroregeneration, University of Rostock, Rostock, Germany
  - Chief Medical Director, Centogene AG, Rostock, Germany
- Morteza Moghaddassian
  - Division of Human Genetics, Immunology Research Center, Avicenna Research Institute, Mashhad University of Medical Sciences, Mashhad, Iran
- Abdolreza Varasteh
  - Allergy Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
  - Pardis Clinical and Genetic Laboratory, Mashhad, Iran
|