1
|
Kress A, Poch O, Lecompte O, Thompson JD. Real or fake? Measuring the impact of protein annotation errors on estimates of domain gain and loss events. FRONTIERS IN BIOINFORMATICS 2023; 3:1178926. [PMID: 37151482 PMCID: PMC10158824 DOI: 10.3389/fbinf.2023.1178926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 04/05/2023] [Indexed: 05/09/2023]
Abstract
Protein annotation errors can have significant consequences in a wide range of fields, ranging from protein structure and function prediction to biomedical research, drug discovery, and biotechnology. By comparing the domains of different proteins, scientists can identify common domains, classify proteins based on their domain architecture, and highlight proteins that have evolved differently in one or more species or clades. However, genome-wide identification of different protein domain architectures involves a complex error-prone pipeline that includes genome sequencing, prediction of gene exon/intron structures, and inference of protein sequences and domain annotations. Here we developed an automated fact-checking approach to distinguish true domain loss/gain events from false events caused by errors that occur during the annotation process. Using genome-wide ortholog sets and taking advantage of the high-quality human and Saccharomyces cerevisiae genome annotations, we analyzed the domain gain and loss events in the predicted proteomes of 9 non-human primates (NHP) and 20 non-S. cerevisiae fungi (NSF) as annotated in the UniProt and InterPro databases. Our approach allowed us to quantify the impact of errors on estimates of protein domain gains and losses, and we show that domain losses are over-estimated ten-fold and three-fold in the NHP and NSF proteins respectively. This is in line with previous studies of gene-level losses, where issues with genome sequencing or gene annotation led to genes being falsely inferred as absent. In addition, we show that inconsistent protein domain annotations are a major factor contributing to the false events. For the first time, to our knowledge, we show that domain gains are also over-estimated by three-fold and two-fold respectively in NHP and NSF proteins. Based on our more accurate estimates, we infer that true domain losses and gains in NHP with respect to humans are observed at similar rates, while domain gains in the more divergent NSF are observed twice as frequently as domain losses with respect to S. cerevisiae. This study highlights the need to critically examine the scientific validity of protein annotations, and represents a significant step toward scalable computational fact-checking methods that may one day mitigate the propagation of wrong information in protein databases.
|
2
|
Lees-Miller JP, Cobban A, Katsonis P, Bacolla A, Tsutakawa SE, Hammel M, Meek K, Anderson DW, Lichtarge O, Tainer JA, Lees-Miller SP. Uncovering DNA-PKcs ancient phylogeny, unique sequence motifs and insights for human disease. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2021; 163:87-108. [PMID: 33035590 PMCID: PMC8021618 DOI: 10.1016/j.pbiomolbio.2020.09.010] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 07/19/2020] [Revised: 09/12/2020] [Accepted: 09/29/2020] [Indexed: 01/26/2023]
Abstract
DNA-dependent protein kinase catalytic subunit (DNA-PKcs) is a key member of the phosphatidylinositol-3 kinase-like (PIKK) family of protein kinases with critical roles in DNA double-strand break repair, transcription, metastasis, mitosis, RNA processing, and innate and adaptive immunity. The absence of DNA-PKcs from many model organisms has led to the assumption that DNA-PKcs is a vertebrate-specific PIKK. Here, we find that DNA-PKcs is widely distributed in invertebrates, fungi, plants, and protists, and that threonines 2609, 2638, and 2647 of the ABCDE cluster of phosphorylation sites are highly conserved among most eukaryotes. Furthermore, we identify highly conserved amino acid sequence motifs and domains that are characteristic of DNA-PKcs relative to other PIKKs. These include residues in the Forehead domain and a novel motif we have termed YRPD, located in an α helix C-terminal to the ABCDE phosphorylation site loop. Combining sequence with biochemistry plus structural data on human DNA-PKcs unveils conserved sequence and conformational features with functional insights and implications. The defined generally progressive DNA-PKcs sequence diversification uncovers conserved functionality supported by Evolutionary Trace analysis, suggesting that for many organisms both functional sites and evolutionary pressures remain identical due to fundamental cell biology. The mining of cancer genomic data and germline mutations causing human inherited disease reveal that robust DNA-PKcs activity in tumors is detrimental to patient survival, whereas germline mutations compromising function are linked to severe immunodeficiency and neuronal degeneration. We anticipate that these collective results will enable ongoing DNA-PKcs functional analyses with biological and medical implications.
Affiliation(s)
- James P Lees-Miller
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Alexander Cobban
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Panagiotis Katsonis
  - Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Albino Bacolla
  - Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA
- Susan E Tsutakawa
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Michal Hammel
  - Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Katheryn Meek
  - College of Veterinary Medicine, Department of Microbiology & Molecular Genetics, and Department of Pathobiology & Diagnostic Investigation, Michigan State University, East Lansing, MI, 48824, USA
- Dave W Anderson
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
- Olivier Lichtarge
  - Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- John A Tainer
  - Departments of Cancer Biology and of Molecular and Cellular Oncology, University of Texas MD Anderson Cancer Center, 6767 Bertner Avenue, Houston, TX, 77030, USA; Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Susan P Lees-Miller
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
|
3
|
Wang W, Qu Q, Chen J. Identification, expression analysis, and antibacterial activity of Apolipoprotein A-I from amphioxus (Branchiostoma belcheri). Comp Biochem Physiol B Biochem Mol Biol 2019; 238:110329. [DOI: 10.1016/j.cbpb.2019.110329] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 04/03/2019] [Revised: 06/25/2019] [Accepted: 08/22/2019] [Indexed: 12/29/2022]
|
4
|
Wilbrandt J, Misof B, Panfilio KA, Niehuis O. Repertoire-wide gene structure analyses: a case study comparing automatically predicted and manually annotated gene models. BMC Genomics 2019; 20:753. [PMID: 31623555 PMCID: PMC6798390 DOI: 10.1186/s12864-019-6064-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/18/2018] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Background: The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative.
Results: Our results show that the subset of genes chosen for manual annotation by a research community (3.5–7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species’ gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from the automatically predicted and the manually annotated gene models alike. Conversely, some previously reported trends did not appear in either the automatically predicted or the manually annotated gene sets, pointing towards insect-specific gene structural peculiarities.
Conclusions: In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.
Electronic supplementary material: The online version of this article (10.1186/s12864-019-6064-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jeanne Wilbrandt
  - Center for molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany; Present address: Hoffmann Research Group, Leibniz Institute on Aging - Fritz Lipmann Institute, Beutenbergstraße 11, 07745, Jena, Germany
- Bernhard Misof
  - Center for molecular Biodiversity Research, Zoological Research Museum Alexander Koenig (ZFMK), Adenauerallee 160, 53113, Bonn, Germany
- Kristen A Panfilio
  - School of Life Sciences, University of Warwick, Gibbet Hill Campus, Coventry, CV4 7AL, UK
- Oliver Niehuis
  - Evolutionary Biology and Ecology, Institute of Biology I (Zoology), Albert Ludwig University, Hauptstr. 1, 79104, Freiburg, Germany
|
5
|
Ji J, Ramos-Vicente D, Navas-Pérez E, Herrera-Úbeda C, Lizcano JM, Garcia-Fernàndez J, Escrivà H, Bayés À, Roher N. Characterization of the TLR Family in Branchiostoma lanceolatum and Discovery of a Novel TLR22-Like Involved in dsRNA Recognition in Amphioxus. Front Immunol 2018; 9:2525. [PMID: 30450099 PMCID: PMC6224433 DOI: 10.3389/fimmu.2018.02525] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 07/11/2018] [Accepted: 10/12/2018] [Indexed: 01/09/2023]
Abstract
Toll-like receptors (TLRs) are important for raising innate immune responses in both invertebrates and vertebrates. Amphioxus belongs to an ancient chordate lineage which shares key features with vertebrates. Genomic research on TLR genes in Branchiostoma floridae and Branchiostoma belcheri has revealed an expansion of TLRs in amphioxus. However, the repertoire of TLRs in Branchiostoma lanceolatum has not been studied and the functionality of amphioxus TLRs has not been reported. We have identified from transcriptomic data 30 new putative TLRs in B. lanceolatum, all of which are transcribed in adult amphioxus. Phylogenetic analysis showed that the repertoire of TLRs consists of both non-vertebrate and vertebrate-like TLRs. It also indicated a lineage-specific expansion in orthologous clusters of the vertebrate TLR11 family. We did not detect any representatives of the vertebrate TLR1, TLR3, TLR4, TLR5 and TLR7 families. To gain insight into these TLRs, we studied in depth a particular TLR highly similar to a B. belcheri gene annotated as bbtTLR1. The phylogenetic analysis of this novel BlTLR showed that it clusters with the vertebrate TLR11 family and that, given its similar domain architecture, it might be more closely related to the TLR13 subfamily. Transient and stable expression in HEK293 cells showed that the BlTLR localizes on the plasma membrane, but it did not respond to the most common mammalian TLR ligands. However, when the ectodomain of BlTLR is fused to the TIR domain of human TLR2, the chimeric protein could indeed induce NF-κB transactivation in response to the viral ligand Poly I:C, also indicating that in amphioxus, specific accessory proteins are needed for downstream activation. Based on the phylogenetic, subcellular localization and functional analysis, we propose that the novel BlTLR might be classified as an antiviral receptor sharing at least partly the functions performed by vertebrate TLR22. TLR22 is thought to be a teleost-specific antiviral TLR, but here we demonstrate that the teleost and amphioxus TLR22-like receptors probably shared a common ancestor. Additional functional studies with other lancelet TLR genes will enrich our understanding of the immune response in amphioxus and will provide a unique perspective on the evolution of the immune system.
Affiliation(s)
- Jie Ji
  - Department of Cell Biology, Animal Physiology and Immunology, Institute of Biotechnology and Biomedicine (IBB), Universitat Autònoma de Barcelona, Bellaterra, Spain
- David Ramos-Vicente
  - Department of Cell Biology, Animal Physiology and Immunology, Institute of Biotechnology and Biomedicine (IBB), Universitat Autònoma de Barcelona, Bellaterra, Spain; Molecular Physiology of the Synapse Laboratory, Biomedical Research Institute Sant Pau (IIB Sant Pau), Barcelona, Spain
- Enrique Navas-Pérez
  - Department of Genetics, School of Biology and Institute of Biomedicine (IBUB), University of Barcelona, Barcelona, Spain
- Carlos Herrera-Úbeda
  - Department of Genetics, School of Biology and Institute of Biomedicine (IBUB), University of Barcelona, Barcelona, Spain
- José Miguel Lizcano
  - Department of Biochemistry and Molecular Biology, Institute of Neurosciences, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Jordi Garcia-Fernàndez
  - Department of Genetics, School of Biology and Institute of Biomedicine (IBUB), University of Barcelona, Barcelona, Spain
- Hector Escrivà
  - CNRS, Biologie Intégrative des Organismes Marins, BIOM, Sorbonne Université, Banyuls-sur-Mer, France
- Àlex Bayés
  - Department of Cell Biology, Animal Physiology and Immunology, Institute of Biotechnology and Biomedicine (IBB), Universitat Autònoma de Barcelona, Bellaterra, Spain; Molecular Physiology of the Synapse Laboratory, Biomedical Research Institute Sant Pau (IIB Sant Pau), Barcelona, Spain
- Nerea Roher
  - Department of Cell Biology, Animal Physiology and Immunology, Institute of Biotechnology and Biomedicine (IBB), Universitat Autònoma de Barcelona, Bellaterra, Spain
|
6
|
Bányai L, Kerekes K, Trexler M, Patthy L. Morphological Stasis and Proteome Innovation in Cephalochordates. Genes (Basel) 2018; 9:genes9070353. [PMID: 30013013 PMCID: PMC6071037 DOI: 10.3390/genes9070353] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/24/2018] [Revised: 07/11/2018] [Accepted: 07/11/2018] [Indexed: 11/16/2022]
Abstract
Lancelets, extant representatives of basal chordates, are prototypic examples of evolutionary stasis; they preserved a morphology and body-plan most similar to the fossil chordates from the early Cambrian. Such a low level of morphological evolution is in harmony with a low rate of amino acid substitution; cephalochordate proteins were shown to evolve more slowly than those of the slowest-evolving vertebrate, the elephant shark. Surprisingly, a study comparing the predicted proteomes of the Chinese amphioxus, Branchiostoma belcheri, and the Florida amphioxus, Branchiostoma floridae, has led to the conclusion that the rate of creation of novel domain combinations is orders of magnitude greater in lancelets than in any other Metazoa, a finding that contradicts the notion that high rates of protein innovation are usually associated with major evolutionary innovations. Our earlier studies on a representative sample of proteins have provided evidence suggesting that the differences in the domain architectures of predicted proteins of these two lancelet species reflect annotation errors, rather than true innovations. In the present work, we have extended these studies to include a larger sample of genes and two additional lancelet species, Asymmetron lucayanum and Branchiostoma lanceolatum. These analyses have confirmed that the domain architecture differences of orthologous proteins of the four lancelet species are due to errors of gene prediction, the error rate in the given species being inversely related to the quality of the transcriptome dataset that was used to aid gene prediction.
Affiliation(s)
- László Bányai
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Krisztina Kerekes
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- Mária Trexler
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
- László Patthy
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
|
7
|
Abstract
One central goal of genome biology is to understand how the usage of the genome differs between organisms. Our knowledge of genome composition, needed for downstream inferences, is critically dependent on gene annotations, yet problems associated with gene annotation and assembly errors are usually ignored in comparative genomics. Here, we analyze the genomes of 68 species across 12 animal phyla and some single-cell eukaryotes for general trends in genome composition and transcription, taking into account problems of gene annotation. We show that, regardless of genome size, the ratio of introns to intergenic sequence is comparable across essentially all animals, with nearly all deviations dominated by increased intergenic sequence. Genomes of model organisms have ratios much closer to 1:1, suggesting that the majority of published genomes of nonmodel organisms are underannotated and consequently omit substantial numbers of genes, with a likely negative impact on evolutionary interpretations. Finally, our results also indicate that most animals transcribe half or more of their genomes, arguing against differences in genome usage between animal groups and also suggesting that the transcribed portion is more dependent on genome size than previously thought.
Affiliation(s)
- Warren R Francis
  - Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany
- Gert Wörheide
  - Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany; GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany; Bavarian State Collection for Paleontology and Geology, Munich, Germany
|
8
|
Pantzartzi CN, Pergner J, Kozmik Z. The role of transposable elements in functional evolution of amphioxus genome: the case of opsin gene family. Sci Rep 2018; 8:2506. [PMID: 29410521 PMCID: PMC5802833 DOI: 10.1038/s41598-018-20683-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/02/2017] [Accepted: 01/22/2018] [Indexed: 11/30/2022]
Abstract
Transposable elements (TEs) are able to jump to new locations (transposition) in the genome, usually after replication. They constitute the so-called selfish or junk DNA and take over large proportions of some genomes. Due to their ability to move around, they can change the DNA landscape of genomes and are therefore a rich source of innovation in genes and gene regulation. The surge of sequence data in recent years has significantly facilitated large-scale comparative studies. Cephalochordates have been regarded as a useful proxy for the ancestral chordate condition, partly owing to their comparatively slow evolutionary rate at the morphological and genomic levels. In this study, we used the opsin gene family from three Branchiostoma species as a window into cephalochordate genome evolution. We compared opsin complements in terms of family size, gene structure and sequence, allowing us to identify gene duplication and gene loss events. Furthermore, analysis of the opsin-containing genomic loci showed that they are populated by TEs. In summary, we provide evidence of the way transposable elements may have contributed to the evolution of the opsin gene family and to the shaping of cephalochordate genomes in general.
Affiliation(s)
- Chrysoula N Pantzartzi
  - Laboratory of Eye Biology, Institute of Molecular Genetics of the ASCR, v.v.i., Division BIOCEV, Prumyslová 595, 252 50, Vestec, Czech Republic
- Jiri Pergner
  - Department of Transcriptional Regulation, Institute of Molecular Genetics of the ASCR, v.v.i., Videnska 1083, 14220, Prague 4, Czech Republic
- Zbynek Kozmik
  - Laboratory of Eye Biology, Institute of Molecular Genetics of the ASCR, v.v.i., Division BIOCEV, Prumyslová 595, 252 50, Vestec, Czech Republic; Department of Transcriptional Regulation, Institute of Molecular Genetics of the ASCR, v.v.i., Videnska 1083, 14220, Prague 4, Czech Republic
|
9
|
Gerdol M, Venier P, Edomi P, Pallavicini A. Diversity and evolution of TIR-domain-containing proteins in bivalves and Metazoa: New insights from comparative genomics. DEVELOPMENTAL AND COMPARATIVE IMMUNOLOGY 2017; 70:145-164. [PMID: 28109746 DOI: 10.1016/j.dci.2017.01.014] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 09/28/2016] [Revised: 01/13/2017] [Accepted: 01/17/2017] [Indexed: 06/06/2023]
Abstract
The Toll/interleukin-1 receptor (TIR) domain has a fundamental role in the innate defence response of plants and of vertebrate and invertebrate animals. Mostly found on the cytosolic side of membrane-bound receptor proteins, it mediates intracellular signalling upon pathogen recognition via heterotypic interactions. Although a number of TIR-domain-containing (TIR-DC) proteins have been characterized in vertebrates, their evolutionary relationships and functional role in protostomes are still largely unknown. Due to the high abundance and diversity of TIR-DC proteins in bivalve molluscs, we investigated this class of marine invertebrates as a case study. The analysis of the available genomic and transcriptomic data allowed the identification of over 400 full-length sequences and their classification into protein families based on sequence homology and domain organization. In addition to TLRs and MyD88 adaptors, bivalves possess a surprisingly large repertoire of intracellular TIR-DC proteins, which are conserved across a broad range of metazoan taxa. Overall, we report the expansion and diversification of TIR-DC proteins in several invertebrate lineages and the identification of many novel protein families possibly involved in both immune-related signalling and embryonic development.
Affiliation(s)
- Marco Gerdol
  - University of Trieste, Department of Life Sciences, Via Licio Giorgieri 5, 34127 Trieste, Italy
- Paola Venier
  - University of Padova, Department of Biology, Via Ugo Bassi 58/B, 35131 Padova, Italy
- Paolo Edomi
  - University of Trieste, Department of Life Sciences, Via Licio Giorgieri 5, 34127 Trieste, Italy
- Alberto Pallavicini
  - University of Trieste, Department of Life Sciences, Via Licio Giorgieri 5, 34127 Trieste, Italy
|