1
Look4LTRs: a Long terminal repeat retrotransposon detection tool capable of cross species studies and discovering recently nested repeats. Mob DNA 2024; 15:8. [PMID: 38627766 PMCID: PMC11020628 DOI: 10.1186/s13100-024-00317-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 03/08/2024] [Indexed: 04/20/2024]
Abstract
Plant genomes include large numbers of transposable elements. One particular type of these elements is flanked by two Long Terminal Repeats (LTRs) and can translocate using RNA. Such elements are known as LTR-retrotransposons; they are the most abundant type of transposons in plant genomes. They have many important functions involving gene regulation and the rise of new genes and pseudogenes in response to severe stress. Additionally, LTR-retrotransposons have several applications in biotechnology. Due to the abundance and the importance of LTR-retrotransposons, multiple computational tools have been developed for their detection. However, none of these tools takes advantage of the availability of related genomes; they process one chromosome at a time. Further, recently nested LTR-retrotransposons (multiple elements of the same family inserted into each other) cannot be annotated accurately, or cannot be annotated at all, by the currently available tools. Motivated to overcome these two limitations, we built Look4LTRs, which can annotate LTR-retrotransposons in multiple related genomes simultaneously and discover recently nested elements. The methodology of Look4LTRs depends on techniques imported from the signal-processing field, graph algorithms, and machine learning, with minimal use of alignment algorithms. Four plant genomes were used in developing Look4LTRs and eight plant genomes in evaluating it against three related tools. Look4LTRs is the fastest while maintaining better or comparable F1 scores (the harmonic mean of recall and precision) to those obtained by the other tools. Our results demonstrate the added benefit of annotating LTR-retrotransposons in multiple related genomes simultaneously and the ability to discover recently nested elements. Expert manual examination of six elements not included in the ground truth revealed that three elements belong to known families and two elements are likely from new families.
With respect to examining recently nested LTR-retrotransposons, three out of five were confirmed to be valid elements. Look4LTRs - with its speed, accuracy, and novel features - represents a true advancement in the annotation of LTR-retrotransposons, opening the door to many studies focused on understanding their functions in plants.
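The abstract above compares tools by F1 score, the harmonic mean of precision and recall. A minimal sketch of that metric follows; the function name `f1_score` and the example counts are illustrative assumptions and are not taken from the Look4LTRs paper or its code.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall.

    Hypothetical helper for illustration only; counts are element-level
    predictions compared against a ground-truth annotation.
    """
    predicted = true_positives + false_positives  # everything the tool reported
    actual = true_positives + false_negatives     # everything in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example with made-up counts: 80 correct calls, 20 spurious, 20 missed.
print(round(f1_score(80, 20, 20), 3))  # 0.8
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a tool cannot reach a high F1 by maximizing recall alone at the expense of precision, or the reverse.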
2
MicroAnnot: A Dedicated Workflow for Accurate Microsporidian Genome Annotation. Int J Mol Sci 2024; 25:880. [PMID: 38255958 PMCID: PMC10815200 DOI: 10.3390/ijms25020880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 12/29/2023] [Accepted: 01/04/2024] [Indexed: 01/24/2024]
Abstract
With nearly 1700 species, Microsporidia represent a group of obligate intracellular eukaryotes with veterinary, economic and medical impacts. To help understand the biological functions of these microorganisms, complete genome sequencing is routinely used. Nevertheless, the proper prediction of their gene catalogue is challenging due to their taxon-specific evolutionary features. As innovative genome annotation strategies are needed to obtain a representative snapshot of the overall lifestyle of these parasites, the MicroAnnot tool, a dedicated workflow for microsporidian sequence annotation using data from curated databases of accurately annotated microsporidian genes, has been developed. Furthermore, specific modules have been implemented to perform small gene (<300 bp) and transposable element identification. Finally, functional annotation was performed using the signature-based InterProScan software. MicroAnnot's accuracy has been verified by the re-annotation of four microsporidian genomes for which structural annotation had previously been validated. With its comparative approach and transcriptional signal identification method, MicroAnnot provides an accurate prediction of translation initiation sites, an efficient identification of transposable elements, as well as high specificity and sensitivity for microsporidian genes, including those under 300 bp.
3
Recent Bioinformatic Progress to Identify Epigenetic Changes Associated to Transposable Elements. Front Genet 2022; 13:891194. [PMID: 35646069 PMCID: PMC9140218 DOI: 10.3389/fgene.2022.891194] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/07/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022]
Abstract
Transposable elements (TEs) are recognized for their great impact on the functioning and evolution of their host genomes. They are associated with various deleterious effects, which has led to the evolution of regulatory epigenetic mechanisms to control their activity. Despite these negative effects, TEs are also important actors in the evolution of genomes by promoting genetic diversity and new regulatory elements. Consequently, it is important to study the epigenetic modifications associated with TEs, especially at a locus-specific level, to determine their individual influence on gene functioning. To this end, this short review presents the current bioinformatic tools available for this task.
4
Erratum to: The Transposable Element Environment of Human Genes Differs According to Their Duplication Status and Essentiality. Genome Biol Evol 2021; 13:6368393. [PMID: 34508264 PMCID: PMC8433420 DOI: 10.1093/gbe/evab175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
5
Massive colonization of protein-coding exons by selfish genetic elements in Paramecium germline genomes. PLoS Biol 2021; 19:e3001309. [PMID: 34324490 PMCID: PMC8354472 DOI: 10.1371/journal.pbio.3001309] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/11/2021] [Revised: 08/10/2021] [Accepted: 06/04/2021] [Indexed: 11/18/2022]
Abstract
Ciliates are unicellular eukaryotes with both a germline genome and a somatic genome in the same cytoplasm. The somatic macronucleus (MAC), responsible for gene expression, is not sexually transmitted but develops from a copy of the germline micronucleus (MIC) at each sexual generation. In the MIC genome of Paramecium tetraurelia, genes are interrupted by tens of thousands of unique intervening sequences called internal eliminated sequences (IESs), which have to be precisely excised during the development of the new MAC to restore functional genes. To understand the evolutionary origin of this peculiar genomic architecture, we sequenced the MIC genomes of 9 Paramecium species (from approximately 100 Mb in Paramecium aurelia species to >1.5 Gb in Paramecium caudatum). We detected several waves of IES gains, both in ancestral and in more recent lineages. While the vast majority of IESs are single copy in present-day genomes, we identified several families of mobile IESs, including nonautonomous elements acquired via horizontal transfer, which generated tens to thousands of new copies. These observations provide the first direct evidence that transposable elements can account for the massive proliferation of IESs in Paramecium. The comparison of IESs of different evolutionary ages indicates that, over time, IESs shorten and diverge rapidly in sequence while they acquire features that allow them to be more efficiently excised. We nevertheless identified rare cases of IESs that are under strong purifying selection across the aurelia clade. The cases examined contain or overlap cellular genes that are inactivated by excision during development, suggesting conserved regulatory mechanisms. Similar to the evolution of introns in eukaryotes, the evolution of Paramecium IESs highlights the major role played by selfish genetic elements in shaping the complexity of genome architecture and gene expression. 
A comparative genomics study of nine Paramecium species reveals successful invasion of genes by transposable elements in their germline genomes, showing that the internal eliminated sequences (IESs) followed an evolutionary trajectory remarkably similar to that of spliceosomal introns.
6
The Transposable Element Environment of Human Genes Differs According to Their Duplication Status and Essentiality. Genome Biol Evol 2021; 13:6273345. [PMID: 33973013 PMCID: PMC8155550 DOI: 10.1093/gbe/evab062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 03/17/2021] [Indexed: 12/13/2022]
Abstract
Transposable elements (TEs) are major components of eukaryotic genomes and represent approximately 45% of the human genome. TEs can be important sources of novelty in genomes and there is increasing evidence that TEs contribute to the evolution of gene regulation in mammals. Gene duplication is an evolutionary mechanism that also provides new genetic material and opportunities to acquire new functions. To investigate how duplicated genes are maintained in genomes, here, we explored the TE environment of duplicated and singleton genes. We found that singleton genes have more short-interspersed nuclear elements and DNA transposons in their vicinity than duplicated genes, whereas long-interspersed nuclear elements and long-terminal repeat retrotransposons have accumulated more near duplicated genes. We also discovered that this result is highly associated with the degree of essentiality of the genes with an unexpected accumulation of short-interspersed nuclear elements and DNA transposons around the more-essential genes. Our results underline the importance of taking into account the TE environment of genes to better understand how duplicated genes are maintained in genomes.
7
An Overview of Duplicated Gene Detection Methods: Why the Duplication Mechanism Has to Be Accounted for in Their Choice. Genes (Basel) 2020; 11:E1046. [PMID: 32899740 PMCID: PMC7565063 DOI: 10.3390/genes11091046] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 07/30/2020] [Revised: 09/01/2020] [Accepted: 09/02/2020] [Indexed: 12/11/2022]
Abstract
Gene duplication is an important evolutionary mechanism that provides new genetic material and thus opportunities for an organism to acquire new gene functions, with major implications such as speciation events. Various processes are known to allow a gene to be duplicated, and different models explain how duplicated genes can be maintained in genomes. Because of their particular importance, the identification of duplicated genes is essential when studying genome evolution, but it can still be a challenge due to the various fates duplicated genes can encounter. In this review, we first describe the evolutionary processes allowing the formation of duplicated genes and then the various bioinformatic approaches that can be used to identify them in genome sequences. Indeed, these bioinformatic approaches differ according to the underlying duplication mechanism. Hence, understanding the specificity of the duplicated genes of interest is a great asset for tool selection and should be taken into account when exploring a biological question.
8
On the Importance to Acknowledge Transposable Elements in Epigenomic Analyses. Genes (Basel) 2019; 10:genes10040258. [PMID: 30935103 PMCID: PMC6523952 DOI: 10.3390/genes10040258] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/04/2019] [Revised: 03/27/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
Eukaryotic genomes comprise a large proportion of repeated sequences, an important fraction of which are transposable elements (TEs). TEs are mobile elements that have a significant impact on genome evolution and on gene functioning. Although some TE insertions could provide adaptive advantages to species, transposition is a highly mutagenic event that has to be tightly controlled to ensure its viability. Genomes have evolved sophisticated mechanisms to control TE activity, the most important being epigenetic silencing. However, the epigenetic control of TEs can also affect genes located nearby that can become epigenetically regulated. It has been proposed that the combination of TE mobilization and the induced changes in the epigenetic landscape could allow a rapid phenotypic adaptation to global environmental changes. In this review, we argue the crucial need to take into account the repeated part of genomes when studying the global impact of epigenetic modifications on an organism. We emphasize more particularly why it is important to carefully consider TEs and what bioinformatic tools can be used to do so.
9
Does the Presence of Transposable Elements Impact the Epigenetic Environment of Human Duplicated Genes? Genes (Basel) 2019; 10:genes10030249. [PMID: 30917603 PMCID: PMC6470583 DOI: 10.3390/genes10030249] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/29/2019] [Revised: 03/22/2019] [Accepted: 03/22/2019] [Indexed: 02/07/2023]
Abstract
Epigenetic modifications play an important role in explaining part of the intra- and inter-species variation in gene expression. They also have a role in the control of transposable elements (TEs), whose activity may have a significant impact on genome evolution by promoting various mutations, which are expected to be mostly deleterious. A change in the local epigenetic landscape associated with the presence of TEs is expected to affect the expression of neighboring genes, since the modifications occurring at TE sequences can spread to neighboring sequences. In this work, we have studied how the epigenetic modifications of genes are conserved and what the role of TEs is in this conservation. To do so, we have compared the conservation of the epigenome associated with human duplicated genes and the differential presence of TEs near these genes. Our results show higher epigenome conservation of duplicated genes from the same family when they share a similar TE environment, suggesting a role for the differential presence of TEs in the evolutionary divergence of duplicates through variation in the epigenetic landscape.
10
Population-specific dynamics and selection patterns of transposable element insertions in European natural populations. Mol Ecol 2019; 28:1506-1522. [PMID: 30506554 PMCID: PMC6849870 DOI: 10.1111/mec.14963] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/30/2018] [Revised: 10/30/2018] [Accepted: 11/05/2018] [Indexed: 01/02/2023]
Abstract
Transposable elements (TEs) are ubiquitous sequences in genomes of virtually all species. While TEs have been investigated for several decades, only recently have we had the opportunity to study their genome-wide population dynamics. Most of the studies so far have been restricted either to the analysis of the insertions annotated in the reference genome or to the analysis of a limited number of populations. Taking advantage of the European Drosophila population genomics consortium (DrosEU) sequencing data set, we have identified and measured the dynamics of TEs in a large sample of European Drosophila melanogaster natural populations. We showed that the mobilome landscape is population-specific and highly diverse depending on the TE family. In contrast with previous studies based on SNP variants, no geographical structure was observed for TE abundance or TE divergence in European populations. We further identified de novo individual insertions using two available programs and, as expected, most of the insertions were present at low frequencies. Nevertheless, we identified a subset of TEs present at high frequencies and located in genomic regions with a high recombination rate. These TEs are candidates for being the target of positive selection, although neutral processes should be ruled out before reaching any conclusion on the type of selection acting on them. Finally, parallel patterns of association between the frequency of TE insertions and several geographical and temporal variables were found between European and North American populations, suggesting that TEs can potentially be implicated in the adaptation of populations across continents.
11
TEtools facilitates big data expression analysis of transposable elements and reveals an antagonism between their activity and that of piRNA genes. Nucleic Acids Res 2018; 45:e17. [PMID: 28204592 PMCID: PMC5389681 DOI: 10.1093/nar/gkw953] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 03/11/2016] [Revised: 09/29/2016] [Accepted: 10/11/2016] [Indexed: 11/24/2022]
Abstract
Over recent decades, substantial efforts have been made to understand the interactions between host genomes and transposable elements (TEs). The impact of TEs on the regulation of host genes is well known, with TEs acting as platforms of regulatory sequences. Nevertheless, because of their repetitive nature, it is difficult to integrate TE analysis into genome-wide studies. Here, we developed a specific tool for the analysis of TE expression: TEtools. This tool takes into account the TE sequence diversity of the genome; it can be applied to unannotated or unassembled genomes and is freely available under the GPL3 (https://github.com/l-modolo/TEtools). TEtools performs the mapping of RNA-seq data obtained from classical mRNAs or small RNAs onto a list of TE sequences and performs differential expression analyses with statistical relevance. Using this tool, we analyzed TE expression from five Drosophila wild-type strains. Our data show for the first time that the activity of TEs is strictly linked to the activity of the genes implicated in the piwi-interacting RNA biogenesis and therefore fits an arms race scenario between TE sequences and host control genes.
12
Evolutionary history of LTR-retrotransposons among 20 Drosophila species. Mob DNA 2017; 8:7. [PMID: 28465726 PMCID: PMC5408442 DOI: 10.1186/s13100-017-0090-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 02/24/2017] [Accepted: 04/21/2017] [Indexed: 12/26/2022]
Abstract
Background: The presence of transposable elements (TEs) in genomes is known to explain in part the variation in genome size among eukaryotes. Even among closely related species, the variation in TE amount may be striking, as for example between the two sibling species Drosophila melanogaster and D. simulans. However, not much is known concerning the TE content and dynamics among other Drosophila species. The sequencing of several Drosophila genomes, covering the two subgenera Sophophora and Drosophila, revealed a large variation in repeat content among these species, but not much is known concerning their precise TE content. The identification of some consensus sequences of TEs from the various sequenced Drosophila species gave an idea of their variety in terms of diversity of superfamilies, but the classification used remains very elusive and ambiguous. Results: We chose to focus on LTR-retrotransposons because they are the most widely represented class of TEs in Drosophila genomes. In this work, we describe for the first time the phylogenetic relationship of each LTR-retrotransposon family described in 20 Drosophila species, compute their proportion in their respective genomes and identify several new cases of horizontal transfers. Conclusion: All these results give us a clearer view of the evolutionary history of LTR retrotransposons among Drosophila, which seems to be mainly driven by vertical transmission, although horizontal transfers, losses and intra-specific diversification are clearly also at play. Electronic supplementary material: The online version of this article (doi:10.1186/s13100-017-0090-3) contains supplementary material, which is available to authorized users.
13
Simulation-based estimation of branching models for LTR retrotransposons. Bioinformatics 2017; 33:320-326. [PMID: 28011770 DOI: 10.1093/bioinformatics/btw622] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2016] [Accepted: 09/23/2016] [Indexed: 11/12/2022]
Abstract
Motivation: LTR retrotransposons are mobile elements that are able, like retroviruses, to copy and move inside eukaryotic genomes. In the present work, we propose a branching model for studying the propagation of LTR retrotransposons in these genomes. This model allows us to take into account both the positions and the degradation level of LTR retrotransposon copies. In our model, the duplication rate is also allowed to vary with the degradation level. Results: Various functions have been implemented in order to simulate their spread, and visualization tools are proposed. Based on these simulation tools, we have developed a first method to evaluate the parameters of this propagation model. We applied this method to the study of the spread of the transposable elements ROO, GYPSY and DM412 on a chromosome of Drosophila melanogaster. Availability and Implementation: Our proposal has been implemented in Python. Source code is freely available on the web at https://github.com/SergeMOULIN/retrotransposons-spread. Contact: serge.moulin@univ-fcomte.fr. Supplementary information is available at Bioinformatics online.
14
The transposable element environment of human genes is associated with histone and expression changes in cancer. BMC Genomics 2016; 17:588. [PMID: 27506777 PMCID: PMC4979156 DOI: 10.1186/s12864-016-2970-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 03/15/2016] [Accepted: 07/27/2016] [Indexed: 01/08/2023]
Abstract
BACKGROUND: Only 2% of the human genome codes for proteins. Among the remaining 98%, transposable elements (TEs) represent millions of sequences. TEs have an impact on genome evolution by promoting mutations. In particular, TEs possess their own regulatory sequences and can alter the expression pattern of neighboring genes. Since they can potentially be harmful, TE activity is regulated by epigenetic mechanisms. These mechanisms participate in the modulation of gene expression and can be associated with some human diseases resulting from gene expression deregulation. The fact that TE silencing can be removed in cancer could explain part of the changes in gene expression. Indeed, epigenetic modifications associated locally with TE sequences could impact neighboring genes, since these modifications can spread to adjacent sequences. RESULTS: We compared the histone enrichment, TE neighborhood, and expression divergence of human genes between normal and cancer conditions. We show that the presence of TEs near genes is associated with greater changes in histone enrichment and that differentially expressed genes harbor larger histone enrichment variation related to the presence of particular TEs. CONCLUSIONS: Taken together, these results suggest that the presence of TEs near genes could favor important variation in gene expression when the cell environment is modified.
15
A call for benchmarking transposable element annotation methods. Mob DNA 2015; 6:13. [PMID: 26244060 PMCID: PMC4524446 DOI: 10.1186/s13100-015-0044-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Received: 06/25/2015] [Accepted: 07/22/2015] [Indexed: 12/31/2022]
Abstract
DNA derived from transposable elements (TEs) constitutes large parts of the genomes of complex eukaryotes, with major impacts not only on genomic research but also on how organisms evolve and function. Although a variety of methods and tools have been developed to detect and annotate TEs, there are as yet no standard benchmarks, that is, no standard way to measure or compare their accuracy. This lack of accuracy assessment calls into question conclusions from a wide range of research that depends explicitly or implicitly on TE annotation. In the absence of standard benchmarks, toolmakers are impeded in improving their tools, annotators cannot properly assess which tools might best suit their needs, and downstream researchers cannot judge how accuracy limitations might impact their studies. We therefore propose that the TE research community create and adopt standard TE annotation benchmarks, and we call for other researchers to join the authors in making this long-overdue effort a success.
16
UrQt: an efficient software for the Unsupervised Quality trimming of NGS data. BMC Bioinformatics 2015; 16:137. [PMID: 25924884 PMCID: PMC4450468 DOI: 10.1186/s12859-015-0546-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 09/11/2014] [Accepted: 03/20/2015] [Indexed: 11/25/2022]
Abstract
Background: Quality control is a necessary step of any Next Generation Sequencing analysis. Although customary, this step still requires manual interventions to empirically choose tuning parameters according to various quality statistics. Moreover, current quality control procedures that provide a “good quality” data set are not optimal and discard many informative nucleotides. To address these drawbacks, we present a new quality control method, implemented in UrQt software, for Unsupervised Quality trimming of Next Generation Sequencing reads. Results: Our trimming procedure relies on a well-defined probabilistic framework to detect the best segmentation between two segments of unreliable nucleotides, framing a segment of informative nucleotides. Our software only requires one user-friendly parameter to define the minimal quality threshold (phred score) to consider a nucleotide to be informative, which is independent of both the experiment and the quality of the data. This procedure is implemented in C++ in an efficient and parallelized software with a low memory footprint. We tested the performance of UrQt compared to the best-known trimming programs on seven RNA and DNA sequencing experiments and demonstrated its optimality in the resulting tradeoff between the number of trimmed nucleotides and the quality objective. Conclusions: By finding the best segmentation to delimit a segment of good quality nucleotides, UrQt greatly increases the number of reads and of nucleotides that can be retained for a given quality objective. UrQt source files, binary executables for different operating systems and documentation are freely available (under the GPLv3) at the following address: https://lbbe.univ-lyon1.fr/-UrQt-.html. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0546-8) contains supplementary material, which is available to authorized users.
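The idea of delimiting one informative segment between two unreliable flanks can be illustrated with a much simpler stand-in than the probabilistic framework described above: a maximum-scoring-segment search in which each base scores its phred quality minus a threshold. This sketch is not UrQt's algorithm; the function name `trim_read`, the default threshold of 20, and the example qualities are assumptions made for illustration.

```python
def trim_read(qualities: list[int], threshold: int = 20) -> tuple[int, int]:
    """Return (start, end) of the half-open slice to keep.

    Hypothetical stand-in, not UrQt's method: each base scores
    (quality - threshold) and the contiguous run with the largest
    total score is kept (a maximum-subarray scan). Returns (0, 0)
    when no base scores above the threshold.
    """
    best_sum, best_start, best_end = 0, 0, 0
    current_sum, current_start = 0, 0
    for i, q in enumerate(qualities):
        if current_sum <= 0:
            # A non-positive prefix can never help; restart the run here.
            current_sum, current_start = 0, i
        current_sum += q - threshold
        if current_sum > best_sum:
            best_sum, best_start, best_end = current_sum, current_start, i + 1
    return best_start, best_end


# Low-quality bases at both ends are trimmed; the middle three are kept.
print(trim_read([2, 5, 30, 35, 32, 8, 3]))  # (2, 5)
```

The single `threshold` parameter mirrors the abstract's point that one quality cutoff is enough to drive the trimming; everything else about the scoring here is a simplification.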
17
A new genome-wide method to track horizontally transferred sequences: application to Drosophila. Genome Biol Evol 2015; 6:416-32. [PMID: 24497602 PMCID: PMC3942030 DOI: 10.1093/gbe/evu026] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 12/19/2022]
Abstract
Because of methodological breakthroughs and the availability of an increasing amount of whole-genome sequence data, horizontal transfers (HTs) in eukaryotes have received much attention recently. Contrary to similar analyses in prokaryotes, most studies in eukaryotes investigate particular sequences corresponding to transposable elements (TEs), neglecting the other components of the genome. We present a new methodological framework for the genome-wide detection of all putative horizontally transferred sequences between two species that requires no prior knowledge of the transferred sequences. This method provides a broader picture of HTs in eukaryotes by fully exploiting complete-genome sequence data. In contrast to previous genome-wide approaches, we used a well-defined statistical framework to control for the number of false positives in the results, and we propose two new validation procedures to control for confounding factors. The first validation procedure relies on a comparative analysis with other species of the phylogeny to validate HTs for the nonrepeated sequences detected, whereas the second one builds upon the study of the dynamics of the detected TEs. We applied our method to two closely related Drosophila species, Drosophila melanogaster and D. simulans, in which we discovered 10 new HTs in addition to all the HTs previously detected in different studies, which underscores our method’s high sensitivity and specificity. Our results favor the hypothesis of multiple independent HTs of TEs while unraveling a small portion of the network of HTs in the Drosophila phylogeny.
18
Exploiting the architecture and the features of the microsporidian genomes to investigate diversity and impact of these parasites on ecosystems. Heredity (Edinb) 2014; 114:441-9. [PMID: 25182222 DOI: 10.1038/hdy.2014.78] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 03/31/2014] [Revised: 07/16/2014] [Accepted: 07/21/2014] [Indexed: 12/16/2022]
Abstract
Fungal species play extremely important roles in ecosystems. Clustered at the base of the fungal kingdom are Microsporidia, a group of obligate intracellular eukaryotes infecting multiple animal lineages. Because of their large host spectrum and their implications in host population regulation, they influence food webs, and accordingly, ecosystem structure and function. Unfortunately, their ecological role is not well understood. Present also as highly resistant spores in the environment, their characterisation requires special attention. Different techniques based on direct isolation and/or molecular approaches can be considered to elucidate their role in the ecosystems, but integrating environmental and genomic data (for example, genome architecture, core genome, transcriptional and translational signals) is crucial to better understand the diversity and adaptive capacities of Microsporidia. Here, we review the current status of Microsporidia in trophic networks; the various genomics tools that could be used to ensure identification and evaluate diversity and abundance of these organisms; and how these tools could be used to explore the microsporidian life cycle in different environments. Our understanding of the evolution of these widespread parasites is currently impaired by limited sampling, and we have no doubt witnessed but a small subset of their diversity.
|
19
|
Microsporidian genomes harbor a diverse array of transposable elements that demonstrate an ancestry of horizontal exchange with metazoans. Genome Biol Evol 2014; 6:2289-300. [PMID: 25172905 PMCID: PMC4202319 DOI: 10.1093/gbe/evu178] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022] Open
Abstract
Microsporidian genomes are the leading models to understand the streamlining in response to a pathogenic lifestyle; they are gene-poor and often possess small genomes. In this study, we show a feature of microsporidian genomes that contrasts this pattern of genome reduction. Specifically, genome investigations targeted at Anncaliia algerae, a human pathogen with a genome size of 23 Mb, revealed the presence of a hitherto undetected diversity in transposable elements (TEs). A total of 240 TE families per genome were identified, exceeding that found in many free-living fungi, and searches of microsporidian species revealed that these mobile elements represent a significant portion of their coding repertoire. Their phylogenetic analysis revealed that many cases of ancestry involve recent and bidirectional horizontal transfers with metazoans. The abundance and horizontal transfer origin of microsporidian TEs highlight a novel dimension of genome evolution in these intracellular pathogens, demonstrating that factors beyond reduction are at play in their diversification.
|
20
|
Hijacking of host cellular functions by an intracellular parasite, the microsporidian Anncaliia algerae. PLoS One 2014; 9:e100791. [PMID: 24967735 PMCID: PMC4072689 DOI: 10.1371/journal.pone.0100791] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/07/2014] [Accepted: 05/29/2014] [Indexed: 11/18/2022] Open
Abstract
Intracellular pathogens, including bacteria, viruses and protozoa, hijack host cell functions to access nutrients and to bypass cellular defenses and immune responses. These strategies have been acquired through selective pressure and have allowed pathogens to reach an appropriate cellular niche for their survival and growth. To gain new insights into how parasites hijack host cellular functions, we developed a SILAC (Stable Isotope Labeling by Amino Acids in Cell culture) quantitative proteomics workflow. Our study focused on deciphering the cross-talk in a host-parasite association involving human foreskin fibroblasts (HFF) and the microsporidian Anncaliia algerae, a fungus-related parasite with an obligate intracellular lifestyle and a strong host dependency. The host-parasite cross-talk was analyzed at five post-infection times: 1, 6, 12 and 24 hours post-infection (hpi) and 8 days post-infection (dpi). A significant up-regulation of four interferon-induced proteins with tetratricopeptide repeats IFIT1, IFIT2, IFIT3 and MX1 was observed at 8 dpi, suggesting a type 1 interferon (IFN) host response. Quantitative alteration of host proteins involved in biological functions such as signaling (STAT1, Ras) and reduction of the translation activity (EIF3) confirmed a host type 1 IFN response. Interestingly, the SILAC approach also allowed the detection of 148 A. algerae proteins during the kinetics of infection. Many of these proteins are involved in parasite proliferation, and an over-representation of putative secreted effector proteins was observed. Finally, our survey also suggests that A. algerae could use a transposable element as a lure strategy to escape the host innate immune system.
|
21
|
Abstract
The non-long terminal repeat (LTR) retrotransposon I, which belongs to the I superfamily of non-LTR retrotransposons, is well known in Drosophila because it transposes at a high frequency in the female germline cells in I-R hybrid dysgenic crosses of Drosophila melanogaster. Here, we report the occurrence and the upregulation of an I-like element in the hybrids of two sister species belonging to the repleta group of the genus Drosophila, D. mojavensis, and D. arizonae. These two species display variable degrees of pre- and postzygotic isolation, depending on the geographic origin of the strains. We took advantage of these features to explore the transposable element (TE) dynamics in interspecific crosses. We fully characterized the copies of this TE family in the D. mojavensis genome and identified at least one complete copy. We showed that this element is transcriptionally active in the ovaries and testes of both species and in their hybrids. Moreover, we showed that this element is upregulated in hybrid males, which could be associated with the male-sterile phenotype.
|
22
|
Abstract
Background: Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results: We have developed a Perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions: Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
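To make the problem of multiple hits per copy concrete, the following is a minimal Python sketch, not the published Perl tool. It assumes the standard whitespace-separated column layout of a RepeatMasker .out file and uses a deliberately simple heuristic: consecutive hits to the same repeat, on the same query sequence and strand, are merged into one copy when they lie within a gap threshold. The names `parse_out`, `merge_hits`, and `max_gap`, the threshold value, and the sample rows are all hypothetical and chosen for illustration only.

```python
from collections import namedtuple

# One RepeatMasker hit, reduced to the fields used for merging.
Hit = namedtuple("Hit", "query start end strand repeat family")

def parse_out(text):
    """Parse the data rows of a RepeatMasker .out file given as a string."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 15 columns and start with an integer score;
        # the header rows and blank lines do not.
        if len(fields) < 15 or not fields[0].isdigit():
            continue
        hits.append(Hit(fields[4], int(fields[5]), int(fields[6]),
                        fields[8], fields[9], fields[10]))
    return hits

def merge_hits(hits, max_gap=100):
    """Merge nearby hits to the same repeat into single copies (illustrative heuristic)."""
    copies = []
    ordered = sorted(hits, key=lambda h: (h.query, h.repeat, h.strand, h.start))
    for hit in ordered:
        last = copies[-1] if copies else None
        same_group = (last is not None and
                      (last.query, last.repeat, last.strand) ==
                      (hit.query, hit.repeat, hit.strand))
        if same_group and hit.start - last.end <= max_gap:
            # Extend the current copy to cover this fragment.
            copies[-1] = last._replace(end=max(last.end, hit.end))
        else:
            copies.append(hit)
    return copies

# Made-up example rows (hypothetical element and coordinates).
SAMPLE = """\
   SW   perc perc perc  query     position in query    matching   repeat        position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat     class/family  begin end (left)   ID

 1000   2.0  0.0  0.0   chr2L     1000  1500  (9000) + ELEMENT_A  LTR/Gypsy     1     500  (8500)  1
 3000   1.5  0.0  0.0   chr2L     1520  4000  (6500) + ELEMENT_A  LTR/Gypsy     600   3080 (5920)  1
  800   5.0  0.0  0.0   chr2L     7000  7400  (3100) C ELEMENT_A  LTR/Gypsy     (0)   9000 8600    2
"""

copies = merge_hits(parse_out(SAMPLE))
for copy in copies:
    print(copy.query, copy.start, copy.end, copy.strand, copy.repeat)
```

In this sample the two forward-strand fragments are 20 bp apart and collapse into one copy, while the reverse-strand hit stays separate. A real analysis would also need to handle the cases the abstract mentions, such as LTR and internal regions annotated under distinct consensus names, which this sketch does not attempt.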
|
23
|
Subcellular localization of ENS-1/ERNI in chick embryonic stem cells. PLoS One 2014; 9:e92039. [PMID: 24643087 PMCID: PMC3958431 DOI: 10.1371/journal.pone.0092039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/06/2013] [Accepted: 02/19/2014] [Indexed: 11/18/2022] Open
Abstract
The protein of retroviral origin ENS-1/ERNI plays a major role during neural plate development in chick embryos by controlling the activity of the epigenetic regulator HP1γ, but its function in the earlier developmental stages is still unknown. ENS-1/ERNI promoter activity is down-regulated upon differentiation, but the resulting protein expression has never been examined. In this study, we present the results obtained with custom-made antibodies to gain further insights into ENS-1 protein expression in chicken embryonic stem cells (CES) and during their differentiation. First, we show that ENS-1 controls the activity of HP1γ in CES and we examined the context of its interaction with HP1γ. By combining immunofluorescence and western blot analysis, we show that ENS-1 is localized in the cytoplasm and in the nucleus, in agreement with its role in gene promoter activity. During differentiation, ENS-1 decreases in the cytoplasm but not in the nucleus. More precisely, three distinct forms of the ENS-1 protein co-exist in the nucleus and are differently regulated during differentiation, revealing a new level of control of the protein ENS-1. In silico analysis of the Ens-1 gene copies and the sequence of their corresponding proteins indicates that this pattern is compatible with at least three potential regulation mechanisms, each accounting only partially. The results obtained with the anti-ENS-1 antibodies presented here reveal that the regulation of ENS-1 expression in CES is more complex than expected, providing new tracks to explore the integration of ENS-1 in CES regulatory networks.
|
24
|
A comparative analysis of the amounts and dynamics of transposable elements in natural populations of Drosophila melanogaster and Drosophila simulans. J Environ Radioact 2012; 113:83-86. [PMID: 22659421 DOI: 10.1016/j.jenvrad.2012.04.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/02/2012] [Revised: 03/23/2012] [Accepted: 04/04/2012] [Indexed: 06/01/2023]
Abstract
Genes are important in defining genetic variability, but they do not constitute the largest component of genomes, which in most organisms contain large amounts of various repeated sequences, including transposable elements (TEs), which have been shown to account for most of the genome size. TEs contribute to genetic diversity through their mutational potential, which results from their ability to insert into genes or gene regulatory regions, to promote chromosomal rearrangements, and to interfere with gene networks. Also, TEs may be activated by environmental stresses (such as temperature or radiation) that interfere with epigenetic regulation systems, making them powerful mutation agents in nature. To understand the relationship between genotype and phenotype, we need to analyze the portions of the genome corresponding to TEs in great detail, and to decipher their relationships with the genes. For this purpose, we carried out comparative analyses of various natural populations of the closely related species Drosophila melanogaster and Drosophila simulans, which differ with regard to their TE amounts as well as their ecology and population size.
|
25
|
The endogenous retrovirus ENS-1 provides active binding sites for transcription factors in embryonic stem cells that specify extra embryonic tissue. Retrovirology 2012; 9:21. [PMID: 22420414 PMCID: PMC3362752 DOI: 10.1186/1742-4690-9-21] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 07/21/2011] [Accepted: 03/15/2012] [Indexed: 01/01/2023] Open
Abstract
Background: Long terminal repeats (LTR) from endogenous retroviruses (ERV) are a source of binding sites for transcription factors which affect the host regulatory networks in different cell types, including pluripotent cells. The embryonic epiblast is made of pluripotent cells that are subjected to opposite transcriptional regulatory networks to give rise to distinct embryonic and extraembryonic lineages. To assess the transcriptional contribution of ERV to early developmental processes, we have characterized in vitro and in vivo the regulation of ENS-1, a host-adopted and developmentally regulated ERV that is expressed in chick embryonic stem cells. Results: We show that Ens-1 LTR activity is controlled by two transcriptional pathways that drive pluripotent cells to alternative developmental fates. Indeed, both Nanog, which maintains pluripotency, and Gata4, which induces differentiation toward extraembryonic endoderm, independently activate the LTR. Ets coactivators are required to support Gata factors' activity, thus preventing inappropriate activation before epigenetic silencing occurs during differentiation. Consistent with their expression patterns during chick embryonic development, Gata4, Nanog and Ets1 are recruited on the LTR in embryonic stem cells; in the epiblast the complementary expression of Nanog and Gata/Ets correlates with the Ens-1 gene expression pattern; and Ens-1 transcripts are also detected in the hypoblast, an extraembryonic tissue expressing Gata4 and Ets2, but not Nanog. Accordingly, overexpression of Gata4 in embryos induces an ectopic expression of Ens-1. Conclusion: Our results show that Ens-1 LTR have co-opted conditions required for the emergence of extraembryonic tissues from pluripotent epiblast cells. By providing pluripotent cells with intact binding sites for Gata, Nanog, or both, Ens-1 LTR may promote distinct transcriptional networks in embryonic stem cell subpopulations and prime the separation between embryonic and extraembryonic fates.
|
26
|
The French way of life of mobile DNA. Mob Genet Elements 2011; 1:89-91. [DOI: 10.4161/mge.1.2.17455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2011] [Accepted: 07/12/2011] [Indexed: 11/19/2022] Open
|
27
|
Genes devoid of full-length transposable element insertions are involved in development and in the regulation of transcription in human and closely related species. J Mol Evol 2010; 71:180-91. [PMID: 20798934 DOI: 10.1007/s00239-010-9376-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 03/01/2010] [Accepted: 07/26/2010] [Indexed: 02/04/2023]
Abstract
Transposable elements (TEs) are major components of mammalian genomes, and their impact on genome evolution is now well established. In recent years several findings have shown that they are associated with the expression level and function of genes. In this study, we analyze the relationships between human genes and full-length TE copies in terms of three factors (gene function, expression level, and selective pressure). We classified human genes according to their TE density, and found that TE-free genes are involved in important functions such as development, transcription, and the regulation of transcription, whereas TE-rich genes are involved in functions such as transport and metabolism. This trend is conserved through evolution. We show that this could be explained by a stronger selection pressure acting on both the coding and non-coding regions of TE-free genes than on those of TE-rich genes. The higher level of expression found for TE-rich genes in tumor and immune system tissues suggests that TEs play an important role in gene regulation.
|
28
|
The evolutionary dynamics of the Helena retrotransposon revealed by sequenced Drosophila genomes. BMC Evol Biol 2009; 9:174. [PMID: 19624823 PMCID: PMC3087515 DOI: 10.1186/1471-2148-9-174] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 02/06/2009] [Accepted: 07/22/2009] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Several studies have shown that genomes contain a mixture of transposable elements, some of which are still active and others ancient relics that have degenerated. This is true for the non-LTR retrotransposon Helena, of which only degenerate sequences have been shown to be present in some species (Drosophila melanogaster), whereas putatively active sequences are present in others (D. simulans). Combining experimental and population analyses with the sequence analysis of the 12 Drosophila genomes, we have investigated the evolution of Helena, and propose a possible scenario for the evolution of this element. RESULTS We show that six species of Drosophila have the Helena transposable element at different stages of its evolution. The copy number is highly variable among these species, but most of them are truncated at the 5' ends and also harbor several internal deletions and insertions suggesting that they are inactive in all species, except in D. mojavensis in which quantitative RT-PCR experiments have identified a putative active copy. CONCLUSION Our data suggest that Helena was present in the common ancestor of the Drosophila genus, which has been vertically transmitted to the derived lineages, but that it has been lost in some of them. The wide variation in copy number and sequence degeneration in the different species suggest that the evolutionary dynamics of Helena depends on the genomic environment of the host species.
|
29
|
Identification of expressed transposable element insertions in the sequenced genome of Drosophila melanogaster. Gene 2009; 439:55-62. [PMID: 19332112 DOI: 10.1016/j.gene.2009.03.015] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 01/21/2009] [Revised: 03/12/2009] [Accepted: 03/16/2009] [Indexed: 01/20/2023]
Abstract
Transposable elements (TEs) are major components of most genomes, and their impact on genome evolution is now well documented. However, the way they affect the transcriptome is still not clearly established. Using the sequenced genome of Drosophila melanogaster and EST libraries, we describe here the TE insertions that are unequivocally transcribed, and we have determined their location in the sequenced genome. We show that most TE families are transcribed, and we have specifically identified 69 expressed TE insertions, half of which are located inside genes, mostly within introns and 5'UTRs.
|
30
|
Genomic environment influences the dynamics of the tirant LTR retrotransposon in Drosophila. FASEB J 2009; 23:1482-9. [PMID: 19141532 DOI: 10.1096/fj.08-123513] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
Combining genome sequence analysis and functional analysis, we show that some full-length copies of tirant are present in heterochromatic regions in Drosophila simulans and that when tested in vitro, these copies have a functional promoter. However, when inserted in heterochromatic regions, tirant copies are inactive in vivo, and only transcription of euchromatic copies can be detected. Thus, our data indicate that the localization of the element is a hallmark of its activity in vivo and raise the question of genomic invasions by transposable elements and the importance of their genomic integration sites.
|
31
|
Losing helena: the extinction of a Drosophila LINE-like element. BMC Genomics 2008; 9:149. [PMID: 18377637 PMCID: PMC2330053 DOI: 10.1186/1471-2164-9-149] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/31/2007] [Accepted: 03/31/2008] [Indexed: 11/25/2022] Open
Abstract
Background: Transposable elements (TEs) are major players in evolution. We know that they play an essential role in genome size determination, but we still have an incomplete understanding of the processes involved in their amplification and elimination from genomes and populations. Taking advantage of differences in the amount and distribution of the Long Interspersed Nuclear Element (LINE) helena in Drosophila melanogaster and D. simulans, we analyzed the DNA sequences of copies of this element in samples of various natural populations of these two species. Results: In situ hybridization experiments revealed that helena is absent from the chromosome arms of D. melanogaster, while it is present in the chromosome arms of D. simulans, which is an unusual feature for a TE in these species. Molecular analyses showed that the helena sequences detected in D. melanogaster were all deleted copies, which diverged from the canonical element. Natural populations of D. simulans have several copies, a few of them full-length, but most of them internally deleted. Conclusion: Overall, our data suggest that a mechanism that induces internal deletions in the helena sequences is active in the D. simulans genome.
|
32
|
Maintenance in the Chicken Genome of the Retroviral-like cENS Gene Family Specifically Expressed in Early Embryos. J Mol Evol 2007; 65:215-27. [PMID: 17671751 DOI: 10.1007/s00239-007-9001-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 12/19/2006] [Accepted: 05/18/2007] [Indexed: 02/05/2023]
Abstract
Embryonic stem (ES) cells are important developmental cells that appear very early during development and subsequently give rise to all the cell lineages of the future adult organism. In these cells a limited subset of transcription factors is expressed that are well conserved among species and essential for the fate of the stem cell. The transcriptome analysis of ES cells from chicken has revealed a gene family, cENS, that is specifically expressed in ES cells and in early embryos and is repressed during the differentiation process. This family is characterized by displaying retroviral structures and shares no homology with other species' genes. These characteristics are probably not restricted to the chicken genome and raise the question of whether similar genes are present and have been maintained in other species. We have examined the different copies of this gene in the sequenced chicken genome to investigate its dynamics and its evolution. We have distinguished two groups of cENS-related copies. The first group, resulting from recent transposition events, contains the transcribed ENS-1 and ENS-3 plus copies subjected to negative selection pressures. The second group contains degenerate copies that were integrated into the genome earlier. Comparison with copies previously isolated from three Galliformes showed that they are also subjected to selection pressures. We also detected numerous solo-LTRs containing the ENS-1 promoter that may control the expression of host genes. Taken together, these findings suggest a function sustained by a neogene of retroviral origin during the early stages of chicken development.
|
33
|
Influence of the transposable element neighborhood on human gene expression in normal and tumor tissues. Gene 2007; 396:303-11. [PMID: 17490832 DOI: 10.1016/j.gene.2007.04.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 12/01/2006] [Revised: 03/16/2007] [Accepted: 04/02/2007] [Indexed: 11/16/2022]
Abstract
Transposable elements (TEs) are genomic sequences able to replicate themselves, and to move from one chromosomal position to another within the genome. Many TEs contain their own regulatory regions, which means that they may influence the expression of neighboring genes. TEs may also be activated and transcribed in various cancers. We therefore tested whether gene expression in normal and tumor tissues is influenced by the neighboring TEs. To do this, we associated all human genes to the nearest TEs. We analyzed the expression of these genes in normal and tumor tissues using SAGE and EST data, and related this to the presence and type of TEs in their vicinity. We confirmed that TEs tend to be located in antisense orientation relative to their hosting genes. We found that the average number of tissues where a gene is expressed varies depending on the type of TEs located near the gene, and that the difference in expression level between normal and tumor tissues is greatest for genes that host SINE elements. This deregulation increases with the number of SINE copies in the gene vicinity. This suggests that SINE elements might contribute to the cascade of gene deregulation in cancer cells.
|
34
|
Abstract
Pseudogenes are now known to be a regular feature of bacterial genomes and are found in particularly high numbers within the genomes of recently emerged bacterial pathogens. As most pseudogenes are recognized by sequence alignments, we use newly available genomic sequences to identify the pseudogenes in 11 genomes from 4 bacterial genera, each of which contains at least 1 human pathogen. The numbers of pseudogenes range from 27 in Staphylococcus aureus MW2 to 337 in Yersinia pestis CO92 (i.e., 1–8% of the annotated genes in the genome). Most pseudogenes are formed by small frameshifting indels, but because stop codons are A + T-rich, the two low-G + C Gram-positive taxa (Streptococcus and Staphylococcus) have relatively high fractions of pseudogenes generated by nonsense mutations when compared with more G + C-rich genomes. Over half of the pseudogenes are produced from genes whose original functions were annotated as ‘hypothetical’ or ‘unknown’; however, several broadly distributed genes involved in nucleotide processing, repair or replication have become pseudogenes in one of the sequenced Vibrio vulnificus genomes. Although many of our comparisons involved closely related strains with broadly overlapping gene inventories, each genome contains a largely unique set of pseudogenes, suggesting that pseudogenes are formed and eliminated relatively rapidly from most bacterial genomes.
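The remark that stop codons are A + T-rich can be checked with a few lines of Python. This is only an illustrative calculation over the three stop codons of the standard genetic code (TAA, TAG, TGA); the function name `at_fraction` is hypothetical and nothing here reproduces the study's own analysis.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = ("TAA", "TAG", "TGA")

def at_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are A or T."""
    return sum(base in "AT" for base in seq) / len(seq)

# A + T fraction of each stop codon, and of all nine stop-codon positions pooled.
per_codon = {codon: at_fraction(codon) for codon in STOP_CODONS}
overall = at_fraction("".join(STOP_CODONS))

print(per_codon)
print(round(overall, 3))  # 7 of 9 positions are A or T -> 0.778
```

Seven of the nine stop-codon positions are A or T, so random point mutations in an A + T-biased genome are more likely to create a premature stop than in a G + C-rich genome, which is consistent with the trend the abstract reports.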
|
35
|
Abstract
Even in lieu of a dependable species concept for asexual organisms, the classification of bacteria into discrete taxonomic units is considered to be obstructed by the potential for lateral gene transfer (LGT) among lineages at virtually all phylogenetic levels. In most bacterial genomes, large proportions of genes are introduced by LGT, as indicated by their compositional features and/or phylogenetic distributions, and there is also clear evidence of LGT between very distantly related organisms. By adopting a whole-genome approach, which examined the history of every gene in numerous bacterial genomes, we show that LGT does not hamper phylogenetic reconstruction at many of the shallower taxonomic levels. Despite the high levels of gene acquisition, the only taxonomic group for which appreciable amounts of homologous recombination were detected was within bacterial species. Taken as a whole, the results derived from the analysis of complete gene inventories support several of the current means to recognize and define bacterial species.
|
36
|
Abstract
Explaining the diversity of gene repertoires has been a major problem in modern evolutionary biology. In eukaryotes, this diversity is believed to result mainly from gene duplication and loss, but in prokaryotes, lateral gene transfer (LGT) can also contribute substantially to genome contents. To determine the histories of gene inventories, we conducted an exhaustive analysis of gene phylogenies for all gene families in a widely sampled group, the γ-Proteobacteria. We show that, although these bacterial genomes display striking differences in gene repertoires, most gene families having representatives in several species have congruent histories. Other than the few vast multigene families, gene duplication has contributed relatively little to the contents of these genomes; instead, LGT, over time, provides most of the diversity in genomic repertoires. Most such acquired genes are lost, but the majority of those that persist in genomes are transmitted strictly vertically. Although our analyses are limited to the γ-Proteobacteria, these results resolve a long-standing paradox, i.e., the ability to make robust phylogenetic inferences in light of substantial LGT. Lateral gene transfer, rather than duplication, is responsible for most gene diversity present in γ-Proteobacteria; however, these genes are then vertically transmitted and have little impact on gene phylogenies.
|
37
|
Abstract
The resolution of the complete sequences of several hemiascomycete genomes provides new insights into the ways that yeast genomes change in size and in gene contents. These genomes provide evidence of whole-genome duplication occurring before the divergence of Saccharomyces cerevisiae and Candida glabrata, followed by massive gene loss that restored diploidy. The pattern of genome evolution in yeast differs from that in bacteria apparently as a result of stronger selective constraints on bacterial chromosomes.
|
38
|
Abstract
Because bacterial chromosomes are tightly packed with genes and were traditionally viewed as being optimized for size and replication speed, it was not surprising that the early annotations of sequenced bacterial genomes reported few, if any, pseudogenes. But because pseudogenes are generally recognized by comparisons with their functional counterparts, as more genome sequences accumulated, many bacterial pathogens were found to harbor large numbers of truncated, inactivated, and degraded genes. Because the mutational events that inactivate genes occur continuously in all genomes, we investigated whether the rarity of pseudogenes in some bacteria was attributable to properties inherent to the organism or to the failure to recognize pseudogenes. By developing a program suite (called Psi-Phi, for Psi-gene Finder) that applies a comparative method to identify pseudogenes (attributable both to misannotation and to nonrecognition), we analyzed the pseudogene inventories in the sequenced members of the Escherichia coli/Shigella clade. This approach recovered hundreds of previously unrecognized pseudogenes and showed that pseudogenes are a regular feature of bacterial genomes, even in those whose original annotations registered no truncated or otherwise inactivated genes. In Shigella flexneri 2a, large proportions of pseudogenes are generated by nonsense mutations and IS element insertions, events that seldom produce the pseudogenes present in the other genomes examined. Almost all (>95%) pseudogenes are restricted to only one of the genomes and are of relatively recent origin, suggesting that these bacteria possess active mechanisms to eliminate nonfunctional genes.
|
39
|
Abstract
Oikopleura dioica is a pelagic tunicate with a very small genome and a very short life cycle. In order to investigate the intron-exon organizations in Oikopleura, we have isolated and characterized ribosomal protein EF-1alpha, Hox, and alpha-tubulin genes. Their intron positions have been compared with those of the same genes from various invertebrates and vertebrates, including four species with entirely sequenced genomes. Oikopleura genes, like Caenorhabditis genes, have introns at a large number of nonconserved positions, which must originate from late insertions or intron sliding of ancient insertions. Both species exhibit hypervariable intron-exon organization within their alpha-tubulin gene family. This is due to localization of most nonconserved intron positions in single members of this gene family. The hypervariability and divergence of intron positions in Oikopleura and Caenorhabditis may be related to the predominance of short introns, the processing of which is not very dependent upon the exonic environment compared to large introns. Also, both species have an undermethylated genome, and the control of methylation-induced point mutations imposes a control on exon size, at least in vertebrate genes. That introns placed at such variable positions in Oikopleura or C. elegans may serve a specific purpose is not easy to infer from our current knowledge and hypotheses on intron functions. We propose that new introns are retained in species with very short life cycles, because illegitimate exchanges including gene conversion are repressed. We also speculate that introns placed at gene-specific positions may contribute to suppressing these exchanges and thereby favor their own persistence.
|
40
|
Abstract
Communication among bacterial cells through quorum-sensing (QS) systems is used to regulate ecologically and medically important traits, including virulence to hosts. QS is widespread in bacteria; it has been demonstrated experimentally in diverse phylogenetic groups, and homologs to the implicated genes have been discovered in a large proportion of sequenced bacterial genomes. The widespread distribution of the underlying gene families (LuxI/R and LuxS) raises the questions of how often QS genes have been transferred among bacterial lineages and the extent to which genes in the same QS system exchange partners or coevolve. Phylogenetic analyses of the relevant gene families show that the genes annotated as LuxI/R inducer and receptor elements comprise two families with virtually no homology between them and with one family restricted to the gamma-Proteobacteria and the other more widely distributed. Within bacterial phyla, trees for the LuxS and the two LuxI/R families show broad agreement with the ribosomal RNA tree, suggesting that these systems have been continually present during the evolution of groups such as the Proteobacteria and the Firmicutes. However, lateral transfer can be inferred for some genes (e.g., from Firmicutes to some distantly related lineages for LuxS). In general, the inducer/receptor elements in the LuxI/R systems have evolved together with little exchange of partners, although loss or replacement of partners has occurred in several lineages of gamma-Proteobacteria, the group for which sampling is most intensive in current databases. For instance, in Pseudomonas aeruginosa, a transferred QS system has been incorporated into the pathway of a native one. Gene phylogenies for the main LuxI/R family in Pseudomonas species imply a complex history of lateral transfer, ancestral duplication, and gene loss within the genus.
|
41
|
From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria. PLoS Biol 2003; 1:E19. [PMID: 12975657 PMCID: PMC193605 DOI: 10.1371/journal.pbio.0000019] [Citation(s) in RCA: 364] [Impact Index Per Article: 17.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2003] [Accepted: 07/30/2003] [Indexed: 11/21/2022] Open
Abstract
The rapid increase in published genomic sequences for bacteria presents the first opportunity to reconstruct evolutionary events on the scale of entire genomes. However, extensive lateral gene transfer (LGT) may thwart this goal by preventing the establishment of organismal relationships based on individual gene phylogenies. The group for which cases of LGT are most frequently documented and for which the greatest density of complete genome sequences is available is the gamma-Proteobacteria, an ecologically diverse and ancient group including free-living species as well as pathogens and intracellular symbionts of plants and animals. We propose an approach to multigene phylogeny using complete genomes and apply it to the case of the gamma-Proteobacteria. We first applied stringent criteria to identify a set of likely gene orthologs and then tested the compatibilities of the resulting protein alignments with several phylogenetic hypotheses. Our results demonstrate phylogenetic concordance among virtually all (203 of 205) of the selected gene families, with each of the exceptions consistent with a single LGT event. The concatenated sequences of the concordant families yield a fully resolved phylogeny. This topology also received strong support in analyses aimed at excluding effects of heterogeneity in nucleotide base composition across lineages. Our analysis indicates that single-copy orthologous genes are resistant to horizontal transfer, even in ancient bacterial groups subject to high rates of LGT. This gene set can be identified and used to yield robust hypotheses for organismal phylogenies, thus establishing a foundation for reconstructing the evolutionary transitions, such as gene transfer, that underlie diversity in genome content and organization.
|
42
|
Sequence divergence within transposable element families in the Drosophila melanogaster genome. Genome Res 2003; 13:1889-96. [PMID: 12869581 PMCID: PMC403780 DOI: 10.1101/gr.827603] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
The availability of the sequenced Drosophila melanogaster genome provides an opportunity to study sequence variation between copies within transposable element families. In this study, we analyzed the 624 copies of 22 transposable element (TE) families (14 LTR retrotransposons, five non-LTR retrotransposons, and three transposons). LTR and non-LTR retrotransposons possessed far fewer divergent elements than the transposons, suggesting that the difference depends on the transposition mechanism. However, there was not a continuous range of divergence of the copies in each class, which were either very similar to the canonical elements, or very divergent from them. This sequence homogeneity among TE family copies matches the theoretical models of the dynamics of these repeated sequences. The sequenced Drosophila genome thus appears to be composed of a mixture of TEs that are still active and of ancient relics that have degenerated and the distribution of which along the chromosomes results from natural selection. This clearly demonstrates that the TEs are highly active within the genome, suggesting that the genetic variability of the Drosophila genome is still being renewed by the action of TEs.
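As an illustration of what "divergence from the canonical element" can mean in practice, the following minimal sketch computes the fraction of mismatched sites between a copy and its canonical sequence. It assumes the two sequences are already aligned to equal length with `-` marking gaps, and it skips gapped columns; the abstract does not state which divergence measure the authors used, so the function name and the gap handling are assumptions made for illustration only.

```python
def divergence(canonical: str, copy: str) -> float:
    """Fraction of mismatched sites between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    This is an illustrative measure, not necessarily the one used in the study.
    """
    if len(canonical) != len(copy):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(canonical.upper(), copy.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


# Hypothetical toy alignment: one gapped column, one mismatch in eight compared sites.
toy_divergence = divergence("ACGT-ACGT", "ACGTTACGA")
```

Under this measure, copies clustering near zero would correspond to the "very similar" group described in the abstract, and copies with high values to the degenerated relics.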
|
43
|
The source of laterally transferred genes in bacterial genomes. Genome Biol 2003; 4:R57. [PMID: 12952536 PMCID: PMC193657 DOI: 10.1186/gb-2003-4-9-r57] [Citation(s) in RCA: 148] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2003] [Revised: 06/11/2003] [Accepted: 07/04/2003] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Laterally transferred genes have often been identified on the basis of compositional features that distinguish them from ancestral genes in the genome. These genes are usually A+T-rich, arguing either that there is a bias towards acquiring genes from donor organisms having low G+C contents or that genes acquired from organisms of similar genomic base compositions go undetected in these analyses. RESULTS: By examining the genome contents of closely related, fully sequenced bacteria, we uncovered genes confined to a single genome and examined the sequence features of these acquired genes. The analysis shows that few transfer events are overlooked by compositional analyses. Most observed lateral gene transfers do not correspond to free exchange of regular genes among bacterial genomes, but more probably represent the constituents of phages or other selfish elements. CONCLUSIONS: Although bacteria tend to acquire large amounts of DNA, the origin of these genes remains obscure. We have shown that contrary to what is often supposed, their composition cannot be explained by a previous genomic context. In contrast, these genes fit the description of recently described genes in lambdoid phages, named 'morons'. Therefore, results from genome content and compositional approaches to detect lateral transfers should not be cited as evidence for genetic exchange between distantly related bacteria.
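The compositional idea in this abstract (acquired genes tend to be A+T-rich relative to the rest of the genome) can be sketched as a simple G+C comparison. The snippet below is only a minimal illustration: the function names, the toy sequences, and the `margin` threshold of 0.05 are hypothetical and are not taken from the study, whose actual criteria are not given in the abstract.

```python
def gc_content(seq: str) -> float:
    """G+C fraction of a sequence, counting only unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def at_rich_genes(genes: dict[str, str], genome_gc: float, margin: float = 0.05) -> list[str]:
    """Names of genes whose G+C fraction falls below genome_gc by more than margin.

    The margin is an arbitrary illustrative threshold, not a published cutoff.
    """
    return [name for name, seq in genes.items() if gc_content(seq) < genome_gc - margin]


# Hypothetical toy input: gene "a" is A+T-rich, gene "b" is G+C-rich.
toy_genes = {"a": "ATATATGC", "b": "GCGCGCAT"}
flagged = at_rich_genes(toy_genes, genome_gc=0.5)
```

A screen of this kind only flags compositional outliers; as the abstract argues, such a signal by itself does not identify the donor of the gene.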
|
44
|
|
45
|
Codon usage by transposable elements and their host genes in five species. J Mol Evol 2002; 54:625-37. [PMID: 11965435 DOI: 10.1007/s00239-001-0059-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2001] [Accepted: 10/29/2001] [Indexed: 10/28/2022]
Abstract
We compared the codon usage of sequences of transposable elements (TEs) with that of host genes from the species Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Saccharomyces cerevisiae, and Homo sapiens. Factorial correspondence analysis showed that, regardless of the base composition of the genome, the TEs differed from the genes of their host species by their AT-richness. In all species, the percentage of A + T on the third codon position of the TEs was higher than that on the first codon position and lower than that in the noncoding DNA of the genomes. This indicates that the codon choice is not simply the outcome of mutational bias but is also subject to selection constraints. A tendency toward higher A + T on the third position than on the first position was also found in the host genes of A. thaliana, C. elegans, and S. cerevisiae but not in those of D. melanogaster and H. sapiens. This strongly suggests that the AT choice is a host-independent characteristic common to all TEs. The codon usage of TEs generally appeared to be different from the mean of the host genes. In the AT-rich genomes of Arabidopsis thaliana, Caenorhabditis elegans, and Saccharomyces cerevisiae, the codon usage bias of TEs was similar to that of weakly expressed genes. In the GC-rich genome of D. melanogaster, however, the bias in codon usage of the TEs clearly differed from that of weakly expressed genes. These findings suggest that selection acts on TEs and that TEs may display specific behavior within the host genomes.
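The comparison of A + T percentages at the first and third codon positions can be illustrated with a short sketch. It assumes an in-frame coding sequence given as a plain string and ignores any trailing partial codon; the function name and the toy sequence are hypothetical, and this is not the factorial correspondence analysis used in the study.

```python
def at_fraction_by_codon_position(cds: str) -> list[float]:
    """A+T fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every third base, starting at this codon position
        at_count = sum(1 for b in bases if b in "AT")
        fractions.append(at_count / len(bases))
    return fractions


# Hypothetical toy sequence: ATG GCT AAA GGA TTT TAA
first, second, third = at_fraction_by_codon_position("ATGGCTAAAGGATTTTAA")
```

For the toy sequence, the third position is more A+T-rich than the first, which is the direction of the pattern the abstract reports for TEs.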
|
46
|
Is the evolution of transposable elements modular? Genetica 2000; 107:15-25. [PMID: 10952194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/17/2023]
Abstract
The evolution of transposable element structures can be analyzed in populations and species and by comparing the functional domains in the main classes of elements. We begin with a synthesis of what we know about the evolution of the mariner elements in the Drosophilidae family in terms of populations and species. We suggest that internal deletion does not occur at random, but appears to frequently occur between short internal repeats. We compared the functional domains of the DNA and/or amino acid sequences to detect similarities between the main classes of elements. This included the gag, reverse transcriptase, and envelope genes of retrotransposons and retroviruses, and the integrases of retrotransposons and retroviruses, and transposases of class II elements. We find that each domain can have its own evolutionary history. Thus, the evolution of transposable elements can be seen to be modular.
|
47
|
|
48
|
Abstract
Retroviruses and long terminal repeat (LTR) retrotransposons share a common structural organization. The main difference between these retroelements is the presence of a functional envelope (env) gene in retroviruses, which is absent or nonfunctional in LTR retrotransposons. Several similarities between these two groups of retroelements have been detected for the reverse transcriptase, gag, and integrase domains. Assuming that each of these domains shares a common ancestral sequence, several hypotheses could account for the emergence of retroviruses from LTR retrotransposons. In this context, the positions of elements such as gypsy and the members of the Ty3 subfamily are not clear, since they are classified as retroviruses but phylogenetically they are assigned to the LTR retrotransposon group. We compared the env gene products of these retroelements and identified two similar motifs in retroviruses and LTR retrotransposons. These two regions do not occur in the same order. If we assume that they are derived from the same ancestral sequence, this could result from independent acquisition of the various domains rather than the single acquisition of the whole env gene. However, we cannot exclude the possibility that the env gene was reorganized after being acquired. Trees based on these regions show that these two groups of elements are clearly distinguished. These trees are similar to those obtained from reverse transcriptase or integrase. In trees based on reverse transcriptase, the retroviruses with complete or partial env genes can be distinguished from the other LTR retrotransposons.
|
49
|
|