1
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023]
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarize the definition, arrangement, and structural characteristics of repeats. We also introduce the diverse biological functions of repeats and review existing methods for automatic repeat detection, classification, and masking. Finally, we analyze the type, structure, and regulation of repeats in the human genome and their role in inducing complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and for in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
2
Miniature Inverted-Repeat Transposable Elements (MITEs) in the Two Lepidopteran Genomes of Helicoverpa armigera and Helicoverpa zea. Insects 2022; 13:313. [PMID: 35447755 PMCID: PMC9033116 DOI: 10.3390/insects13040313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Revised: 03/10/2022] [Accepted: 03/20/2022] [Indexed: 02/04/2023]
Abstract
Simple Summary:
Miniature inverted-repeat transposable elements (MITEs) are non-autonomous transposable elements that play important roles in genome organization and evolution. Helicoverpa armigera and Helicoverpa zea show a high number of reported cases of insecticide resistance worldwide, having evolved resistance to pyrethroids, organophosphates, carbamates, organochlorines and, more recently, the macrocyclic lactone spinosad and several Bacillus thuringiensis toxins. In the present study, we screened the H. armigera and H. zea genomes for MITEs using bioinformatics approaches and identified a total of 3570 and 7405 MITE sequences, respectively. Among these MITEs, we highlighted eleven MITE insertions in H. armigera defensome genes and only one in those of H. zea.
Abstract:
Miniature inverted-repeat transposable elements (MITEs) are ubiquitous, non-autonomous class II transposable elements. The moths Helicoverpa armigera and Helicoverpa zea are recognized as the two most serious pest species within the genus, and both can develop insecticide resistance. In the present study, we conducted a genome-wide analysis of MITEs in the H. armigera and H. zea genomes using the bioinformatics tool MITE Tracker. Overall, 3570 and 7405 MITE sequences were identified in the H. armigera and H. zea genomes, respectively. Comparative analysis of these sequences in the two genomes identified 18 families, comprising 140 MITE members in H. armigera and 161 in H. zea. Based on target site duplication (TSD) sequences, the families were classified into three superfamilies (PIF/Harbinger, Tc1/mariner and CACTA). Copy numbers ranged from 6 to 469 per MITE family.
Finally, the analysis of MITE insertion sites in defensome genes showed intronic insertions of 11 MITEs in the cytochrome P450, ATP-binding cassette (ABC) transporter and esterase genes of H. armigera, whereas in H. zea only one MITE was found, in the ABC-C2 gene. These insertions could thus be involved in the insecticide resistance observed in these pests.
3
Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists scan sequence data automatically for these repeats or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats, and we provide two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. Because it may be hard to keep up with the profusion of proposals in this dynamic field, the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts behind indexing, mapping and querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We first present the machine learning approach, which seeks to build predictors automatically for some TE families from a set of sequences known to belong to each family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and to check their validity.
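The chapter's emphasis on indexing can be illustrated with a suffix array, one classic index for exact repeat search. The sketch below is a generic textbook technique, not code from the chapter or any tool it surveys: it builds the suffix array naively (adequate only for short sequences), computes the longest-common-prefix (LCP) array with Kasai's algorithm, and reports the longest substring that occurs at least twice. The function names are hypothetical.

```python
def suffix_array(text: str) -> list[int]:
    # Naive construction: sort suffix start positions lexicographically.
    # Worst case O(n^2 log n), so only suitable for short examples.
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: list[int]) -> list[int]:
    # Kasai's algorithm: lcp[r] is the length of the longest common prefix
    # of the suffixes starting at sa[r - 1] and sa[r]; lcp[0] stays 0.
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def longest_repeat(text: str) -> str:
    # The longest repeated substring is the largest LCP value
    # between lexicographically adjacent suffixes.
    if len(text) < 2:
        return ""
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    best = max(range(len(lcp)), key=lcp.__getitem__)
    return text[sa[best]:sa[best] + lcp[best]]


print(longest_repeat("GATTACAGATTACA"))  # GATTACA
```

Production repeat finders replace the naive sort with linear-time or compressed index construction; the LCP scan itself stays the same idea.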
4
Liao X, Li M, Hu K, Wu FX, Gao X, Wang J. A sensitive repeat identification framework based on short and long reads. Nucleic Acids Res 2021; 49:e100. [PMID: 34214175 PMCID: PMC8464074 DOI: 10.1093/nar/gkab563] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/07/2020] [Revised: 06/08/2021] [Accepted: 06/18/2021] [Indexed: 12/11/2022]
Abstract
Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot identify repeats satisfactorily in terms of both accuracy and size, since NGS reads are too short to resolve long repeats, whereas long reads from single-molecule sequencing (SMS) have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on global de novo assembly and k-mer-based multiple sequence alignment, for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by using barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify repeats more extensively; (ii) by finding overlap sequences between assemblies or chromosomes, it locates repeats faster and more accurately; (iii) by using multi-alignment unique k-mers rather than high-frequency k-mers to identify repeats in overlap sequences, it obtains repeats more comprehensively and stably; (iv) by applying a parallel alignment model based on the multi-alignment unique k-mers, it greatly improves data-processing efficiency; and (v) by applying corresponding identification strategies, it can identify structural variations that occur between repeats. Comprehensive experimental results show that LongRepMarker achieves more satisfactory results than existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
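For contrast with LongRepMarker's multi-alignment unique k-mers, the high-frequency k-mer baseline that the abstract argues against can be sketched in a few lines: count every k-mer and mark each base covered by a k-mer seen at least `min_count` times. This is a simplified illustration, not LongRepMarker's algorithm; it counts forward-strand k-mers only, and the function name and threshold are hypothetical.

```python
from collections import Counter


def mark_repeats_by_kmer_count(seq: str, k: int, min_count: int = 2) -> list[bool]:
    """Return one flag per base: True if the base lies in at least one
    k-mer that occurs min_count or more times in seq (forward strand only)."""
    starts = range(len(seq) - k + 1)  # empty when seq is shorter than k
    counts = Counter(seq[i:i + k] for i in starts)
    covered = [False] * len(seq)
    for i in starts:
        if counts[seq[i:i + k]] >= min_count:
            for j in range(i, i + k):
                covered[j] = True
    return covered


flags = mark_repeats_by_kmer_count("ACGTGCATCCACGT", k=4)
print("".join("R" if f else "." for f in flags))  # RRRR......RRRR
```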
Affiliation(s)
- Xingyu Liao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Kang Hu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N5A9, Canada
- Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
5
Kundu S, Ray MD, Sharma A. Interplay between genome organization and epigenomic alterations of pericentromeric DNA in cancer. J Genet Genomics 2021; 48:184-197. [PMID: 33840602 DOI: 10.1016/j.jgg.2021.02.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/06/2020] [Revised: 02/07/2021] [Accepted: 02/20/2021] [Indexed: 12/16/2022]
Abstract
In eukaryotic genome biology, the organization of the genome inside the three-dimensional (3D) nucleus is highly complex, and whether this organization governs gene expression is poorly understood. The nuclear lamina (NL) is a filamentous meshwork of proteins lining the inner nuclear membrane that serves as an anchoring platform for genome organization. Large chromatin domains termed lamina-associated domains (LADs) play a major role in silencing genes at the nuclear periphery. The interaction between the NL and the genome is dynamic and stochastic. Furthermore, many genes change their positions during development or under disease conditions such as cancer, activating some genes and/or silencing others. Pericentromeric heterochromatin (PCH) is a largely silenced region of the genome that localizes at the nuclear periphery. Studies show that several genes located in the PCH are aberrantly expressed in cancer. An interesting question is how these genes manage to overcome pericentromeric repression despite being located in the pericentromeric region. Although epigenetic mechanisms control the expression of the pericentromeric region, recent studies of genome organization and genome-nuclear lamina interaction have shed light on a new aspect of pericentromeric gene regulation in cancer: a complex, coordinated interplay between epigenomic remodeling and genomic organization.
Affiliation(s)
- Subhadip Kundu
- Laboratory of Chromatin and Cancer Epigenetics, Department of Biochemistry, All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- M D Ray
- Department of Surgical Oncology, IRCH, All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
- Ashok Sharma
- Laboratory of Chromatin and Cancer Epigenetics, Department of Biochemistry, All India Institute of Medical Sciences, Ansari Nagar, New Delhi 110029, India
6
Pan J, Luo X, Bian J, Shao T, Li C, Zhao T, Zhang S, Zhou F, Wang G. Identification of Genomic Islands in Synechococcus sp. WH8102 Using Genomic Barcode and Whole-Genome Microarray Analysis. Curr Bioinform 2021. [DOI: 10.2174/1574893615666200121160615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background:
Synechococcus sp. WH8102 is one of the most abundant photosynthetic organisms in many ocean regions.
Objective:
The aim of this study is to identify genomic islands (GIs) in Synechococcus sp. WH8102 with integrated methods.
Methods:
We applied the genomic barcode method, which makes genomic regions of different origins visually apparent, to identify GIs in Synechococcus sp. WH8102. Expression of genes in the predicted GIs was then analyzed using microarray data collected for functional analysis of the relevant genes.
Results:
Seven GIs were identified in Synechococcus sp. WH8102. Most of them are involved in cell surface modification, photosynthesis and drug resistance. Our analysis also revealed the functions of these GIs, which could support in-depth studies of the evolution of this strain.
Conclusion:
Genomic barcodes provide a comprehensive and intuitive view of a target genome. They can be used to understand the intrinsic characteristics of a whole genome and to identify GIs or other similar elements.
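A genomic barcode can be pictured as a matrix of k-mer frequencies computed in consecutive windows, one row (or image column) per window, so that windows with unusual composition stand out. The sketch below assumes tetranucleotides, with each k-mer merged with its reverse complement (136 classes for k = 4), and non-overlapping 1,000 bp windows; these parameters and the function names are illustrative assumptions, not settings reported in this study.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    # Treat a k-mer and its reverse complement as the same feature.
    return min(kmer, revcomp(kmer))


def canonical_kmers(k: int) -> list[str]:
    # For k = 4 this yields (256 + 16) / 2 = 136 classes.
    return sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})


def barcode(seq: str, window: int = 1000, k: int = 4) -> list[list[float]]:
    """Relative frequency of each canonical k-mer in consecutive,
    non-overlapping windows; a trailing partial window is ignored."""
    index = {c: n for n, c in enumerate(canonical_kmers(k))}
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        win = seq[start:start + window].upper()
        vec = [0.0] * len(index)
        total = 0
        for i in range(len(win) - k + 1):
            c = canonical(win[i:i + k])
            if c in index:  # skip k-mers containing N or other ambiguity codes
                vec[index[c]] += 1
                total += 1
        rows.append([v / total for v in vec] if total else vec)
    return rows


rows = barcode("ACGT" * 500)
print(len(rows), sum(1 for v in rows[0] if v > 0))  # 2 3
```

Windows whose frequency profiles differ markedly from the rest of the genome are the kind of visually distinct region the abstract describes.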
Affiliation(s)
- Jiahui Pan
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
- Xizi Luo
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
- Jiang Bian
- Jilin Provincial Center for Disease Control and Prevention, Changchun, 130062, China
- Tong Shao
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
- Chaoying Li
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
- Tingting Zhao
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
- Shiwei Zhang
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
- Fengfeng Zhou
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
- Guoqing Wang
- Department of Pathogenbiology, The Key Laboratory of Zoonosis, Chinese Ministry of Education, College of Basic Medicine, Jilin University, Changchun, 130021, China
7
MiteFinderII: a novel tool to identify miniature inverted-repeat transposable elements hidden in eukaryotic genomes. BMC Med Genomics 2018; 11:101. [PMID: 30453969 PMCID: PMC6245586 DOI: 10.1186/s12920-018-0418-y] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/15/2022]
Abstract
Background:
Miniature inverted-repeat transposable elements (MITEs) are class II non-autonomous transposable elements that play a crucial role in evolution. There is an urgent need for bioinformatics tools that can identify MITEs effectively at whole-genome scale. However, most existing tools have limited ability to handle large eukaryotic genomes.
Methods:
In this paper, we propose MiteFinderII, a novel tool adapted from our previous algorithm MiteFinder, to efficiently detect MITEs in genomic sequences. It has six major steps: (1) building a k-mer index and searching for inverted repeats; (2) filtering out low-complexity inverted repeats; (3) merging inverted repeats; (4) filtering out low-scoring candidates; (5) selecting the final MITE sequences; and (6) selecting representative sequences.
Results:
To test its performance, MiteFinderII and three other existing algorithms were applied to identify MITEs in the whole genome of Oryza sativa. The results suggest that MiteFinderII outperforms existing popular tools in terms of both specificity and recall. It is also much faster and more memory-efficient than the other tools.
Conclusion:
MiteFinderII is an accurate and effective tool for detecting MITEs hidden in eukaryotic genomes. The source code is freely available at https://github.com/screamer/miteFinder.
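Step (1) of MiteFinderII, indexing k-mers and searching for inverted repeats, can be illustrated with a minimal seed finder: index every k-mer, then for each position look up the reverse complement of its k-mer further downstream. This is a hedged sketch of the general idea only; the function name, the non-overlapping-arms rule and the span limits are assumptions, and MiteFinderII's filtering, merging and scoring (steps 2 to 6) are not shown.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def inverted_repeat_seeds(seq: str, k: int, min_span: int,
                          max_span: int) -> list[tuple[int, int]]:
    """Return (left, right) start positions such that seq[right:right+k] is the
    reverse complement of seq[left:left+k], the two arms do not overlap, and the
    total span from left to right+k lies within [min_span, max_span]."""
    index = defaultdict(list)  # k-mer -> list of start positions
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    seeds = []
    for left in range(len(seq) - k + 1):
        target = revcomp(seq[left:left + k])
        for right in index.get(target, []):  # .get does not insert new keys
            span = right + k - left
            if right >= left + k and min_span <= span <= max_span:
                seeds.append((left, right))
    return seeds


candidate = "GGACTA" + "CCCCCCCC" + "TAGTCC"  # arm, spacer, reverse-complement arm
print(inverted_repeat_seeds(candidate, k=6, min_span=10, max_span=50))  # [(0, 14)]
```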
8
Bm-muted, orthologous to mouse muted and encoding a subunit of the BLOC-1 complex, is responsible for the otm translucent mutation of the silkworm Bombyx mori. Gene 2017; 629:92-100. [DOI: 10.1016/j.gene.2017.07.071] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/25/2017] [Revised: 07/08/2017] [Accepted: 07/27/2017] [Indexed: 11/18/2022]
9
Ge R, Mai G, Zhang R, Wu X, Wu Q, Zhou F. MUSTv2: An Improved De Novo Detection Program for Recently Active Miniature Inverted Repeat Transposable Elements (MITEs). J Integr Bioinform 2017; 14:/j/jib.ahead-of-print/jib-2017-0029/jib-2017-0029.xml. [PMID: 28796642 PMCID: PMC6042816 DOI: 10.1515/jib-2017-0029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/06/2017] [Accepted: 05/08/2017] [Indexed: 11/15/2022]
Abstract
Background:
A miniature inverted repeat transposable element (MITE) is a short transposable element that carries no protein-coding regions. However, its high proliferation rate and sequence-specific insertion preference make it a good genetic tool for both natural evolution and experimental insertion mutagenesis. Recently active MITE copies are those with clear terminal inverted repeat (TIR) and direct repeat (DR) signals that were recently transposed into their current sites. Their proliferation ability makes them good candidates for investigating genome evolution.
Results:
This study optimizes the C++ code and running pipeline of the MITE Uncovering SysTem (MUST) so that users need no prior knowledge of MITEs, and the current version, MUSTv2, shows significantly higher detection accuracy for recently active MITEs than similar programs. Its running speed is also significantly higher than that of MUSTv1. We also prepared a benchmark dataset, a simulated genome containing 150 MITE copies, for interested researchers.
Conclusions:
MUSTv2 is an accurate detection program for recently active MITE copies and is complementary to existing template-based MITE mapping programs. We believe that the release of MUSTv2 will greatly facilitate genome annotation and structural analysis for researchers working with large-scale omics data.
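The two structural signals MUSTv2 looks for, terminal inverted repeats (TIRs) and flanking direct repeats (DRs, the target site duplications left by insertion), can be checked for a single candidate as follows. This is a minimal sketch assuming the element's boundaries and the TIR and DR lengths are already known; the function name and mismatch tolerance are hypothetical, and MUSTv2's own C++ candidate search and scoring are not reproduced.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def has_tir_and_dr(genome: str, start: int, end: int, tir_len: int, dr_len: int,
                   max_mismatch: int = 0) -> bool:
    """Check whether genome[start:end] begins and ends with inverted repeats of
    length tir_len and is flanked by identical direct repeats of length dr_len."""
    if (tir_len < 1 or start < dr_len or end + dr_len > len(genome)
            or end - start < 2 * tir_len):
        return False
    element = genome[start:end]
    # TIR: the right terminus, reverse-complemented, should match the left terminus.
    tir_ok = mismatches(element[:tir_len], revcomp(element[-tir_len:])) <= max_mismatch
    # DR: the bases just outside each end should match on the forward strand.
    dr_ok = mismatches(genome[start - dr_len:start], genome[end:end + dr_len]) <= max_mismatch
    return tir_ok and dr_ok


mite = "GGACTA" + "CCCCCCCC" + "TAGTCC"
genome = "GGG" + "TAA" + mite + "TAA" + "GGG"
print(has_tir_and_dr(genome, start=6, end=6 + len(mite), tir_len=6, dr_len=3))  # True
```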
10
Han MJ, Zhou QZ, Zhang HH, Tong X, Lu C, Zhang Z, Dai F. iMITEdb: the genome-wide landscape of miniature inverted-repeat transposable elements in insects. Database (Oxford) 2016; 2016:baw148. [PMID: 28025339 PMCID: PMC5199201 DOI: 10.1093/database/baw148] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/13/2016] [Revised: 09/19/2016] [Accepted: 10/18/2016] [Indexed: 01/23/2023]
Abstract
Miniature inverted-repeat transposable elements (MITEs) have attracted much attention due to their widespread occurrence and high copy numbers in eukaryotic genomes. However, systematic knowledge about MITEs in insects and other animals is still lacking. In this study, we identified 6012 MITE families from the genomes of 98 insect species. Comparison of these MITEs with known MITEs in the NCBI non-redundant database and Repbase showed that 5701 (∼95%) of the 6012 MITE families are novel. The abundance of MITEs varies drastically among insect species and correlates significantly with genome size: in general, larger genomes contain more MITEs than smaller ones. Furthermore, all identified MITEs were included in a newly constructed database, iMITEdb (http://gene.cqu.edu.cn/iMITEdb/), which supports browsing, searching, BLAST and download. Overall, our results not only provide insight into insect MITEs but will also improve the assembly and annotation of insect genomes. More importantly, the results presented in this study will promote studies of MITE function, evolution and application in insects. Database URL: http://gene.cqu.edu.cn/iMITEdb/
Affiliation(s)
- Min-Jin Han
- State Key Laboratory of Silkworm Genome Biology, Key Laboratory for Sericulture Functional Genomics and Biotechnology of Agricultural Ministry, Southwest University, Chongqing 400715, China
- Qiu-Zhong Zhou
- Laboratory of Evolutionary and Functional Genomics, School of Life Sciences, Chongqing University, Chongqing 401331, China
- Hua-Hao Zhang
- College of Pharmacy and Life Science, Jiujiang University, Jiujiang 332000, China
- Xiaoling Tong
- State Key Laboratory of Silkworm Genome Biology, Key Laboratory for Sericulture Functional Genomics and Biotechnology of Agricultural Ministry, Southwest University, Chongqing 400715, China
- Cheng Lu
- State Key Laboratory of Silkworm Genome Biology, Key Laboratory for Sericulture Functional Genomics and Biotechnology of Agricultural Ministry, Southwest University, Chongqing 400715, China
- Ze Zhang
- Laboratory of Evolutionary and Functional Genomics, School of Life Sciences, Chongqing University, Chongqing 401331, China
- Fangyin Dai
- State Key Laboratory of Silkworm Genome Biology, Key Laboratory for Sericulture Functional Genomics and Biotechnology of Agricultural Ministry, Southwest University, Chongqing 400715, China
11
Jiang SH, Li GY, Xiong XM. Novel miniature inverted-repeat transposable elements derived from novel CACTA transposons were discovered in the genome of the ant Camponotus floridanus. Genes Genomics 2016. [DOI: 10.1007/s13258-016-0464-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
12
Ye C, Ji G, Liang C. detectMITE: A novel approach to detect miniature inverted repeat transposable elements in genomes. Sci Rep 2016; 6:19688. [PMID: 26795595 PMCID: PMC4726161 DOI: 10.1038/srep19688] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 08/11/2015] [Accepted: 12/14/2015] [Indexed: 12/27/2022]
Abstract
Miniature inverted repeat transposable elements (MITEs) are prevalent in eukaryotic genomes, including those of plants and animals. Classified as non-autonomous DNA transposable elements, they play important roles in genome organization and evolution. Comprehensive and accurate genome-wide detection of MITEs in various eukaryotic genomes can improve our understanding of their origins, transposition processes, regulatory mechanisms, and biological relevance with regard to gene structure, expression, and regulation. In this paper, we present a new MATLAB-based program called detectMITE that employs a novel numeric calculation algorithm in place of conventional string-matching algorithms for MITE detection, adopts the Lempel-Ziv complexity algorithm to filter out low-complexity MITE candidates, and uses the clustering program CD-HIT to group similar MITEs into MITE families. Using the rice genome as test data, we found that detectMITE detects MITEs on a genome-wide scale more accurately, comprehensively, and efficiently than other popular MITE detection tools. Comparison with the potential MITEs annotated in Repbase, the widely used eukaryotic repeat database, showed that detectMITE finds known and novel MITEs with a complete structure and full-length copies in the genome. detectMITE is an open-source tool (https://sourceforge.net/projects/detectmite).
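detectMITE uses Lempel-Ziv complexity to discard low-complexity candidates. The measure can be sketched as the number of components in the LZ76 parsing of a sequence, computed here with the Kaspar-Schuster scanning scheme and then normalized by length. This is a generic sketch, not detectMITE's MATLAB code; the normalization shown is a common choice, and any cutoff applied to it would be a hypothetical parameter, since the abstract does not state detectMITE's threshold.

```python
import math


def lz76_complexity(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) parsing of s,
    computed with the Kaspar-Schuster scanning scheme."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1  # i: start of an earlier copy, k: match length, l: start of current component
    c, k_max = 1, 1    # c: components found so far; k_max: best match length for this component
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # the current component runs to the end of s
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier start extends the copy further: close the component
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(s: str) -> float:
    # Common normalization c * log_4(n) / n for a four-letter alphabet;
    # lower values indicate more repetitive, low-complexity sequence.
    n = len(s)
    return lz76_complexity(s) * math.log(n, 4) / n if n > 1 else 0.0


print(lz76_complexity("0001101001000101"))  # 6, i.e. 0|001|10|100|1000|101
```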
Affiliation(s)
- Congting Ye
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China; Department of Biology, Miami University, Oxford, Ohio 45056, USA
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China; Innovation Center for Cell Biology, Xiamen University, Xiamen, Fujian 361102, China
- Chun Liang
- Department of Biology, Miami University, Oxford, Ohio 45056, USA
13
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of available software that can help biologists to look for these repeats and check some hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided a whole section on this topic as well as a selection of the main existing software. In order to better understand how they work and how repeats may be efficiently found in genomes, it is necessary to look at the technical issues involved in the large-scale search of these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts that are useful for understanding the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis that is interested in the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, composition constraints as well as ordering and distance constraints between these elementary blocks. In terms of linguistics, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.
Affiliation(s)
- Jacques Nicolas
- Dyliss Team, Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
- Pierre Peterlongo
- Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
- Sébastien Tempel
- LCB, CNRS UMR 7283, 31 Chemin Joseph Aiguier, 13402, Marseille cedex 20, France
14
Genome of Rhodnius prolixus, an insect vector of Chagas disease, reveals unique adaptations to hematophagy and parasite infection. Proc Natl Acad Sci U S A 2015; 112:14936-41. [PMID: 26627243 DOI: 10.1073/pnas.1506226112] [Citation(s) in RCA: 255] [Impact Index Per Article: 25.5] [Indexed: 12/15/2022]
Abstract
Rhodnius prolixus has served not only as a model organism for the study of insect physiology but also as a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon usage evolution toward the insect usage pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods.
15
Identification, characterization and diversification of non-autonomous hAT transposons and unknown insertions in Brassica. Genes Genomics 2015. [DOI: 10.1007/s13258-015-0324-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
16
Girgis HZ. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinformatics 2015. [PMID: 26206263 PMCID: PMC4513396 DOI: 10.1186/s12859-015-0654-5] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Indexed: 01/03/2023]
Abstract
Background:
With rapid advancements in technology, the sequences of thousands of species’ genomes are becoming available. Within the sequences are repeats that comprise significant portions of genomes. Successful annotations thus require accurate discovery of repeats. As species-specific elements, repeats in newly sequenced genomes are likely to be unknown. Therefore, annotating newly sequenced genomes requires tools to discover repeats de-novo. However, the currently available de-novo tools have limitations concerning the size of the input sequence, ease of use, sensitivities to major types of repeats, consistency of performance, speed, and false positive rate.
Results:
To address these limitations, I designed and developed Red, applying Machine Learning. Red is the first repeat-detection tool capable of labeling its training data and training itself automatically on an entire genome. Red is easy to install and use. It is sensitive to both transposons and simple repeats; in contrast, available tools such as RepeatScout and ReCon are sensitive to transposons, and WindowMasker to simple repeats. Red performed consistently well on seven genomes; the other tools performed well only on some genomes. Red is much faster than RepeatScout and ReCon and has a much lower false positive rate than WindowMasker. On human genes with five or more copies, Red was more specific than RepeatScout by a wide margin. When tested on genomes of unusual nucleotide compositions, Red located repeats with high sensitivities and maintained moderate false positive rates. Red outperformed the related tools on a bacterial genome. Red identified 46,405 novel repetitive segments in the human genome. Finally, Red is capable of processing assembled and unassembled genomes.
Conclusions:
Red’s innovative methodology and its excellent performance on seven different genomes represent a valuable advancement in the field of repeats discovery.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0654-5) contains supplementary material, which is available to authorized users.
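Many de novo repeat detectors, WindowMasker among them, start from k-mer occurrence counts: k-mers that recur far more often than expected in unique sequence mark candidate repeats. The sketch below illustrates only that generic counting idea, not Red's machine-learning pipeline; the functions `kmer_counts` and `repeat_mask`, the thresholds `k` and `min_count`, and the toy sequence are hypothetical choices for illustration.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def repeat_mask(seq: str, k: int = 8, min_count: int = 3) -> list[bool]:
    """Flag each base covered by at least one k-mer seen >= min_count times.

    Hypothetical fixed thresholds: real tools derive cut-offs from
    genome-wide k-mer statistics instead.
    """
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= min_count:
            for j in range(i, i + k):
                mask[j] = True
    return mask


# Toy input: a unique 8-bp flank followed by three tandem copies of an 8-bp unit.
flank, unit = "GGCATCAG", "ACGTTGCA"
mask = repeat_mask(flank + unit * 3)
print("".join("R" if m else "." for m in mask))  # 8 dots, then 24 R's
```

Only the tandem copies are flagged here; a genome-scale tool would also need to handle reverse-complement k-mers and memory-efficient counting.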
Affiliation(s)
- Hani Z Girgis
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, 20894, MD, USA; Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104, OK, USA.
17
Evolutionary genomics of miniature inverted-repeat transposable elements (MITEs) in Brassica. Mol Genet Genomics 2015; 290:2297-312. [DOI: 10.1007/s00438-015-1076-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/23/2014] [Accepted: 05/29/2015] [Indexed: 11/26/2022]
18
Identification, Diversity and Evolution of MITEs in the Genomes of Microsporidian Nosema Parasites. PLoS One 2015; 10:e0123170. [PMID: 25898273 PMCID: PMC4405373 DOI: 10.1371/journal.pone.0123170] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/25/2014] [Accepted: 01/27/2015] [Indexed: 11/29/2022]
Abstract
Miniature inverted-repeat transposable elements (MITEs) are short, non-autonomous DNA transposons that are widespread in most eukaryotic genomes. However, the genome-wide identification, origin and evolution of MITEs in microsporidia remain largely obscure. In this study, we investigated structural features for the de novo identification of MITEs in the genomes of the silkworm microsporidia Nosema bombycis and Nosema antheraeae, as well as the honeybee microsporidium Nosema ceranae. A total of 1490, 149 and 83 MITE-related sequences from 89, 17 and five families, respectively, were found in the genomes of these species. Species-specific MITEs predominate in each microsporidian Nosema genome, with the exception of three MITE families shared by N. bombycis and N. antheraeae. One or more rounds of MITE amplification occurred in N. bombycis after its divergence from the other two species, suggesting that the larger number of MITE families in N. bombycis could be attributed to the recent amplification of new MITEs. Significantly, some MITEs that inserted into homologous protein-coding regions of N. bombycis were recruited as introns, indicating that gene expansion occurred during the evolution of microsporidia. NbS31 and NbS24 were polymorphic among different geographical strains of N. bombycis, indicating that they could still be active. In addition, several small RNAs derived from MITEs in N. bombycis are produced mainly from both ends of the MITE sequences.
19
Fattash I, Lee CN, Mo K, Yang G. Efficient transposition of the youngest miniature inverted repeat transposable element family of yellow fever mosquito in yeast. FEBS J 2015; 282:1829-40. [PMID: 25754725 DOI: 10.1111/febs.13257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/01/2014] [Revised: 02/13/2015] [Accepted: 03/04/2015] [Indexed: 01/16/2023]
Abstract
Miniature inverted repeat transposable elements (MITEs) are often the most numerous DNA transposons in plant and animal genomes. The dramatic amplification of MITE families during evolution is puzzling, because the transposase sources for the vast majority of MITE families are unknown. The yellow fever mosquito genome contains more than 220 Mb of MITE sequences; however, transposition activity has not been demonstrated for any of its MITE families. The Gnome elements are the youngest MITE family in this genome, with at least 116 identical copies. To test whether the putative autonomous element Ozma is capable of mobilizing Gnome and its two sibling MITEs, analyses were performed in a yeast transposition assay system. Whereas the wild-type transposase resulted in very low transposition activity, mutations in the region containing a putative nuclear export signal motif resulted in a dramatic (at least 4160-fold) increase in transposition frequency. We have also demonstrated that each residue of the novel DD37E motif is required for the activity of the Ozma transposase. Footprint sequences left at the donor sites suggest that the transposase may cleave between the second and third nucleotides from the 5' ends of the elements. The excised elements reinsert specifically at the dinucleotide 'TA', ~55% of them in yeast genes. The elements described in this article could potentially be useful as tools for the genetic manipulation of mosquitoes.
Affiliation(s)
- Isam Fattash
- Department of Biology, University of Toronto Mississauga, ON, Canada
- Chia-Ni Lee
- Department of Biology, University of Toronto Mississauga, ON, Canada
- Kaiguo Mo
- Department of Biology, University of Toronto Mississauga, ON, Canada
- Guojun Yang
- Department of Biology, University of Toronto Mississauga, ON, Canada
20
Luchetti A. terMITEs: miniature inverted-repeat transposable elements (MITEs) in the termite genome (Blattodea: Termitoidae). Mol Genet Genomics 2015; 290:1499-509. [PMID: 25711308 DOI: 10.1007/s00438-015-1010-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/24/2014] [Accepted: 02/12/2015] [Indexed: 11/28/2022]
Abstract
Transposable elements (TEs) are discrete DNA sequences that are able to replicate and jump into different genomic locations. Miniature inverted-repeat TEs (MITEs) are non-autonomous DNA elements whose origin is still poorly understood. Recently, some MITEs were found to contain core repeats that can be arranged in tandem arrays; in some instances, these arrays have even given rise to satellite DNAs in the (peri)centromeric regions of the host chromosomes. I report the discovery and analysis of three new MITEs found in the genomes of several termite species (hence the name terMITEs) in two different families. For two of the MITEs (terMITE1, Tc1/mariner superfamily; terMITE2, piggyBac superfamily), evidence of past mobility was retrieved. Moreover, these two MITEs contained core repeats, 16 bp and 114 bp long, respectively, that exhibit copy number variation. In terMITE2, the tandem duplication appeared to be associated with element degeneration, in line with a recently proposed evolutionary model of MITEs and the origin of tandem arrays. Concerning their genomic distribution, terMITE1 and terMITE3 appeared more frequently inserted close to coding regions, while terMITE2 was mostly associated with TEs. Although MITEs are commonly distributed in coding regions, the distribution of terMITE2 is in line with that of other insects' piggyBac-related elements and of other small TEs found in termite genomes. This has been explained through insertional preference rather than through selective processes. The data presented here add to knowledge of the poorly studied polyneopteran genomes and will provide an interesting framework in which to study TE evolution and host life-history traits.
Affiliation(s)
- Andrea Luchetti
- Dipartimento di Scienze Biologiche, Geologiche e Ambientali, Università di Bologna, via Selmi 3, 40126, Bologna, Italy
21
Yang G, Fattash I, Lee CN, Liu K, Cavinder B. Birth of three stowaway-like MITE families via microhomology-mediated miniaturization of a Tc1/Mariner element in the yellow fever mosquito. Genome Biol Evol 2014; 5:1937-48. [PMID: 24068652 PMCID: PMC3814204 DOI: 10.1093/gbe/evt146] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/03/2023]
Abstract
Eukaryotic genomes contain numerous DNA transposons that move by a cut-and-paste mechanism. The majority of these elements are self-insufficient and depend on their autonomous relatives to transpose. Miniature inverted repeat transposable elements (MITEs) are often the most numerous nonautonomous DNA elements in a higher eukaryotic genome. Little is known about the origin of these MITE families, as few of them are accompanied by their direct ancestral elements in a genome. Analyses of MITEs in the yellow fever mosquito identified its youngest MITE family, designated Gnome, which contains at least 116 identical copies. A genome-wide search for direct ancestral autonomous elements of Gnome revealed an elusive single-copy Tc1/Mariner-like element, named Ozma, that encodes a transposase with a DD37E triad motif. Strikingly, Ozma also gave rise to two additional MITE families, designated Elf and Goblin. These three MITE families were derived at different times during evolution and bear internal sequences originating from different regions of Ozma. Upon close inspection of the sequence junctions, the internal deletions during the formation of these three MITE families always occurred between two microhomologous sites (6–8 bp). These results suggest that multiple MITE families may originate from a single ancestral autonomous element and that MITE formation can be mediated by sequence microhomology. Ozma and its related MITEs are exceptional candidates for the long sought-after endogenous active transposon tool for the genetic control of mosquitoes.
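The miniaturization mechanism described above, an internal deletion between two short identical sites (6-8 bp) within an autonomous element, can be illustrated in a few lines. This sketch assumes a simple model in which deleting `seq[i:j]` between identical k-mers at positions `i` and `j` leaves exactly one copy of the microhomology; the function name `microhomology_deletions` and the toy parent sequence are hypothetical and are not taken from Ozma.

```python
from collections import defaultdict
from itertools import combinations


def microhomology_deletions(seq: str, k: int = 6) -> list[tuple[int, int, str]]:
    """Return (i, j, product) for every pair of identical, non-overlapping k-mers.

    Deleting seq[i:j] removes the internal region plus one copy of the
    microhomology, so the product keeps a single copy of the shared k-mer.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)  # appended in ascending order
    results = []
    for hits in positions.values():
        for i, j in combinations(hits, 2):  # combinations keeps order, so i < j
            if j - i >= k:  # skip overlapping copies (e.g. inside homopolymer runs)
                results.append((i, j, seq[:i] + seq[j:]))
    return sorted(results)


# Hypothetical parent element: a 6-bp microhomology ("GAATTC") flanks an internal region.
parent = "CCG" + "GAATTC" + "TTTAGGCATCAGTA" + "GAATTC" + "TGA"
for i, j, product in microhomology_deletions(parent, k=6):
    print(i, j, product)
```

The toy sequence actually shares a 7-bp match ("GAATTCT"), so two overlapping 6-mer pairs are reported; a real analysis would merge such pairs into one maximal microhomology before comparing candidate deletion products with observed MITE junctions.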
Affiliation(s)
- Guojun Yang
- Department of Biology, University of Toronto Mississauga, Ontario, Canada
- Corresponding author
- Isam Fattash
- Department of Biology, University of Toronto Mississauga, Ontario, Canada
- Chia-Ni Lee
- Department of Biology, University of Toronto Mississauga, Ontario, Canada
- Kun Liu
- Department of Botany and Plant Sciences, University of California Riverside
- Brad Cavinder
- Department of Plant Pathology and Microbiology, University of California Riverside
22
Genome-wide comparative analysis of 20 miniature inverted-repeat transposable element families in Brassica rapa and B. oleracea. PLoS One 2014; 9:e94499. [PMID: 24747717 PMCID: PMC3991616 DOI: 10.1371/journal.pone.0094499] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 01/20/2014] [Accepted: 03/17/2014] [Indexed: 12/25/2022]
Abstract
Miniature inverted-repeat transposable elements (MITEs) are ubiquitous, non-autonomous class II transposable elements. Here, we conducted genome-wide comparative analysis of 20 MITE families in B. rapa, B. oleracea, and Arabidopsis thaliana. A total of 5894 and 6026 MITE members belonging to the 20 families were found in the whole genome pseudo-chromosome sequences of B. rapa and B. oleracea, respectively. Meanwhile, only four of the 20 families, comprising 573 members, were identified in the Arabidopsis genome, indicating that most of the families were activated in the Brassica genus after divergence from Arabidopsis. Copy numbers varied from 4 to 1459 for each MITE family, and there was up to 6-fold variation between B. rapa and B. oleracea. In particular, analysis of intact members showed that whereas eleven families were present in similar copy numbers in B. rapa and B. oleracea, nine families showed copy number variation ranging from 2- to 16-fold. Four of those families (BraSto-3, BraTo-3, 4, 5) were more abundant in B. rapa, and the other five (BraSto-1, BraSto-4, BraTo-1, 7 and BraHAT-1) were more abundant in B. oleracea. Overall, 54% and 51% of the MITEs resided in or within 2 kb of a gene in the B. rapa and B. oleracea genomes, respectively. Notably, 92 MITEs were found within the CDS of annotated genes, suggesting that MITEs might play roles in diversification of genes in the recently triplicated Brassica genome. MITE insertion polymorphism (MIP) analysis of 289 MITE members showed that 52% and 23% were polymorphic at the inter- and intra-species levels, respectively, indicating that there has been recent MITE activity in the Brassica genome. These recently activated MITE families with abundant MIP will provide useful resources for molecular breeding and identification of novel functional genes arising from MITE insertion.
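A statement such as "54% of the MITEs resided in or within 2 kb of a gene" reduces to an interval-distance test. A minimal sketch, assuming 0-based half-open `(chrom, start, end)` coordinates and a brute-force scan (a whole genome would call for sorted intervals or an interval tree); the function names and all coordinates below are hypothetical.

```python
def interval_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Distance in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(b_start - a_end, a_start - b_end, 0)


def fraction_near_genes(mites, genes, window: int = 2000) -> float:
    """Fraction of MITEs lying inside a gene or within `window` bp of one.

    mites, genes: iterables of (chrom, start, end), 0-based half-open coords.
    """
    mites, genes = list(mites), list(genes)
    if not mites:
        return 0.0
    near = 0
    for chrom, start, end in mites:
        if any(g_chrom == chrom and interval_gap(start, end, g_start, g_end) <= window
               for g_chrom, g_start, g_end in genes):
            near += 1
    return near / len(mites)


genes = [("A01", 10_000, 12_000)]
mites = [("A01", 10_500, 10_600),   # inside the gene
         ("A01", 13_500, 13_700),   # 1,500 bp downstream
         ("A01", 20_000, 20_200),   # 8,000 bp away
         ("A02", 10_500, 10_600)]   # different chromosome
print(fraction_near_genes(mites, genes))  # 0.5
```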
23
Chen J, Hu Q, Zhang Y, Lu C, Kuang H. P-MITE: a database for plant miniature inverted-repeat transposable elements. Nucleic Acids Res 2013; 42:D1176-81. [PMID: 24174541 PMCID: PMC3964958 DOI: 10.1093/nar/gkt1000] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.7] [Indexed: 11/23/2022]
Abstract
Miniature inverted-repeat transposable elements (MITEs) are prevalent in eukaryotic species, including plants. MITE families vary dramatically and usually cannot be identified based on homology. In this study, we identified MITEs de novo from 41 plant species using the computer programs MITE Digger, MITE-Hunter and/or Repetitive Sequence with Precise Boundaries (RSPB). MITEs were found in all species but one (Cyanidioschyzon merolae). Combined with the MITEs identified previously from the rice genome, >2.3 million sequences from 3527 MITE families were obtained from 41 plant species. In general, higher plants contain more MITEs than lower plants, with a few exceptions such as papaya, which has only 538 elements. The largest number of MITEs is found in apple, with 237,302 MITE sequences. The number of MITE sequences in a genome is significantly correlated with genome size. A series of databases (plant MITE databases, P-MITE), available online at http://pmite.hzau.edu.cn/django/mite/, was constructed to host all MITE sequences from the 41 plant genomes. The databases are available for sequence similarity searches (BLASTN), and MITE sequences can be downloaded by family or by genome. The databases can be used to study the origin and amplification of MITEs, MITE-derived small RNAs and the roles of MITEs in gene and genome evolution.
Affiliation(s)
- Jiongjiong Chen
- Department of Vegetable Crops, Key Laboratory of Horticulture Biology, Ministry of Education, College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, P. R. China
24
Zhang HH, Shen YH, Xu HE, Liang HY, Han MJ, Zhang Z. A novel hAT element in Bombyx mori and Rhodnius prolixus: its relationship with miniature inverted repeat transposable elements (MITEs) and horizontal transfer. Insect Mol Biol 2013; 22:584-96. [PMID: 23889491 DOI: 10.1111/imb.12047] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 06/02/2023]
Abstract
Comparative analysis of transposable elements (TEs) from different species makes it possible to reconstruct their history over evolutionary time. In this study, we identified a novel hAT element in Bombyx mori and Rhodnius prolixus with characteristic GGGCGGCA repeats in its subterminal region. Phylogenetic analysis indicated that the elements in these two species might represent a separate cluster of the hAT superfamily. Strikingly, a previously identified miniature inverted repeat transposable element (MITE) shared high identity with this autonomous element across its entire length, supporting the hypothesis that MITEs are derived from internal deletion of DNA transposons. Interestingly, the identity of the consensus sequences of this novel hAT element between B. mori and R. prolixus, which diverged about 370 million years ago, was as high as 96.5% over their full length (about 3.6 kb) at the nucleotide level. The patchy distribution among species, coupled with the overall lack of intense purifying selection acting on this element, suggests that this novel hAT element might have experienced horizontal transfer between the ancestors of B. mori and R. prolixus. Our results highlight that this novel hAT element could be used as a potential tool for germline transformation of R. prolixus to control the transmission of Trypanosoma cruzi, which causes Chagas disease.
Affiliation(s)
- H-H Zhang
- School of Life Sciences, Chongqing University, Chongqing, China
25
Yang G. MITE Digger, an efficient and accurate algorithm for genome wide discovery of miniature inverted repeat transposable elements. BMC Bioinformatics 2013; 14:186. [PMID: 23758809 PMCID: PMC3680318 DOI: 10.1186/1471-2105-14-186] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 03/11/2013] [Accepted: 06/02/2013] [Indexed: 11/25/2022]
Abstract
Background: Miniature inverted repeat transposable elements (MITEs) are abundant non-autonomous elements that play important roles in shaping gene and genome evolution. Their characteristic structural features are suitable for automated identification by computational approaches; however, de novo MITE discovery at the genome level is still resource-intensive, and efficient and accurate computational tools are desirable. Existing algorithms process every member of a MITE family, so a major portion of the computing task is redundant. Results: In this study, redundant computing steps were analyzed, and a novel algorithm emphasizing the reduction of such redundant computing was implemented in MITE Digger. It processed the whole rice genome sequence database in ~15 hours and produced 332 MITE candidates with low false positive (1.8%) and false negative (0.9%) rates. MITE Digger was also tested for genome-wide MITE discovery on four other genomes. Conclusions: MITE Digger is efficient and accurate for genome-wide retrieval of MITEs. Its user-friendly interface further facilitates genome-wide analyses of MITEs on a routine basis. The MITE Digger program is available at: http://labs.csb.utoronto.ca/yang/MITEDigger.
Affiliation(s)
- Guojun Yang
- Department of Biology, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada.
26
Sampath P, Lee SC, Lee J, Izzah NK, Choi BS, Jin M, Park BS, Yang TJ. Characterization of a new high copy Stowaway family MITE, BRAMI-1 in Brassica genome. BMC Plant Biol 2013; 13:56. [PMID: 23547712 PMCID: PMC3626606 DOI: 10.1186/1471-2229-13-56] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 09/12/2012] [Accepted: 03/18/2013] [Indexed: 05/29/2023]
Abstract
BACKGROUND: Miniature inverted-repeat transposable elements (MITEs) are expected to play important roles in the evolution of genes and genomes in plants, especially in highly duplicated plant genomes. Various MITE families and their roles in plants have been characterized. However, there have been fewer studies of MITE families and their potential roles in the evolution of the recently triplicated Brassica genome. RESULTS: We identified a new MITE family, BRAMI-1, belonging to the Stowaway superfamily in the Brassica genome. In silico mapping revealed that 697 members are dispersed throughout the euchromatic regions of the B. rapa pseudo-chromosomes. Among them, 548 members (78.6%) are located in gene-rich regions, less than 3 kb from genes. In addition, we identified 516 and 15 members in the 470 Mb and 15 Mb of genomic shotgun sequence currently available for B. oleracea and B. napus, respectively. The resulting estimated copy numbers for the entire genomes were 1440, 1464 and 2490 in B. rapa, B. oleracea and B. napus, respectively. Meanwhile, only 70 members of the related Arabidopsis ATTIRTA-1 MITE family were identified in the Arabidopsis genome. Phylogenetic analysis revealed that BRAMI-1 elements proliferated in the Brassica genus after divergence from the Arabidopsis lineage. MITE insertion polymorphism (MIP) was inspected for 50 BRAMI-1 members, revealing high levels of insertion polymorphism between and within Brassica species, which indicates that BRAMI-1 has remained active up to the present. Comparative analysis of the 71 genes harbouring BRAMI-1 elements with their non-insertion paralogs (NIPs) showed that the BRAMI-1 insertions mainly reside in non-coding sequences and that the expression levels of genes with the elements differ from those of their NIPs. CONCLUSION: A Stowaway family MITE, named BRAMI-1, was gradually amplified and remains present in more than 1400 copies in each of three Brassica species.
Overall, 78% of the members were identified in gene-rich regions, and it is assumed that they may contribute to the evolution of duplicated genes in the highly duplicated Brassica genome. The resulting MIPs can serve as a good source of DNA markers for Brassica crops because the insertions are highly dispersed in the gene-rich euchromatin region and are polymorphic between or within species.
Affiliation(s)
- Perumal Sampath
- Dept. of Plant Science, Plant Genomics and Breeding Institute, and Research Institute for Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea
- Sang-Choon Lee
- Dept. of Plant Science, Plant Genomics and Breeding Institute, and Research Institute for Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea
- Jonghoon Lee
- Dept. of Plant Science, Plant Genomics and Breeding Institute, and Research Institute for Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea
- Nur Kholilatul Izzah
- Dept. of Plant Science, Plant Genomics and Breeding Institute, and Research Institute for Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea
- Beom-Soon Choi
- National Instrumentation Center for Environmental Management, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea
- Mina Jin
- National Academy of Agricultural Science, Rural Development Administration, 150 Suinro, Suwon, 441-707, Republic of Korea
- Beom-Seok Park
- National Academy of Agricultural Science, Rural Development Administration, 150 Suinro, Suwon, 441-707, Republic of Korea
- Tae-Jin Yang
- Dept. of Plant Science, Plant Genomics and Breeding Institute, and Research Institute for Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul, 151-921, Republic of Korea
27
Fattash I, Rooke R, Wong A, Hui C, Luu T, Bhardwaj P, Yang G. Miniature inverted-repeat transposable elements: discovery, distribution, and activity. Genome 2013; 56:475-86. [PMID: 24168668 DOI: 10.1139/gen-2012-0174] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Indexed: 01/19/2023]
Abstract
Eukaryotic organisms have dynamic genomes, with transposable elements (TEs) as a major contributing factor. Although the large autonomous TEs can significantly shape genomic structures during evolution, genomes often harbor more miniature nonautonomous TEs that can infest genomic niches where large TEs are rare. In spite of their cut-and-paste transposition mechanisms that do not inherently favor copy number increase, miniature inverted-repeat transposable elements (MITEs) are abundant in eukaryotic genomes and exist in high copy numbers. Based on the large number of MITE families revealed in previous studies, accurate annotation of MITEs, particularly in newly sequenced genomes, will identify more genomes highly rich in these elements. Novel families identified from these analyses, together with the currently known families, will further deepen our understanding of the origins, transposase sources, and dramatic amplification of these elements.
Affiliation(s)
- Isam Fattash
- Department of Biology, University of Toronto at Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada
28
Abstract
The initial identification of transposable elements (TEs) was attributed to the activity of DNA transposable elements, which are prevalent in plants. Unlike RNA elements, which accumulate in gene-poor heterochromatic regions, most DNA elements are located in gene-rich regions, and many of them carry genes or gene fragments. As such, DNA elements have a more intimate relationship with genes and may have an immediate impact on gene expression and gene function. DNA elements are structurally distinct from RNA elements, and most of them have terminal inverted repeats (TIRs). Such structural features have been used to identify the relevant elements from genomic sequences. Among the DNA elements in plants, the most abundant type is the miniature inverted repeat transposable element (MITE). This chapter discusses methods to identify MITEs, Helitrons, and other DNA transposable elements.
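The TIR criterion mentioned above can be checked directly: in a perfect TIR, the left end of an element equals the reverse complement of its right end. Many elements also sit between target site duplications (TSDs), for example the "TA" dinucleotide targeted by Tc1/mariner-like and Stowaway elements. The sketch below counts only perfectly matching TIR bases and tests for an exact TSD; real MITE finders tolerate mismatches, and the helper `looks_like_mite`, its thresholds (`min_tir`, `max_len`) and the toy element are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tir_length(element: str) -> int:
    """Number of bases from each end that form a perfect terminal inverted repeat."""
    s = element.upper()
    n = 0
    while n < len(s) // 2 and s[n] == s[-1 - n].translate(COMPLEMENT):
        n += 1
    return n


def has_tsd(left_flank: str, right_flank: str, tsd_len: int = 2) -> bool:
    """True if the last tsd_len bases before the element recur right after it."""
    return left_flank[-tsd_len:].upper() == right_flank[:tsd_len].upper()


def looks_like_mite(element, left_flank, right_flank, min_tir=10, max_len=800):
    """Hypothetical filter: short element, perfect TIRs of min_tir+ bp, and a TSD."""
    return (len(element) <= max_len
            and tir_length(element) >= min_tir
            and has_tsd(left_flank, right_flank))


tir = "CTCCCTCCGTTTC"  # made-up 13-bp TIR
element = tir + "AAGGCCTTAACG" + revcomp(tir)
print(tir_length(element))                         # 13
print(looks_like_mite(element, "GGCTA", "TACCG"))  # True: "TA" on both sides
print(looks_like_mite(element, "GGCTA", "GACCG"))  # False: no TSD
```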
Affiliation(s)
- Ning Jiang
- Department of Horticulture, Michigan State University, East Lansing, MI, USA
29
Flutre T, Permal E, Quesneville H. Transposable Element Annotation in Completely Sequenced Eukaryote Genomes. Plant Transposable Elements 2012. [DOI: 10.1007/978-3-642-31842-9_2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
30
Abstract
Most genomes are populated by thousands of sequences that originated from mobile elements. On the one hand, these sequences present a real challenge for genome analysis and annotation. On the other hand, they are very interesting biological subjects involved in many cellular processes. Here, we present an overview of the biodiversity of transposable elements (TEs) and their impact on genome evolution. Finally, we discuss different approaches to TE detection and analysis.
31
Lu C, Chen J, Zhang Y, Hu Q, Su W, Kuang H. Miniature inverted-repeat transposable elements (MITEs) have been accumulated through amplification bursts and play important roles in gene expression and species diversity in Oryza sativa. Mol Biol Evol 2011; 29:1005-17. [PMID: 22096216 PMCID: PMC3278479 DOI: 10.1093/molbev/msr282] [Citation(s) in RCA: 147] [Impact Index Per Article: 10.5] [Indexed: 12/19/2022]
Abstract
Miniature inverted-repeat transposable elements (MITEs) are predicted to play important roles in genome evolution. We developed a BLASTN-based approach for the de novo identification of MITEs and systematically analyzed MITEs in the rice genome. The genome of rice cultivar Nipponbare (Oryza sativa ssp. japonica) harbors 178,533 MITE-related sequences classified into 338 families. Pairwise nucleotide diversity and phylogenetic tree analysis indicated that individual MITE families resulted from one or multiple rounds of amplification bursts. The timing of amplification bursts varied considerably between different MITE families or subfamilies. MITEs are associated with 23,623 (58.2%) genes in the rice genome. At least 7,887 MITEs are transcribed, and more than 3,463 are transcribed together with rice genes. The MITE sequences transcribed with rice coding genes form 1,130 pairs of potential natural sense/antisense transcripts. MITEs generate 23.5% (183,837 of 781,885) of all small RNAs identified from rice. Some MITE families generated small RNAs mainly from the termini, while other families generated small RNAs predominantly from the central region. More than half (51.8%) of the MITE-derived small RNAs were generated exclusively by MITEs located away from genes. Genome-wide analysis showed that genes associated with MITEs have significantly lower expression than genes away from MITEs. Approximately 14.8% of loci with full-length MITEs show presence/absence polymorphism between rice cultivars 93-11 (O. sativa ssp. indica) and Nipponbare. Considering that different sets of genes may be regulated by MITE-derived small RNAs in different genotypes, MITEs provide considerable diversity for O. sativa.
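Pairwise nucleotide diversity, one of the measures used above to infer amplification bursts, is the mean proportion of differing sites over all pairs of family members; copies from a recent burst (such as the identical Gnome copies in the mosquito) give values near zero. A minimal sketch, assuming a precomputed multiple alignment, gap columns skipped pairwise, and no substitution-model correction; the helper names are hypothetical and this is not the authors' pipeline.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip sites where either sequence has an alignment gap
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def nucleotide_diversity(aligned: list[str]) -> float:
    """Mean pairwise p-distance over all pairs of family members."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(aligned, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


print(nucleotide_diversity(["ACGTACGT", "ACGTACGT", "ACGTACGT"]))  # 0.0
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))              # about 0.333
```

Converting such values into burst ages would additionally require a divergence correction (for example Jukes-Cantor) and a substitution-rate assumption, both outside this sketch.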
Affiliation(s)
- Chen Lu
- Key Laboratory of Horticulture Biology, Ministry of Education and Department of Vegetable Crops, College of Horticulture and Forestry, Huazhong Agricultural University, Wuhan, People's Republic of China
32
Fleetwood DJ, Khan AK, Johnson RD, Young CA, Mittal S, Wrenn RE, Hesse U, Foster SJ, Schardl CL, Scott B. Abundant degenerate miniature inverted-repeat transposable elements in genomes of epichloid fungal endophytes of grasses. Genome Biol Evol 2011; 3:1253-64. [PMID: 21948396 PMCID: PMC3227409 DOI: 10.1093/gbe/evr098] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Accepted: 09/20/2011] [Indexed: 12/20/2022]
Abstract
Miniature inverted-repeat transposable elements (MITEs) are abundant repeat elements in plant and animal genomes; however, there are few analyses of these elements in fungal genomes. Analysis of the draft genome sequence of the fungal endophyte Epichloë festucae revealed 13 MITE families that make up almost 1% of the E. festucae genome, and relics of putative autonomous parent elements were identified for three families. Sequence and DNA hybridization analyses suggest that at least some of the MITEs identified in the study were active early in the evolution of Epichloë but are not found in closely related genera. Analysis of MITE integration sites showed that these elements have a moderate integration site preference for 5' genic regions of the E. festucae genome and are particularly enriched near genes for secondary metabolism. Copies of the EFT-3m/Toru element appear to have mediated recombination events that may have abolished synthesis of two fungal alkaloids in different epichloae. This work provides insight into the potential impact of MITEs on epichloae evolution and provides a foundation for analysis in other fungal genomes.
Affiliation(s)
- Damien J Fleetwood
- Forage Biotechnology Section, AgResearch, Palmerston North, New Zealand.
33
Fernández-Medina RD, Struchiner CJ, Ribeiro JMC. Novel transposable elements from Anopheles gambiae. BMC Genomics 2011; 12:260. [PMID: 21605407 PMCID: PMC3212995 DOI: 10.1186/1471-2164-12-260] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 12/21/2010] [Accepted: 05/23/2011] [Indexed: 12/25/2022]
Abstract
Background: Transposable elements (TEs) are DNA sequences, present in the genomes of most eukaryotic organisms, that have the key characteristic of being able to mobilize and increase their copy number within chromosomes. These elements are important for eukaryotic genome structure and evolution and have lately been considered potential drivers for introducing transgenes into pathogen-transmitting insects as a means to control vector-borne diseases. The aim of this work was to catalog the diversity and abundance of TEs within the Anopheles gambiae genome using the PILER tool and to consolidate a database in the form of a hyperlinked spreadsheet containing detailed and readily available information about the TEs present in the genome of An. gambiae. Results: Here we present the spreadsheet named AnoTExcel, a database with detailed information on most of the repetitive elements present in the genome of the mosquito. Despite previous work on this topic, our approach permitted the identification and characterization of both previously described and novel TEs, which are further described in detail. Conclusions: Identification and characterization of TEs in a given genome is important as a way to understand the diversity and evolution of the whole set of TEs present in a given species. This work contributes to a better understanding of the landscape of TEs present in the mosquito genome. It also presents a novel platform for the identification, analysis, and characterization of TEs in sequenced genomes.
Affiliation(s)
- Rita D Fernández-Medina
- Fundação Oswaldo Cruz, Escola Nacional de Saúde Pública Sergio Arouca, Av. Brasil 4365, 21040-360 Rio de Janeiro, Brazil.
34
Mancini E, Tammaro F, Baldini F, Via A, Raimondo D, George P, Audisio P, Sharakhov IV, Tramontano A, Catteruccia F, della Torre A. Molecular evolution of a gene cluster of serine proteases expressed in the Anopheles gambiae female reproductive tract. BMC Evol Biol 2011; 11:72. [PMID: 21418586 PMCID: PMC3068966 DOI: 10.1186/1471-2148-11-72] [Received: 10/04/2010] [Accepted: 03/19/2011]
Abstract
Background: Genes involved in post-mating processes of multiple-mating organisms are known to evolve rapidly, owing to coevolution driven by sexual conflict among interacting male and female proteins. In the malaria mosquito Anopheles gambiae, a monandrous species in which sexual conflict is expected to be absent or minimal, recent data strongly suggest that proteolytic enzymes specifically expressed in the female lower reproductive tissues are involved in processing male products transferred to females during mating. To better understand the selective forces underlying the evolution of proteins involved in post-mating responses, we analysed a cluster of genes encoding three serine proteases that are down-regulated after mating, two of which are specifically expressed in the atrium and one in the spermatheca of A. gambiae females. Results: The analysis of polymorphism and divergence of these female-expressed proteases in closely related species of the A. gambiae complex revealed a high level of replacement polymorphisms, consistent with relaxed evolutionary constraints on duplicated genes that allow novel replacements to be fixed rapidly to perform new or more specific functions. Adaptive evolution was detected in several codons of the three genes, and hints of episodic selection were also found. In addition, structural modelling of these proteases highlighted some important differences in their substrate specificity and provided evidence that a number of sites evolving under selective pressure lie relatively close to the catalytic triad and/or on the edge of the specificity pocket, regions known to be involved in substrate recognition or binding. The observed patterns suggest that these proteases may interact with factors transferred by males during mating (e.g. substrates, inhibitors or pathogens) and that they may have evolved differently in independent A. gambiae lineages.
Conclusions: Our results, also examined in light of the constraints on applying selection-inference methods to the closely related species of the A. gambiae complex, reveal an unexpectedly intricate evolutionary scenario. Further experimental analyses are needed to investigate the biological functions of these genes, in order to better interpret their molecular evolution and to assess whether they represent possible targets for limiting the fertility of Anopheles mosquitoes in malaria vector control strategies.
Affiliation(s)
- Emiliano Mancini
- Istituto-Pasteur - Fondazione Cenci Bolognetti, Dipartimento di Sanità Pubblica e Malattie Infettive, 'Sapienza' Università di Roma, Italy
35
Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine. PLoS One 2011; 6:e16214. [PMID: 21283709 PMCID: PMC3025025 DOI: 10.1371/journal.pone.0016214] [Received: 09/10/2010] [Accepted: 12/10/2010]
Abstract
Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants, due in part to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually archived clones, making it the largest single BAC library constructed to date; it has a mean insert size of 96 kb and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is present in roughly 210,557 copies and constitutes about 5.8% (1.26 Gb) of LP nuclear DNA, a quantity eight times the size of the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated here, the BAC library should hasten whole-genome sequencing of LP via next-generation sequencing strategies and technologies, and facilitate the improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu).
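The IFG-7 estimates quoted above can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the per-copy length it derives is an inference from the two reported estimates, not a value the authors state.

```python
# Figures quoted in the loblolly pine abstract (genotype 7-56).
GENOME_BP = 21.7e9     # 1C genome size, 21.7 Gb
IFG7_COPIES = 210_557  # estimated IFG-7 copy number from macroarray hybridization
IFG7_BP = 1.26e9       # estimated IFG-7 DNA content, 1.26 Gb


def genome_percentage(repeat_bp: float, genome_bp: float) -> float:
    """Share of the genome occupied by a repeat, in percent."""
    return 100.0 * repeat_bp / genome_bp


def implied_bp_per_copy(repeat_bp: float, copies: int) -> float:
    """Average length per copy implied by total repeat DNA and copy number."""
    return repeat_bp / copies


if __name__ == "__main__":
    print(f"IFG-7 share of genome: {genome_percentage(IFG7_BP, GENOME_BP):.1f}%")
    print(f"Implied mean length per copy: {implied_bp_per_copy(IFG7_BP, IFG7_COPIES):,.0f} bp")
```

The first line prints 5.8%, matching the abstract. The second prints about 5,984 bp; because the copy count presumably includes truncated as well as full-length elements, this figure should be read as an average over all hybridizing copies rather than as the length of an intact IFG-7 element.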
36
Gui YJ, Zhou Y, Wang Y, Wang S, Wang SY, Hu Y, Bo SP, Chen H, Zhou CP, Ma NX, Zhang TZ, Fan LJ. Insights into the bamboo genome: syntenic relationships to rice and sorghum. J Integr Plant Biol 2010; 52:1008-1015. [PMID: 20977658 DOI: 10.1111/j.1744-7909.2010.00965.x]
Abstract
Bamboo occupies an important phylogenetic node in the grass family and plays a significant role in the forest industry. We produced 1.2 Mb of tetraploid moso bamboo (Phyllostachys pubescens E. Mazel ex H. de Leh.) sequences from 13 bacterial artificial chromosome (BAC) clones, and these are the largest genomic sequences available so far from the subfamily Bambusoideae. The content of repetitive elements (36.2%) in bamboo is similar to that in rice. Both rice and sorghum exhibit high genomic synteny with bamboo, which suggests that rice and sorghum may be useful as models for decoding Bambusoideae genomes.
Affiliation(s)
- Yi-Jie Gui
- Institute of Crop Science & Institute of Bioinformatics, Zhejiang University, Hangzhou 310029, China
37
Han Y, Wessler SR. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 2010; 38:e199. [PMID: 20880995 PMCID: PMC3001096 DOI: 10.1093/nar/gkq862]
Abstract
Miniature inverted-repeat transposable elements (MITEs) are a special type of Class 2 non-autonomous transposable element (TE) that are abundant in the non-coding regions of the genes of many plant and animal species. The accurate identification of MITEs has been a challenge for existing programs because they lack coding sequences and, as such, evolve very rapidly. Because of their importance to gene and genome evolution, we developed MITE-Hunter, a program pipeline that can identify MITEs, as well as other small Class 2 non-autonomous TEs, from genomic DNA data sets. The output of MITE-Hunter consists of consensus TE sequences grouped into families, which can be used as a library file for homology-based TE detection programs such as RepeatMasker. MITE-Hunter was evaluated by searching the rice genomic database and comparing the output with known rice TEs. It discovered most of the previously reported rice MITEs (97.6%) and found sixteen new elements. MITE-Hunter was also compared with two other MITE discovery programs, FINDMITE and MUST. Unlike MITE-Hunter, neither of these programs can search large genomic data sets, including whole genome sequences. More importantly, MITE-Hunter is significantly more accurate than either FINDMITE or MUST, as the vast majority of their outputs are false positives.
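MITE-Hunter's own pipeline is not reproduced here. As a minimal illustration of the kind of structural test that MITE discovery relies on, the Python sketch below flags a candidate sequence as MITE-like if it is short and its two ends form a terminal inverted repeat (TIR), meaning the first bases match the reverse complement of the last bases within a small mismatch tolerance. The TIR length, mismatch tolerance, and 800-bp length cap are assumed illustrative defaults, not parameters taken from the paper.

```python
# Complement table for A/C/G/T and ambiguous N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive, returned uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeat(seq: str, tir_len: int = 10, max_mismatches: int = 1) -> bool:
    """True if the first `tir_len` bases match the reverse complement of the last `tir_len` bases."""
    if len(seq) < 2 * tir_len:
        return False
    left = seq[:tir_len].upper()
    right_rc = reverse_complement(seq[-tir_len:])
    mismatches = sum(a != b for a, b in zip(left, right_rc))
    return mismatches <= max_mismatches


def is_mite_like(seq: str, max_len: int = 800, tir_len: int = 10, max_mismatches: int = 1) -> bool:
    """Crude structural screen (assumed thresholds): short and bounded by a TIR."""
    return len(seq) <= max_len and has_terminal_inverted_repeat(seq, tir_len, max_mismatches)


if __name__ == "__main__":
    tir = "GGCCAGTTAC"  # hypothetical TIR
    candidate = tir + "ACGT" * 40 + reverse_complement(tir)
    print(len(candidate), is_mite_like(candidate))
```

The example prints `180 True`. A real pipeline would also need to find candidate boundaries in the genome, check target site duplications, and cluster hits into families; this sketch only tests a sequence whose ends are already known.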
Affiliation(s)
- Yujun Han
- Department of Plant Biology, University of Georgia, Athens, GA 30602, USA
38
Han MJ, Shen YH, Gao YH, Chen LY, Xiang ZH, Zhang Z. Burst expansion, distribution and diversification of MITEs in the silkworm genome. BMC Genomics 2010; 11:520. [PMID: 20875122 PMCID: PMC2997013 DOI: 10.1186/1471-2164-11-520] [Received: 03/22/2010] [Accepted: 09/27/2010]
Abstract
Background: Miniature inverted-repeat transposable elements (MITEs) are widespread in plants and animals. Although the silkworm (Bombyx mori) genome contains a large number and variety of transposable elements, genome-wide information on silkworm MITEs has been lacking. Results: We used structure-based and homology approaches to search for MITEs in the silkworm genome. We identified 17 MITE families with a total of 5785 members, accounting for ~0.4% of the genome. Seven of the 17 MITE families are completely novel based on the nucleotide composition of their target site duplications (TSDs) and/or terminal inverted repeats (TIRs). Silkworm MITEs are widely and nonrandomly distributed in the genome. One family, named BmMITE-2, may have undergone a recent burst of expansion. Network and diversity analyses of each family revealed different diversification patterns among silkworm MITEs, reflecting the signatures of genome shocks that the silkworm experienced. Most silkworm MITEs preferentially inserted into or near genes, and BmMITE-11, which encodes a germline-restricted small RNA, might silence its closest genes in the silkworm ovary through a small RNA pathway. Conclusions: The silkworm harbors 17 MITE families. Silkworm MITEs prefer to reside in or near genes, and one MITE might be involved in gene silencing. Our results emphasize the exceptional role of MITEs in the transcriptional regulation of genes and have general implications for understanding the interaction between MITEs and their host genome.
Affiliation(s)
- Min-Jin Han
- The Key Sericultural Laboratory of Agricultural Ministry, Southwest University, Chongqing 400715, China.
39
Wang S, Zhang L, Meyer E, Matz MV. Characterization of a group of MITEs with unusual features from two coral genomes. PLoS One 2010; 5:e10700. [PMID: 20502527 PMCID: PMC2872659 DOI: 10.1371/journal.pone.0010700] [Received: 03/04/2010] [Accepted: 04/27/2010]
Abstract
Background: Miniature inverted-repeat transposable elements (MITEs), which are common in eukaryotic genomes, are small non-coding elements that transpose by using transposases encoded by autonomous transposons. Recent genome-wide analyses and cross-mobilization assays have greatly improved our knowledge of MITE proliferation; however, the specific mechanisms underlying the origin and evolution of MITEs are still unclear. Principal Findings: A group of coral MITEs, called CMITE, was identified in two corals, Acropora millepora and Acropora palmata. CMITEs conform to many common characteristics of MITEs but also present several unusual features. The most unusual is the conservation of the internal region, which is more conserved between MITE families than the TIRs are. The origin of this internal region remains unknown, although we found one CMITE family that seems to be derived from a piggyBac-like transposon in A. millepora. CMITEs can form tandem arrays, suggesting an unconventional way for MITEs to increase their copy numbers. We also describe a case in which a novel transposable element was created by a CMITE insertion event. Conclusions: To our knowledge, this is the first report of MITEs identified in coral genomes. Proliferation of CMITEs seems to be related to the transposition machinery of piggyBac-like autonomous transposons. The highly conserved internal region of CMITEs suggests a potential role for this region in their successful transposition. However, the origin of these unusual features in CMITEs remains unclear and thus represents an intriguing topic for future investigation.
Affiliation(s)
- Shi Wang
- Section of Integrative Biology, University of Texas at Austin, Austin, Texas, United States of America.
40
Identifying repeats and transposable elements in sequenced genomes: how to find your way through the dense forest of programs. Heredity (Edinb) 2009; 104:520-33. [PMID: 19935826 DOI: 10.1038/hdy.2009.165]
Abstract
The production of genome sequences has led to another important step, their annotation, which is closely linked to the exact determination of their repeat content, including transposable elements (TEs). The evolutionary implications of TEs and the presence of coding regions in some of them can confuse gene annotation and also hinder genome assembly, making it particularly crucial to annotate and classify them correctly in genome sequences. This review is intended to provide as comprehensive an overview as possible of the automated methods currently used to annotate and classify TEs in sequenced genomes. Different categories of programs exist according to their methodology and the types of repeats they can identify. I describe here the main characteristics of the programs, their main goals, and the difficulties they can entail. The drawbacks of the different methods are also highlighted to help biologists who are unfamiliar with algorithmic methods to understand this methodology better. Overall, using several different programs and cross-comparing their results gives a better chance of finding reliable results than any single program. However, this makes it essential to verify the results provided by each program independently. The ideal solution would be to test all programs against the same data set to obtain a true comparison of their actual performance.
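One simple way to cross-compare the output of several repeat-annotation programs, sketched below in Python under stated assumptions, is to keep only the regions that at least a chosen number of programs agree on. The review does not prescribe this procedure; the program names, the half-open coordinate convention, and the `min_support` threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one sequence


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals reported by a single program."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_regions(calls: Dict[str, List[Interval]], min_support: int = 2) -> List[Interval]:
    """Return regions annotated as repeats by at least `min_support` distinct programs."""
    events = []
    for intervals in calls.values():
        # Merge each program's calls first so one program counts only once per base.
        for start, end in merge(intervals):
            events.append((start, 1))
            events.append((end, -1))
    # Tuples sort by position, then delta: ends (-1) come before starts (+1) at the same position.
    events.sort()
    regions: List[Interval] = []
    depth, open_at = 0, None
    for pos, delta in events:
        depth += delta
        if depth >= min_support and open_at is None:
            open_at = pos
        elif depth < min_support and open_at is not None:
            if pos > open_at:
                regions.append((open_at, pos))
            open_at = None
    return regions


if __name__ == "__main__":
    # Hypothetical repeat calls from three hypothetical programs on one contig.
    calls = {
        "program_a": [(100, 400)],
        "program_b": [(250, 600)],
        "program_c": [(900, 1000)],
    }
    print(consensus_regions(calls))
```

The example prints `[(250, 400)]`, the only stretch covered by two programs. Consistent with the review's caution, agreement between programs is not verification: regions that pass the threshold still need independent checking.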
41
Anopheles gambiae APL1 is a family of variable LRR proteins required for Rel1-mediated protection from the malaria parasite, Plasmodium berghei. PLoS One 2008; 3:e3672. [PMID: 18989366 PMCID: PMC2577063 DOI: 10.1371/journal.pone.0003672] [Received: 07/20/2008] [Accepted: 10/20/2008]
Abstract
Background: We previously identified by genetic mapping an Anopheles gambiae chromosome region with strong influence over the outcome of malaria parasite infection in nature. Candidate gene studies in the genetic interval, including functional tests using the rodent malaria parasite Plasmodium berghei, identified a novel leucine-rich repeat gene, APL1, with functional activity against P. berghei. Principal Findings: Manual reannotation now reveals APL1 to be a family of at least 3 independently transcribed genes, APL1A, APL1B, and APL1C. Functional dissection indicates that among the three known APL1 family members, APL1C alone is responsible for host defense against P. berghei. APL1C functions within the Rel1-Cactus immune signaling pathway, which regulates APL1C transcript and protein abundance. Gene silencing of APL1C completely abolishes Rel1-mediated host protection against P. berghei, and thus the presence of APL1C is required for this protection. Further highlighting the influence of this chromosome region, allelic haplotypes at the APL1 locus are genetically associated with and have high explanatory power for the success or failure of P. berghei parasite infection. Conclusions: APL1C functions as a required transducer of Rel1-dependent immune signal(s) to efficiently protect mosquitoes from P. berghei infection, and allelic genetic haplotypes of the APL1 locus display distinct levels of susceptibility and resistance to P. berghei.
42
A recently active miniature inverted-repeat transposable element, Chunjie, inserted into an operon without disturbing the operon structure in Geobacter uraniireducens Rf4. Genetics 2008; 179:2291-7. [PMID: 18660544 DOI: 10.1534/genetics.108.089995]
Abstract
Miniature inverted-repeat transposable elements (MITEs) are short DNA transposons with terminal inverted repeat (TIR) signals and have been extensively studied in plants and other eukaryotes. However, little is known about them in eubacteria. We identified a novel and recently active MITE, Chunjie, while studying the recent duplication of an operon consisting of ABC transporters and a phosphate uptake regulator in the chromosome of Geobacter uraniireducens Rf4. Chunjie resembles other known MITEs in many respects: it has TIR signals and direct repeats, is small and noncoding, can fold into a stable secondary structure, and is typically inserted into A + T-rich regions. At least one case of recent transposition was observed, namely the insertion of Chunjie into one copy of the aforementioned operon. As far as we know, this is the first report of a MITE insertion that does not disrupt operon structure.
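The observation that Chunjie is "typically inserted into A + T-rich regions" amounts to comparing the base composition around insertion sites with the genome background. A minimal Python sketch of that comparison follows; the 50-bp flank size and the function names are assumptions for illustration, not the authors' procedure.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A+T among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    s = seq.upper()
    counted = sum(s.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (s.count("A") + s.count("T")) / counted


def flank_at_fraction(genome: str, start: int, end: int, flank: int = 50) -> float:
    """A+T fraction of the `flank` bp on each side of an element occupying genome[start:end]."""
    left = genome[max(0, start - flank):start]
    right = genome[end:end + flank]
    return at_fraction(left + right)


if __name__ == "__main__":
    # Toy example: a 4-bp "element" sitting in an A/T-rich stretch inside a G/C-rich background.
    genome = "GC" * 100 + "AT" * 30 + "GGGG" + "TA" * 30 + "CG" * 100
    start = 200 + 60
    end = start + 4
    print(f"flank A+T: {flank_at_fraction(genome, start, end):.2f}")
    print(f"genome A+T: {at_fraction(genome):.2f}")
```

In this toy sequence the flanks are entirely A/T (1.00) against a genome-wide value of 0.23. With real data, one would compute the flank value for every insertion and compare its distribution with randomly placed windows of the same size.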
43
Wang J, Du Y, Wang S, Brown SJ, Park Y. Large diversity of the piggyBac-like elements in the genome of Tribolium castaneum. Insect Biochem Mol Biol 2008; 38:490-8. [PMID: 18342253 PMCID: PMC3206788 DOI: 10.1016/j.ibmb.2007.04.012] [Received: 01/24/2007] [Revised: 04/04/2007] [Accepted: 04/25/2007]
Abstract
The piggyBac transposable element (TE), originally discovered in the cabbage looper, Trichoplusia ni, has been widely used in insect transgenesis, including in the red flour beetle Tribolium castaneum. We surveyed piggyBac-like element (PLE) sequences in the genome of T. castaneum by homology searches, using as queries the diverse PLE sequences described previously. The search yielded a total of 32 piggyBac-like elements (TcPLEs), which were classified into 14 distinct groups. Most TcPLEs contain defective functional motifs, in that they lack inverted terminal repeats (ITRs) or have disrupted open reading frames. Only a single copy of TcPLE1 appears to be intact, with imperfect 16-bp ITRs flanking an open reading frame that encodes a transposase of 571 amino acid residues. Many TcPLE copies were found inserted into or close to other transposon-like sequences. This large diversity of TcPLEs, with generally low copy numbers, suggests multiple invasions of TcPLEs over a long evolutionary time without extensive multiplication, or the occurrence of rapid loss of TcPLE copies.
Affiliation(s)
- Jianjun Wang
- Department of Plant Protection, Yangzhou University, Yangzhou, China.
44
Zhou F, Tran T, Xu Y. Nezha, a novel active miniature inverted-repeat transposable element in cyanobacteria. Biochem Biophys Res Commun 2008; 365:790-4. [DOI: 10.1016/j.bbrc.2007.11.038] [Received: 11/07/2007] [Accepted: 11/09/2007]
45
Dufresne M, Hua-Van A, El Wahab HA, Ben M'Barek S, Vasnier C, Teysset L, Kema GHJ, Daboussi MJ. Transposition of a fungal miniature inverted-repeat transposable element through the action of a Tc1-like transposase. Genetics 2006; 175:441-52. [PMID: 17179071 PMCID: PMC1775018 DOI: 10.1534/genetics.106.064360]
Abstract
The mimp1 element, previously identified in the ascomycete fungus Fusarium oxysporum, has the hallmarks of miniature inverted-repeat transposable elements (MITEs): short size, terminal inverted repeats (TIRs), structural homogeneity, and a stable secondary structure. Since mimp1 has no coding capacity, its mobilization requires a transposase-encoding element. On the basis of the similarity of its TIRs and target-site preference to those of the autonomous Tc1-like element impala, together with a correlated distribution of both elements across the Fusarium genus, we investigated the ability of mimp1 to jump upon expression of the impala transposase provided in trans. Under these conditions, we present evidence that mimp1 transposes by a cut-and-paste mechanism into TA dinucleotides, which are duplicated upon insertion. Our results also show that mimp1 very frequently reinserts into genic regions, in at least one-third of cases. We also show that the mimp1/impala two-component system is fully functional in the heterologous species F. graminearum, allowing the development of a highly efficient tool for gene tagging in filamentous fungi.
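The insertion outcome described above, a cut-and-paste event into a TA dinucleotide that is duplicated upon insertion, can be modelled at the string level. The Python sketch below is a toy model under that assumption only; it ignores excision footprints, and the function names and example sequences are hypothetical.

```python
from typing import List


def find_ta_sites(seq: str) -> List[int]:
    """0-based start positions of every TA dinucleotide (candidate Tc1-like target sites)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "TA"]


def insert_with_ta_tsd(target: str, element: str, site: int) -> str:
    """Model an insertion at the TA starting at `site`; the TA ends up on both sides."""
    if target[site:site + 2].upper() != "TA":
        raise ValueError(f"no TA dinucleotide at position {site}")
    # Target up to and including the TA, then the element, then the duplicated TA onward.
    return target[:site + 2] + element + target[site:]


if __name__ == "__main__":
    host = "GGCATACC"  # hypothetical host sequence
    sites = find_ta_sites(host)
    print(sites)
    print(insert_with_ta_tsd(host, "<mimp1>", sites[0]))
```

The example prints `[4]` and then `GGCATA<mimp1>TACC`, showing the 2-bp target site duplication (TA) flanking the inserted element.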
Affiliation(s)
- Marie Dufresne
- Institut de Génétique et Microbiologie, Université Paris-Sud, UMR8621, F-91405 Orsay, France
46
Holligan D, Zhang X, Jiang N, Pritham EJ, Wessler SR. The transposable element landscape of the model legume Lotus japonicus. Genetics 2006; 174:2215-28. [PMID: 17028332 PMCID: PMC1698628 DOI: 10.1534/genetics.106.062752] [Received: 06/28/2006] [Accepted: 09/18/2006]
Abstract
The largest component of plant and animal genomes characterized to date is transposable elements (TEs). The availability of a significant amount of Lotus japonicus genome sequence has permitted for the first time a comprehensive study of the TE landscape in a legume species. Here we report the results of a combined computer-assisted and experimental analysis of the TEs in the 32.4 Mb of finished TAC clones. While computer-assisted analysis facilitated a determination of TE abundance and diversity, the availability of complete TAC sequences permitted identification of full-length TEs, which facilitated the design of tools for genomewide experimental analysis. In addition to containing all TE types found in previously characterized plant genomes, the TE component of L. japonicus contained several surprises. First, it is the second species (after Oryza sativa) found to be rich in Pack-MULEs, with >1000 elements that have captured and amplified gene fragments. In addition, we have identified what appears to be a legume-specific MULE family that was previously identified only in fungal species. Finally, the L. japonicus genome contains many hundreds, perhaps thousands of Sireviruses: Ty1/copia-like elements with an extra ORF. Significantly, several of the L. japonicus Sireviruses have recently amplified and may still be actively transposing.
Affiliation(s)
- Dawn Holligan
- Department of Plant Biology, University of Georgia, Athens 30602, USA
47
Quesneville H, Nouaud D, Anxolabéhère D. P elements and MITE relatives in the whole genome sequence of Anopheles gambiae. BMC Genomics 2006; 7:214. [PMID: 16919158 PMCID: PMC1562414 DOI: 10.1186/1471-2164-7-214] [Received: 02/14/2006] [Accepted: 08/18/2006]
Abstract
Background: Miniature Inverted-repeat Terminal Elements (MITEs), a particular type of class II transposable element (TE), play an important role in genome evolution because they have very high copy numbers and display recurrent bursts of transposition. The 5' and 3' subterminal regions of a given MITE family often show high sequence similarity to the corresponding regions of an autonomous class II TE family. However, the sustained presence, over a prolonged evolutionary time, of both MITEs and the TE master copies able to promote their mobility has rarely been reported within the same genome, and this raises fascinating evolutionary questions. Results: We report here the presence of P transposable elements with related MITE families in the Anopheles gambiae genome. Using a TE annotation pipeline, we identified and analyzed all the P sequences in the genome of the sequenced A. gambiae PEST strain. More than 0.49% of the genome consists of P elements and their derivatives. P elements can be divided into 9 subfamilies, separated by more than 30% nucleotide divergence. Seven of them have full-length copies. Ten MITE families are associated with 6 of the 9 P subfamilies. Comparing their intra-element nucleotide diversities and their structures allows us to propose the putative dynamics of their emergence. In particular, one MITE family with a hybrid structure, whose two ends are each related to a different P subfamily, suggests a new mechanism for their emergence and mobility. Conclusion: This work contributes to a greater understanding of the relationship between full-length class II TEs and MITEs, in this case P elements and their derivatives in the genome of A. gambiae. Moreover, it provides the most comprehensive catalogue to date of P-like transposons in this genome and provides convincing yet indirect evidence that some of the subfamilies have been recently active.
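The statement that the P subfamilies are "separated by more than 30% of nucleotide divergence" refers to a pairwise distance between aligned sequences. The abstract does not state which measure was used, so the uncorrected p-distance below is an assumption, as are the function names and the toy sequences; substitution-corrected distances (for example Jukes-Cantor) would give larger values for the same alignment.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def exceeds_divergence(a: str, b: str, threshold: float = 0.30) -> bool:
    """True if two aligned sequences differ at more than `threshold` of compared sites."""
    return p_distance(a, b) > threshold


if __name__ == "__main__":
    s1 = "ACGTACGTAC"  # hypothetical aligned fragments
    s2 = "ACGAACGTTT"
    print(p_distance(s1, s2), exceeds_divergence(s1, s2))
```

The example prints `0.3 False`: three differences in ten sites sit exactly at, not above, the 30% threshold.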
Affiliation(s)
- Hadi Quesneville
- Dynamique du Génome et Evolution, Institut Jacques Monod, CNRS, Universités P.M. Curie and D. Diderot 2, Place Jussieu, 75252 Paris, France
- Bioinformatics and Genomics Lab, Institut Jacques Monod, CNRS, Universités P.M. Curie and D. Diderot 2, Place Jussieu, 75252 Paris, France
- Danielle Nouaud
- Dynamique du Génome et Evolution, Institut Jacques Monod, CNRS, Universités P.M. Curie and D. Diderot 2, Place Jussieu, 75252 Paris, France
- Dominique Anxolabéhère
- Dynamique du Génome et Evolution, Institut Jacques Monod, CNRS, Universités P.M. Curie and D. Diderot 2, Place Jussieu, 75252 Paris, France
48
Klein RR, Klein PE, Mullet JE, Minx P, Rooney WL, Schertz KF. Fertility restorer locus Rf1 [corrected] of sorghum (Sorghum bicolor L.) encodes a pentatricopeptide repeat protein not present in the colinear region of rice chromosome 12. Theor Appl Genet 2005; 111:994-1012. [PMID: 16078015 DOI: 10.1007/s00122-005-2011-y] [Received: 12/08/2004] [Accepted: 03/17/2005]
Abstract
To clone the sorghum fertility restorer gene Rf1, a high-resolution genetic and physical map of the locus was constructed. The Rf1 locus was resolved to a 32-kb region spanning four open reading frames: a plasma membrane Ca(2+)-ATPase, a cyclin D-1, an unknown protein, and a pentatricopeptide repeat (PPR13) gene family member. An approximately 19-kb region spanning the cyclin D-1 and unknown protein genes was completely conserved between sterile and fertile plants, as was the sequence spanning the coding region of the Ca(2+)-ATPase. In contrast, 19 sequence polymorphisms were located in an approximately 7-kb region spanning PPR13, and all markers cosegregated with the fertility restoration phenotype. PPR13 was predicted to consist of a single exon encoding a mitochondrially targeted protein with 14 PPR repeats, and the protein is classified as a member of the E-type PPR subfamily. To permit sequence-based comparison of the sorghum and rice genomes in the Rf1 region, 0.53 Mb of sorghum chromosome 8 was sequenced and compared to the colinear region of rice chromosome 12. Genome comparison revealed a mosaic pattern of colinearity, with an approximately 275-kb gene-poor region showing little gene conservation and an adjacent, approximately 245-kb gene-rich region that is more highly conserved between rice and sorghum. Despite being located in a region of high gene conservation, sorghum PPR13 was not located in a colinear position on rice chromosome 12. The present results suggest that sorghum PPR13 represents a potential candidate for the sorghum Rf1 gene, and that its presence in the sorghum genome indicates a single-gene transposition event subsequent to the divergence of the rice and sorghum ancestors.
Affiliation(s)
- R R Klein
- Southern Plains Agricultural Research Center, USDA-ARS, College Station, TX 77845, USA.
49
Moreno-Vázquez S, Ning J, Meyers BC. hATpin, a family of MITE-like hAT mobile elements conserved in diverse plant species that forms highly stable secondary structures. Plant Mol Biol 2005; 58:869-886. [PMID: 16240179 DOI: 10.1007/s11103-005-8271-8] [Received: 03/14/2005] [Accepted: 06/01/2005]
Abstract
We identified a 178-bp mobile DNA element in lettuce with characteristic CGAGC/GCTCG repeats in its subterminal regions. This element has terminal inverted repeats and 8-bp target site duplications typical of the hAT superfamily of class II mobile elements, but its small size and potential to form a stable single-stranded hairpin-like secondary structure suggest that it is related to MITEs. In silico searches for related elements identified 252 plant sequences with 8-bp target site duplications and sequence similarity in their terminal and subterminal regions. Some of these sequences were predicted to encode transposases and may be autonomous elements; these constituted a separate clade within the phylogram of hAT transposases. We demonstrate that the CGAGC/GCTCG pentamer maximizes hairpin stability compared to any other pentamer with the same C + G content, and the secondary structures of these elements are more stable than those of most MITEs. We collectively named these elements hATpin elements because of their hAT similarity and their hairpin structures. The nearly complete rice genome sequence and its highly advanced annotation allowed us to localize most rice elements and to deduce insertion preferences. hATpin elements are distributed on all chromosomes, but with a significant bias toward chromosomes 1 and 10 and toward regions of moderate gene density. This family of class II mobile elements is found primarily in monocot species but is also present in dicot species.
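Two structural points in the hATpin abstract can be checked directly on sequence strings: CGAGC and GCTCG are reverse complements of each other, so copies near opposite ends of an element can base-pair into a hairpin stem, and an 8-bp target site duplication means the same 8 bp appear immediately before and after the element. The Python sketch below illustrates only these two checks; it does not attempt the authors' thermodynamic comparison of pentamers, which requires a nearest-neighbor energy model. Function names and the example flanks are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence (returned uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_count(seq: str) -> int:
    """Number of G and C bases."""
    return sum(base in "GC" for base in seq.upper())


def has_tsd(left_flank: str, right_flank: str, length: int = 8) -> bool:
    """True if the last `length` bp before an element equal the first `length` bp after it."""
    if len(left_flank) < length or len(right_flank) < length:
        return False
    return left_flank[-length:].upper() == right_flank[:length].upper()


if __name__ == "__main__":
    motif = "CGAGC"
    print(reverse_complement(motif), gc_count(motif))
    print(has_tsd("ttgaCTAGTAGG", "CTAGTAGGcaat"))  # hypothetical flanks sharing an 8-bp TSD
```

The example prints `GCTCG 4` and `True`: the pentamer's reverse complement is its partner motif, it carries four G/C bases, and the toy flanks share the 8-bp duplication CTAGTAGG.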
Affiliation(s)
- Santiago Moreno-Vázquez
- Departamento de Biología Vegetal, E.T.S. Ingenieros Agrónomos, Universidad Politécnica de Madrid, Ciudad Universitaria, 28040, Madrid, Spain
- Jianchang Ning
- Delaware Biotechnology Institute, University of Delaware, 19711, Newark, DE, USA
- Blake C Meyers
- Delaware Biotechnology Institute, University of Delaware, 19711, Newark, DE, USA
- Department of Plant and Soil Sciences, University of Delaware, 19714, Newark, DE, USA
50
The changing tails of a novel short interspersed element in Aedes aegypti: genomic evidence for slippage retrotransposition and the relationship between 3' tandem repeats and the poly(dA) tail. Genetics 2005; 168:2037-2047. [PMID: 15611173 DOI: 10.1534/genetics.104.032045] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
A novel family of tRNA-related SINEs named gecko was discovered in the yellow fever mosquito, Aedes aegypti. Approximately 7200 copies of gecko are distributed in the A. aegypti genome with a significant bias toward A + T-rich regions. The 3' end of gecko is similar in sequence and identical in secondary structure to the 3' end of MosquI, a non-LTR retrotransposon in A. aegypti. Nine conserved substitutions and a deletion separate gecko into two groups. Group I includes all gecko copies that end with poly(dA) and one copy that ends with AGAT repeats. Group II comprises gecko elements that end with CCAA or CAAT repeats. Members within each group cannot be differentiated when the 3' repeats are excluded from phylogenetic and sequence analyses, suggesting that the alterations of the 3' tails are recent. Imperfect poly(dA) tails were recorded in group I, and partial replication of the 3' tandem repeats was frequently observed in group II. Genomic evidence underscores the importance of slippage retrotransposition in the alteration and expansion of the tandem repeat during the evolution of gecko sequences, although we do not rule out postinsertion mechanisms that were previously invoked to explain the evolution of Alu-associated microsatellites. We propose that the 3' tandem repeats and the poly(dA) tail may be generated by similar mechanisms during retrotransposition of both SINEs and non-LTR retrotransposons, and thus the distinction between poly(dA) retrotransposons such as L1 and non-poly(dA) retrotransposons such as I factor may not be informative.
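The abstract distinguishes gecko copies by their 3' tails: poly(dA) or AGAT repeats in group I, and CCAA or CAAT tandem repeats in group II. As a rough illustration, the Python sketch below labels which of these units forms the longest run at the very 3' end of a sequence. It is not the authors' method, and group membership in the paper rests on conserved substitutions and a deletion, which this code does not model. The function names and the 8-base minimum are hypothetical choices, and only exact, complete repeat copies are counted, so the imperfect poly(dA) tails and partial repeat copies the authors describe would be undercounted.

```python
from typing import Optional, Sequence, Tuple


def trailing_copies(seq: str, unit: str) -> int:
    """Number of complete, back-to-back copies of `unit` at the very 3' end of seq."""
    if not unit:
        raise ValueError("unit must be non-empty")
    s, u = seq.upper(), unit.upper()
    n = 0
    while s.endswith(u * (n + 1)):
        n += 1
    return n


def classify_tail(
    seq: str,
    units: Sequence[str] = ("A", "AGAT", "CCAA", "CAAT"),
    min_bases: int = 8,
) -> Optional[Tuple[str, int]]:
    """Return (unit, copies) for the unit whose 3'-terminal run covers the most bases,
    or None if no run covers at least min_bases. Ties keep the earlier unit."""
    best: Optional[Tuple[str, int]] = None
    best_cov = 0
    for unit in units:
        copies = trailing_copies(seq, unit)
        covered = copies * len(unit)
        if covered >= min_bases and covered > best_cov:
            best, best_cov = (unit, copies), covered
    return best


# Made-up 3' ends, not sequences from the paper.
print(classify_tail("GTTCG" + "A" * 12))    # ('A', 12)
print(classify_tail("GTTCG" + "CCAA" * 3))  # ('CCAA', 3)
print(classify_tail("GTTCGCAAT"))           # None
```

Scoring by bases covered rather than by copy number keeps a long poly(dA) run from being outranked by a couple of tetranucleotide copies, and vice versa.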