1
RepEnTools: an automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats. Mob DNA 2024; 15:6. [PMID: 38570859 PMCID: PMC10988844 DOI: 10.1186/s13100-024-00315-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 03/05/2024] [Indexed: 04/05/2024] [Open access]
Abstract
BACKGROUND Repeat elements (REs) play important roles in cell function in health and disease. However, RE enrichment analysis in short-read high-throughput sequencing (HTS) data, such as ChIP-seq, is a challenging task. RESULTS Here, we present RepEnTools, a software package for genome-wide RE enrichment analysis of ChIP-seq and similar chromatin pulldown experiments. Our analysis package bundles various software tools with carefully chosen and validated settings to provide a complete solution for RE analysis, from raw input files to tabular and graphical outputs. RepEnTools implementations (Galaxy/UNIX) are easily accessible even to users with minimal IT skills. To demonstrate the performance of RepEnTools, we analysed chromatin pulldown data obtained with the human UHRF1 TTD protein domain and discovered enrichment of TTD binding on young primate- and hominid-specific polymorphic repeats (SVA, L1PA1/L1HS) that overlap known enhancers and are decorated with H3K4me1-K9me2/3 modifications. We corroborated these new bioinformatic findings experimentally using newly developed primate- and hominid-specific qPCR assays, which complement similar research tools. Finally, we analysed mouse UHRF1 ChIP-seq data with RepEnTools and showed that the endogenous mUHRF1 protein colocalizes with H3K4me1-H3K9me3 on promoters of REs which were silenced by UHRF1. These new data suggest a functional role for UHRF1 in silencing of REs that is mediated by TTD binding to the H3K4me1-K9me3 double mark and conserved in two mammalian species. CONCLUSIONS RepEnTools improves on previously available programmes for RE enrichment analysis in chromatin pulldown studies by leveraging new tools, enhancing accessibility and adding some key functions. RepEnTools can analyse RE enrichment rapidly, efficiently, and accurately, providing the community with an up-to-date, reliable and accessible tool for this important type of analysis.
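Enrichment of a repeat subfamily in a pulldown is commonly expressed as the log2 ratio of library-size-normalized read counts (counts per million, CPM) in the pulldown versus the input. The minimal sketch below assumes per-subfamily read counts and total mapped reads are already available. The function name, the pseudocount and the toy counts are hypothetical, and the sketch is not taken from RepEnTools itself.

```python
import math

def log2_enrichment(chip_counts, chip_total, input_counts, input_total, pseudocount=1.0):
    """Log2 ratio of CPM-normalized counts (ChIP vs input) per repeat subfamily.

    chip_counts / input_counts: dict mapping subfamily name -> read count.
    chip_total / input_total: total mapped reads in each library (for CPM).
    The pseudocount avoids division by zero for subfamilies absent from input.
    """
    scores = {}
    for name in chip_counts.keys() | input_counts.keys():
        chip_cpm = chip_counts.get(name, 0) * 1e6 / chip_total
        input_cpm = input_counts.get(name, 0) * 1e6 / input_total
        scores[name] = math.log2((chip_cpm + pseudocount) / (input_cpm + pseudocount))
    return scores

# Hypothetical toy counts; both libraries have one million mapped reads.
chip = {"L1HS": 400, "SVA_F": 300, "AluY": 1000}
inp = {"L1HS": 100, "SVA_F": 150, "AluY": 1000}
scores = log2_enrichment(chip, 1_000_000, inp, 1_000_000)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}\t{scores[name]:.2f}")
```

Normalizing to CPM before taking the ratio keeps the comparison meaningful when the pulldown and input libraries are sequenced to different depths.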
2
Towards targeting transposable elements for cancer therapy. Nat Rev Cancer 2024; 24:123-140. [PMID: 38228901 DOI: 10.1038/s41568-023-00653-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 01/18/2024]
Abstract
Transposable elements (TEs) represent almost half of the human genome. Although historically deemed 'junk DNA', TEs have become the subject of a wave of research, stimulated by recent technological advances, into their functional impact on gene-regulatory networks in evolution and development, as well as in diseases including cancer. The genetic and epigenetic evolution of cancer involves the exploitation of TEs, whereby TEs contribute directly to cancer-specific gene activities. This Review provides a perspective on the role of TEs in cancer as a 'double-edged sword', both promoting cancer evolution and representing a vulnerability that could be exploited in cancer therapy. We discuss how TEs affect transcriptome regulation and other cellular processes in cancer. We highlight the potential of TEs as therapeutic targets for cancer. We also summarize technical hurdles in the characterization of TEs with genomic assays. Last, we outline open questions and exciting future research avenues.
3
Transposable elements mediate genetic effects altering the expression of nearby genes in colorectal cancer. Nat Commun 2024; 15:749. [PMID: 38272908 PMCID: PMC10811328 DOI: 10.1038/s41467-023-42405-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 10/10/2023] [Indexed: 01/27/2024] [Open access]
Abstract
Transposable elements (TEs) are prevalent repeats in the human genome that play a significant role in the regulome, and their disruption can contribute to tumorigenesis. However, the influence of TEs on gene expression in cancer remains unclear. Here, we analyze 275 normal colon and 276 colorectal cancer samples from the SYSCOL cohort, discovering 10,231 and 5,199 TE-expression quantitative trait loci (eQTLs) in normal and tumor tissues, respectively, of which 376 are colorectal cancer-specific eQTLs, likely due to methylation changes. Tumor-specific TE-eQTLs show greater enrichment of transcription factors than shared TE-eQTLs, suggesting specific regulation of their expression in tumor. Bayesian networks reveal 1,766 TEs as mediators of genetic effects, altering the expression of 1,558 genes, including 55 known cancer driver genes, and show that tumor-specific TE-eQTLs trigger the driver capability of TEs. These insights expand our knowledge of cancer drivers, deepening our understanding of tumorigenesis and presenting potential avenues for therapeutic interventions.
4
Using long-read CAGE sequencing to profile cryptic-promoter-derived transcripts and their contribution to the immunopeptidome. Genome Res 2023; 33:gr.277061.122. [PMID: 38065624 PMCID: PMC10760525 DOI: 10.1101/gr.277061.122] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Accepted: 11/13/2023] [Indexed: 01/04/2024]
Abstract
Recent studies have shown that the noncoding genome can produce unannotated proteins as antigens that induce an immune response. One major source of this activity is the aberrant epigenetic reactivation of transposable elements (TEs). In tumors, TEs often provide cryptic or alternate promoters, which can generate transcripts that encode tumor-specific unannotated proteins. Thus, TE-derived transcripts (TE transcripts) have the potential to produce tumor-specific, but recurrent, antigens shared among many tumors. Identification of TE-derived tumor antigens holds promise for improving cancer immunotherapy approaches; however, current genomics and computational tools are not optimized for their detection. Here we combined CAGE technology with full-length long-read transcriptome sequencing (long-read CAGE, or LRCAGE) and developed a suite of computational tools to significantly improve immunopeptidome detection by incorporating TE and other tumor transcripts into the proteome database. By applying our methods to data from the human lung cancer cell line H1299, we show that long-read technology significantly improves mapping of promoters with low mappability scores and that LRCAGE guarantees accurate construction of uncharacterized 5' transcript structure. Augmenting a reference proteome database with newly characterized transcripts enabled us to detect noncanonical antigens from HLA-pulldown LC-MS/MS data. Lastly, we show that epigenetic treatment increased the number of noncanonical antigens, particularly those encoded by TE transcripts, which might expand the pool of targetable antigens for cancers with low mutational burden.
5
Statistical learning quantifies transposable element-mediated cis-regulation. Genome Biol 2023; 24:258. [PMID: 37950299 PMCID: PMC10637000 DOI: 10.1186/s13059-023-03085-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2022] [Accepted: 10/09/2023] [Indexed: 11/12/2023] [Open access]
Abstract
BACKGROUND Transposable elements (TEs) have colonized the genomes of most metazoans, and many TE-embedded sequences function as cis-regulatory elements (CREs) for genes involved in a wide range of biological processes from early embryogenesis to innate immune responses. Because of their repetitive nature, TEs have the potential to form CRE platforms enabling the coordinated and genome-wide regulation of protein-coding genes by only a handful of trans-acting transcription factors (TFs). RESULTS Here, we directly test this hypothesis through mathematical modeling and demonstrate that differences in expression at protein-coding genes alone are sufficient to estimate the magnitude and significance of TE-contributed cis-regulatory activities, even in contexts where TE-derived transcription fails to do so. We leverage hundreds of overexpression experiments and estimate that, overall, gene expression is influenced by TE-embedded CREs situated within approximately 500 kb of promoters. Focusing on the cis-regulatory potential of TEs within the gene regulatory network of human embryonic stem cells, we find that pluripotency-specific and evolutionarily young TE subfamilies can be reactivated by TFs involved in post-implantation embryogenesis. Finally, we show that TE subfamilies can be split into fractions that are truly active or inactive in cis-regulation based on additional information such as matched epigenomic data, observing that TF binding may better predict TE cis-regulatory activity than differences in histone marks. CONCLUSION Our results suggest that TE-embedded CREs contribute to gene regulation during and beyond gastrulation. On a methodological level, we provide a statistical tool that infers TE-dependent cis-regulation from RNA-seq data alone, thus facilitating the study of TEs in the next-generation sequencing era.
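The "within approximately 500 kb of promoters" window can be made concrete with a minimal sketch that counts, for a single promoter, the TE copies of each subfamily lying within a fixed distance on the same chromosome. The 500 kb default mirrors the abstract. The data layout, function name and toy coordinates are hypothetical, and this is only the windowing step, not the statistical model used by the authors.

```python
import bisect
from collections import Counter

def tes_near_promoter(promoter_pos, te_positions, te_subfamilies, window=500_000):
    """Count TE copies per subfamily within +/- window bp of a promoter.

    te_positions must be sorted ascending; te_subfamilies[i] labels te_positions[i].
    All coordinates are assumed to lie on the same chromosome as the promoter.
    """
    lo = bisect.bisect_left(te_positions, promoter_pos - window)
    hi = bisect.bisect_right(te_positions, promoter_pos + window)
    return Counter(te_subfamilies[lo:hi])

# Hypothetical TE copies on one chromosome (sorted by position).
te_pos = [100_000, 450_000, 900_000, 1_600_000]
te_fam = ["LTR5_Hs", "L1HS", "LTR5_Hs", "SVA_D"]
counts = tes_near_promoter(800_000, te_pos, te_fam)
print(counts)
```

Sorting the TE positions once and using binary search keeps each promoter lookup logarithmic in the number of TEs, which matters when scanning genome-wide annotations.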
6
Immunogenicity in renal cell carcinoma: shifting focus to alternative sources of tumour-specific antigens. Nat Rev Nephrol 2023; 19:440-450. [PMID: 36973495 PMCID: PMC10801831 DOI: 10.1038/s41581-023-00700-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/27/2023] [Indexed: 03/29/2023]
Abstract
Renal cell carcinoma (RCC) comprises a group of malignancies arising from the kidney with unique tumour-specific antigen (TSA) signatures that can trigger cytotoxic immunity. Two classes of TSAs are now considered potential drivers of immunogenicity in RCC: small-scale insertions and deletions (INDELs) that result in coding frameshift mutations, and activation of human endogenous retroviruses. The presence of neoantigen-specific T cells is a hallmark of solid tumours with a high mutagenic burden, which typically have abundant TSAs owing to non-synonymous single nucleotide variations within the genome. However, RCC exhibits high cytotoxic T cell reactivity despite having only an intermediate non-synonymous single nucleotide variation mutational burden. Instead, RCC tumours have a high pan-cancer proportion of INDEL frameshift mutations, and coding frameshift INDELs are associated with high immunogenicity. Moreover, cytotoxic T cells in RCC subtypes seem to recognize tumour-specific endogenous retrovirus epitopes, whose presence is associated with clinical responses to immune checkpoint blockade therapy. Here, we review the distinct molecular landscapes in RCC that promote immunogenic responses, discuss clinical opportunities for discovery of biomarkers that can inform therapeutic immune checkpoint blockade strategies, and identify gaps in knowledge for future investigations.
7
The peptide woods are lovely, dark and deep: Hunting for novel cancer antigens. Semin Immunol 2023; 67:101758. [PMID: 37027981 DOI: 10.1016/j.smim.2023.101758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 03/22/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Harnessing the patient's immune system to control a tumor is a proven avenue for cancer therapy. T cell therapies as well as therapeutic vaccines, which target specific antigens of interest, are being explored as treatments in conjunction with immune checkpoint blockade. For these therapies, selecting the best suited antigens is crucial. Most of the focus has thus far been on neoantigens that arise from tumor-specific somatic mutations. Although there is clear evidence that T-cell responses against mutated neoantigens are protective, the large majority of these mutations are not immunogenic. In addition, most somatic mutations are unique to each individual patient and their targeting requires the development of individualized approaches. Therefore, novel antigen types are needed to broaden the scope of such treatments. We review high throughput approaches for discovering novel tumor antigens and some of the key challenges associated with their detection, and discuss considerations when selecting tumor antigens to target in the clinic.
8
Mate Pair Sequencing: Next-Generation Sequencing for Structural Variant Detection. Methods Mol Biol 2023; 2621:127-149. [PMID: 37041444 DOI: 10.1007/978-1-0716-2950-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2023]
Abstract
Structural variant detection by next-generation sequencing (NGS) methods offers higher molecular resolution than conventional cytogenetic techniques (Aypar et al., Eur J Haematol 102(1):87-96, 2019; Smadbeck et al., Blood Cancer J 9(12):103, 2019) and is particularly helpful in characterizing genomic rearrangements. Mate pair sequencing (MPseq) leverages a distinctive library preparation chemistry involving circularization of long DNA fragments, which enables paired-end sequencing of reads that are expected to map 2-5 kb apart in the genome. The characteristic orientation of the reads allows the user to estimate the location of breakpoints involved in a structural variant either within the sequenced reads or between the two reads. The precision of structural variant and copy number detection by this method allows characterization of cryptic and complex rearrangements that may otherwise be undetectable by conventional cytogenetic methods (Singh et al., Leuk Lymphoma 60(5):1304-1307, 2019; Peterson et al., Blood Adv 3(8):1298-1302, 2019; Schultz et al., Leuk Lymphoma 61(4):975-978, 2020; Peterson et al., Mol Case Studies 5(2), 2019; Peterson et al., Mol Case Studies 5(3), 2019).
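Mate pairs are expected to map 2-5 kb apart. Pairs whose mapped span falls outside that range, or whose mates land on different chromosomes, therefore flag candidate structural-variant breakpoints. The minimal sketch below shows that triage step for already-aligned read positions. The function name and labels are hypothetical, read-orientation checks are deliberately omitted, and the authors' actual pipeline is not reproduced here.

```python
def classify_mate_pair(chrom1, pos1, chrom2, pos2, min_span=2_000, max_span=5_000):
    """Label a read pair by comparing its mapped span with the expected insert size.

    The 2-5 kb default range follows the text; read orientation is ignored here
    for simplicity, although real structural-variant callers also check it.
    """
    if chrom1 != chrom2:
        return "discordant: inter-chromosomal"
    span = abs(pos2 - pos1)
    if span < min_span:
        return "discordant: span too short"
    if span > max_span:
        return "discordant: span too long"
    return "concordant"

# Hypothetical aligned pairs: (chrom1, pos1, chrom2, pos2).
pairs = [
    ("chr9", 130_000_000, "chr9", 130_003_500),
    ("chr9", 1_000, "chr22", 2_000),
    ("chr1", 10_000, "chr1", 210_000),
]
for pair in pairs:
    print(pair, classify_mate_pair(*pair))
```

In practice, a single discordant pair is weak evidence. Breakpoints are usually called only where several independent pairs cluster with consistent spans.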
9
A review of strategies used to identify transposition events in plant genomes. Front Plant Sci 2022; 13:1080993. [PMID: 36531345 PMCID: PMC9751208 DOI: 10.3389/fpls.2022.1080993] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/26/2022] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
Transposable elements (TEs) were initially considered redundant and dubbed 'junk DNA'. More recently, however, they have been recognized as an essential element of genome plasticity. In nature, they frequently become active upon exposure of the host to stress conditions. Even though most transposition events are neutral or even deleterious, some are occasionally beneficial, resulting in genetic novelty that improves the fitness of the host. Hence, TE mobilization may promote adaptability and, in the long run, act as a significant evolutionary force. There are many examples of TE insertions resulting in increased stress tolerance or in novel crop traits that appeal to consumers. TE-driven de novo variability could possibly be utilized for crop improvement. However, to systematically study the mechanisms of TE/host interactions, suitable tools are needed to globally monitor any ongoing TE mobilization. With the development of powerful new technologies, new high-throughput strategies for studying TE dynamics are emerging. Here, we present currently available methods applied to monitor the activity of TEs in plants. We classify them according to their operational principles, the position of target molecules in the process of transposition and their ability to capture real cases of actively transposing elements. Their possible theoretical and practical drawbacks are also discussed. Finally, conceivable strategies and combinations of methods resulting in improved performance are proposed.
10
The HUSH complex controls brain architecture and protocadherin fidelity. Sci Adv 2022; 8:eabo7247. [PMID: 36332029 PMCID: PMC9635835 DOI: 10.1126/sciadv.abo7247] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/21/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
The HUSH (human silencing hub) complex contains the H3K9me3 binding protein M-phase phosphoprotein 8 (MPP8) and recruits the histone methyltransferase SETDB1 as well as Microrchidia CW-type zinc finger protein 2 (MORC2). Functional and mechanistic studies of the HUSH complex have hitherto centered on SETDB1, while the in vivo functions of MPP8 and MORC2 remain elusive. Here, we show that genetic inactivation of Mphosph8 or Morc2a in the nervous system of mice leads to increased brain size, altered brain architecture, and behavioral changes. Mechanistically, in both mouse brains and human cerebral organoids, MPP8 and MORC2 suppress the repetitive-like protocadherin gene cluster in an H3K9me3-dependent manner. Our data identify MPP8 and MORC2, previously linked to silencing of repetitive elements via the HUSH complex, as key epigenetic regulators of protocadherin expression in the nervous system and thereby brain development and neuronal individuality in mice and humans.
11
SoloTE for improved analysis of transposable elements in single-cell RNA-Seq data using locus-specific expression. Commun Biol 2022; 5:1063. [PMID: 36202992 PMCID: PMC9537157 DOI: 10.1038/s42003-022-04020-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/30/2022] [Accepted: 09/21/2022] [Indexed: 11/08/2022] [Open access]
Abstract
Transposable Elements (TEs) contribute to the repetitive fraction of almost every eukaryotic genome known to date, and their transcriptional activation can influence the expression of neighboring genes in healthy and disease states. Single-cell RNA-Seq (scRNA-Seq) is a technical advance that allows the study of gene expression on a cell-by-cell basis. Although a computational approach is already available for single-cell analysis of TE expression, it omits the genomic location of TEs. Here we present SoloTE, a pipeline that outperforms the previous approach in its use of computational resources and allows the inclusion of locus-specific TE activity in scRNA-Seq expression matrices. We then apply SoloTE to several datasets to reveal the repertoire of TEs that become transcriptionally active in different cell groups, and based on their genomic location, we predict their potential impact on gene expression. As our tool takes as input the output files of standard scRNA-Seq processing pipelines, we expect it to be widely adopted in single-cell studies to help researchers discover patterns of cellular diversity associated with TE expression.
12
Recent Bioinformatic Progress to Identify Epigenetic Changes Associated to Transposable Elements. Front Genet 2022; 13:891194. [PMID: 35646069 PMCID: PMC9140218 DOI: 10.3389/fgene.2022.891194] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/07/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Transposable elements (TEs) are recognized for their great impact on the functioning and evolution of their host genomes. They are associated with various deleterious effects, which has led to the evolution of regulatory epigenetic mechanisms to control their activity. Despite these negative effects, TEs are also important actors in the evolution of genomes, promoting genetic diversity and new regulatory elements. Consequently, it is important to study the epigenetic modifications associated with TEs, especially at the locus-specific level, to determine their individual influence on gene function. To this end, this short review presents the current bioinformatic tools for achieving this task.
13
RNA transcription and degradation of Alu retrotransposons depends on sequence features and evolutionary history. G3 (Bethesda) 2022; 12:6543614. [PMID: 35253846 PMCID: PMC9073682 DOI: 10.1093/g3journal/jkac054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 02/25/2022] [Indexed: 11/16/2022]
Abstract
Alu elements are one of the most successful groups of RNA retrotransposons and make up 11% of the human genome, with over 1 million individual loci. They are linked to genetic defects and increased sequence diversity, and they influence transcriptional activity. Still, their RNA metabolism remains poorly understood. It is even unclear whether Alu elements are mostly transcribed by RNA Polymerase II or III. We conducted a transcription shutoff experiment using α-amanitin and metabolic RNA labeling by 4-thiouridine combined with RNA fragmentation (TT-seq) and RNA-seq to shed further light on the origin and life cycle of Alu transcripts. We find that Alu RNAs are more stable than previously thought and seem to originate in part from RNA Polymerase II activity, as previous reports suggest. However, their expression seems to be independent of the transcriptional activity of adjacent genes. Furthermore, we have developed a novel statistical test for detecting the expression of quantitative trait loci in Alu elements that relies on the de Bruijn graph representation of all Alu sequences. It controls for both statistical significance and biological relevance using a tuned k-mer representation, discovering influential sequence features missed by regular motif search. In addition, using a generalized linear model, we discover several point mutations and motifs of interest, which also match transcription factor-binding motifs.
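The de Bruijn graph representation of a sequence collection can be illustrated with the generic textbook construction. Every k-mer contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, so sequence shared by many copies collapses onto shared paths, while point variants open side branches. This minimal sketch is not the authors' statistical test. The function names, the value of k and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict

def kmer_counts(sequences, k):
    """Count every k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def de_bruijn_graph(sequences, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and each distinct k-mer
    adds an edge from its prefix to its suffix. Returns node -> set of successors."""
    graph = defaultdict(set)
    for kmer in kmer_counts(sequences, k):
        graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)

# Two hypothetical toy "copies" differing only at their last base.
seqs = ["ACGT", "ACGA"]
counts = kmer_counts(seqs, 3)
graph = de_bruijn_graph(seqs, 3)
print(counts)
print(graph)
```

The shared prefix yields a single path (AC to CG), and the differing final base appears as a branch at node CG. The k-mer counts record how many copies support each edge.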
14
LncRNA Biomarkers of Inflammation and Cancer. Adv Exp Med Biol 2022; 1363:121-145. [PMID: 35220568 DOI: 10.1007/978-3-030-92034-0_7] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 12/09/2022]
Abstract
Long noncoding RNAs (lncRNAs) are promising candidates as biomarkers of inflammation and cancer. LncRNAs have several properties that make them well-suited as molecular markers of disease: (1) many lncRNAs are expressed in a tissue-specific manner, (2) distinct lncRNAs are upregulated based on different inflammatory or oncogenic stimuli, (3) lncRNAs released from cells are packaged and protected in extracellular vesicles, and (4) circulating lncRNAs in the blood are detectable using various RNA sequencing approaches. Here we focus on the potential for lncRNA biomarkers to detect inflammation and cancer, highlighting key biological, technological, and analytical considerations that will help advance the development of lncRNA-based liquid biopsies.
15
Locus-specific expression analysis of transposable elements. Brief Bioinform 2021; 23:6400501. [PMID: 34664075 PMCID: PMC8769692 DOI: 10.1093/bib/bbab417] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/02/2021] [Revised: 08/24/2021] [Accepted: 09/10/2021] [Indexed: 11/16/2022] [Open access]
Abstract
Transposable elements (TEs) have been associated with many, frequently detrimental, biological roles. Consequently, the regulation of TEs, e.g. via DNA methylation and histone modifications, is considered critical for maintaining genomic integrity and other functions. Still, the high-throughput study of TEs is usually limited to the family or consensus-sequence level because of alignment problems caused by high sequence similarity and short read lengths. To fully understand the causes and effects of TE expression, however, it is necessary to assess TE expression at the level of individual instances. Our simulation study demonstrates that sequence similarity and short read lengths do not rule out accurate assessment of (differential) expression of TEs at the instance level. With only slight modifications to existing methods, TE expression analysis works surprisingly well for conventional paired-end sequencing data. We find that SalmonTE and Telescope can accurately tally a considerable number of TE instances, allowing for differential expression recovery in model and non-model organisms.
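The difference between family-level and instance-level quantification can be shown by collapsing per-locus counts onto their subfamily. The collapse discards which individual copy was expressed, and that is exactly the information the instance-level analysis described above aims to keep. The minimal sketch below assumes a hypothetical locus ID format of `subfamily|chrom:start-end`. It is an illustration only and does not reflect the interface of SalmonTE or Telescope.

```python
from collections import Counter

def collapse_to_subfamily(locus_counts):
    """Sum per-locus read counts into per-subfamily totals.

    Keys are assumed (hypothetically) to look like 'subfamily|chrom:start-end'.
    """
    totals = Counter()
    for locus_id, count in locus_counts.items():
        subfamily = locus_id.split("|", 1)[0]
        totals[subfamily] += count
    return totals

# Hypothetical instance-level counts.
loci = {
    "L1HS|chr1:1000-7000": 120,
    "L1HS|chrX:5000-11000": 0,
    "AluY|chr2:300-600": 15,
}
family_level = collapse_to_subfamily(loci)
expressed = [locus for locus, count in loci.items() if count > 0]
print(family_level)  # subfamily view: L1HS looks uniformly "expressed"
print(expressed)     # instance view: only one of the two L1HS copies is
```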
16
The Effects of GC-Biased Gene Conversion on Patterns of Genetic Diversity among and across Butterfly Genomes. Genome Biol Evol 2021; 13:evab064. [PMID: 33760095 PMCID: PMC8175052 DOI: 10.1093/gbe/evab064] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 03/22/2021] [Indexed: 12/28/2022] [Open access]
Abstract
Recombination reshuffles the alleles of a population through crossover and gene conversion. These mechanisms have considerable consequences for the evolution and maintenance of genetic diversity. Crossover, for example, can increase genetic diversity by breaking the linkage between selected and nearby neutral variants. Bias in favor of G or C alleles during gene conversion may instead promote the fixation of one allele over the other, thus decreasing diversity. Mutation bias from G or C to A or T opposes GC-biased gene conversion (gBGC). Less recognized is that these two processes may, when balanced, promote genetic diversity. Here, we investigate how gBGC and mutation bias shape genetic diversity patterns in wood white butterflies (Leptidea sp.). This constitutes the first in-depth investigation of gBGC in butterflies. Using 60 resequenced genomes from six populations of three species, we find substantial variation in the strength of gBGC across lineages. When modeling the balance of gBGC and mutation bias and comparing analytical results with empirical data, we reject gBGC as the main determinant of genetic diversity in these butterfly species. As alternatives, we consider linked selection and GC content. We find evidence that high values of both reduce diversity. We also show that the joint effects of gBGC and mutation bias can give rise to a diversity pattern that resembles the signature of linked selection. Consequently, gBGC should be considered when interpreting the effects of linked selection on levels of genetic diversity.
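The balance between AT-biased mutation and gBGC can be summarized by the standard weak-mutation result for equilibrium GC content. Here u is the AT-to-GC mutation rate, v the GC-to-AT mutation rate, rho a substitution rate, and B = 4 N_e b the population-scaled gBGC coefficient. This is a textbook diploid approximation in which gBGC acts like directional selection, and it is not necessarily the model the authors fitted.

```latex
% u: AT->GC mutation rate; v: GC->AT mutation rate
% B = 4 N_e b: population-scaled gBGC coefficient (diploid, weak-mutation approximation)
\frac{\rho_{AT \to GC}}{\rho_{GC \to AT}} = \frac{u}{v}\, e^{B},
\qquad
GC^{*} = \frac{u\, e^{B}}{u\, e^{B} + v}
% B = 0 recovers the mutational equilibrium GC^{*} = u/(u+v);
% with AT-biased mutation (v > u), the two forces balance (GC^{*} = 1/2) when B = \ln(v/u).
```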
17
Activation of HERV-K(HML-2) disrupts cortical patterning and neuronal differentiation by increasing NTRK3. Cell Stem Cell 2021; 28:1566-1581.e8. [PMID: 33951478 DOI: 10.1016/j.stem.2021.04.009] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/13/2020] [Revised: 03/05/2021] [Accepted: 04/12/2021] [Indexed: 12/20/2022]
Abstract
The biological function and disease association of human endogenous retroviruses (HERVs) are largely elusive. HERV-K(HML-2) has been associated with neurotoxicity, but there is no clear understanding of its role or mechanistic basis. We addressed the physiological functions of HERV-K(HML-2) in neuronal differentiation using CRISPR engineering to activate or repress its expression levels in a human-pluripotent-stem-cell-based system. We found that elevated HERV-K(HML-2) transcription is detrimental to the development and function of cortical neurons. These effects are cell-type-specific, as dopaminergic neurons are unaffected. Moreover, high HERV-K(HML-2) transcription alters cortical layer formation in forebrain organoids. HERV-K(HML-2) transcriptional activation leads to hyperactivation of NTRK3 expression and other neurodegeneration-related genes. Direct activation of NTRK3 phenotypically resembles HERV-K(HML-2) induction, and reducing NTRK3 levels in the context of HERV-K(HML-2) induction restores cortical neuron differentiation. Hence, these findings reveal a cell-type-specific role for HERV-K(HML-2) in cortical neuron development.
18
Integrated transcription factor profiling with transcriptome analysis identifies L1PA2 transposons as global regulatory modulators in a breast cancer model. Sci Rep 2021; 11:8083. [PMID: 33850167 PMCID: PMC8044218 DOI: 10.1038/s41598-021-86395-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/27/2020] [Accepted: 02/26/2021] [Indexed: 12/13/2022] [Open access]
Abstract
While transposons are generally silenced in somatic tissues, many transposons escape epigenetic repression in epithelial cancers, become transcriptionally active and contribute to the regulation of human gene expression. We have developed a bioinformatic pipeline for the integrated analysis of transcription factor binding and transcriptomic data to identify transposon-derived promoters that are activated in specific diseases and developmental states. We applied this pipeline to a breast cancer model, and found that the L1PA2 transposon subfamily contributes abundant regulatory sequences to co-ordinated transcriptional regulation in breast cancer. Transcription factor profiling demonstrates that over 27% of L1PA2 transposons harbour co-localised binding sites of functionally interacting, cancer-associated transcription factors in MCF7 cells, a cell line used to model breast cancer. Transcriptomic analysis reveals that L1PA2 transposons also contribute transcription start sites to up-regulated transcripts in MCF7 cells, including some transcripts with established oncogenic properties. In addition, we verified the utility of our pipeline on other transposon subfamilies, as well as on leukemia and lung carcinoma cell lines. We demonstrate that the normally quiescent regulatory activities of transposons can be activated and alter the cancer transcriptome. In particular, the L1PA2 subfamily contributes abundant regulatory sequences, and likely plays a global role in modulating breast cancer transcriptional regulation. Understanding the regulatory impact of L1PA2 on breast cancer genomes provides additional insights into cancer genome regulation, and may provide novel biomarkers for disease diagnosis, prognosis and therapy.
19
Abstract
Transposable elements (TEs) are insertional mutagens that contribute greatly to the plasticity of eukaryotic genomes, influencing the evolution and adaptation of species as well as physiology or disease in individuals. Measuring TE expression helps to understand not only when and where TE mobilization can occur but also how this process alters gene expression, chromatin accessibility or cellular signalling pathways. Although genome-wide gene expression assays such as RNA sequencing include transposon-derived transcripts, most computational analytical tools discard or misinterpret TE-derived reads. Emerging approaches are improving the identification of expressed TE loci and helping to discriminate TE transcripts that permit TE mobilization from chimeric gene-TE transcripts or pervasive transcription. Here we review the main challenges associated with the detection of TE expression, including mappability, insertional and internal sequence polymorphisms, and the diversity of the TE transcriptional landscape, as well as the different experimental and computational strategies to solve them.
20
Abstract
Next-generation sequencing approaches have fundamentally changed the types of questions that can be asked about gene function and regulation. With the goal of approaching truly genome-wide quantifications of all the interaction partners and downstream effects of particular genes, these quantitative assays have allowed for an unprecedented level of detail in exploring biological interactions. However, many challenges remain in our ability to accurately describe and quantify the interactions that take place in the hard-to-reach and extremely repetitive regions of our genome, which are composed mostly of transposable elements (TEs). Tools dedicated to TE-derived sequences have lagged behind, making the inclusion of these sequences in genome-wide analyses difficult. Recent improvements, both computational and experimental, allow for the better inclusion of TE sequences in genomic assays and a renewed appreciation for the importance of TE biology. This review will discuss the recent improvements that have been made in the computational analysis of TE-derived sequences as well as the areas where such analysis still proves difficult. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
21
Specific subfamilies of transposable elements contribute to different domains of T lymphocyte enhancers. Proc Natl Acad Sci U S A 2020; 117:7905-7916. [PMID: 32193341 PMCID: PMC7148579 DOI: 10.1073/pnas.1912008117] [Indexed: 12/30/2022]
Abstract
Transposable elements (TEs) compose nearly half of mammalian genomes and provide building blocks for cis-regulatory elements. Using high-throughput sequencing, we show that 84 TE subfamilies are overrepresented, and distributed in a lineage-specific fashion in core and boundary domains of CD8+ T cell enhancers. Endogenous retroviruses are most significantly enriched in core domains with accessible chromatin, and bear recognition motifs for immune-related transcription factors. In contrast, short interspersed elements (SINEs) are preferentially overrepresented in nucleosome-containing boundaries. A substantial proportion of these SINEs harbor a high density of the enhancer-specific histone mark H3K4me1 and carry sequences that match enhancer boundary nucleotide composition. Motifs with regulatory features are better preserved within enhancer-enriched TE copies compared to their subfamily equivalents located in gene deserts. TE-rich and TE-poor enhancers associate with both shared and unique gene groups and are enriched in overlapping functions related to lymphocyte and leukocyte biology. The majority of T cell enhancers are shared with other immune lineages and are accessible in common hematopoietic progenitors. A higher proportion of immune tissue-specific enhancers are TE-rich compared to enhancers specific to other tissues, correlating with higher TE occurrence in immune gene-associated genomic regions. Our results suggest that during evolution, TEs abundant in these regions and carrying motifs potentially beneficial for enhancer architecture and immune functions were particularly frequently incorporated by evolving enhancers. Their putative selection and regulatory cooption may have accelerated the evolution of immune regulatory networks.
22
Tools and best practices for retrotransposon analysis using high-throughput sequencing data. Mob DNA 2019; 10:52. [PMID: 31890048 PMCID: PMC6935493 DOI: 10.1186/s13100-019-0192-1] [Received: 07/22/2019] [Accepted: 12/04/2019] [Indexed: 12/26/2022]
Abstract
Background Sequencing technologies give access to a precise picture of the molecular mechanisms acting upon genome regulation. One of the biggest technical challenges with sequencing data is to map millions of reads to a reference genome. This problem is exacerbated when dealing with repetitive sequences such as transposable elements, which occupy half of the mammalian genome mass. Sequenced reads coming from these regions introduce ambiguities in the mapping step. Therefore, dedicated parameters and algorithms must be considered when transposable element regulation is investigated with sequencing datasets. Results Here, we used simulated reads on the mouse and human genomes to define the best parameters for aligning transposable element-derived reads on a reference genome. The efficiency of the most commonly used aligners was compared and we further evaluated how transposable element representation should be estimated using available methods. The mappability of the different transposon families in the mouse and the human genomes was calculated, giving insight into their evolution. Conclusions Based on simulated data, we provided recommendations on the alignment and the quantification steps to be performed when transposon expression or regulation is studied, and identified the limits in detecting specific young transposon families of the mouse and human genomes. These principles may help the community to adopt standard procedures and raise awareness of the difficulties encountered in the study of transposable elements.
23
Transcriptome analyses of tumor-adjacent somatic tissues reveal genes co-expressed with transposable elements. Mob DNA 2019; 10:39. [PMID: 31497073 PMCID: PMC6720085 DOI: 10.1186/s13100-019-0180-5] [Received: 11/02/2018] [Accepted: 08/14/2019] [Indexed: 12/12/2022]
Abstract
BACKGROUND Despite the long-held assumption that transposons are normally only expressed in the germ-line, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in somatic cells. However, the extent of variation in TE transcript levels across different tissues and different individuals is unknown, and the co-expression between TEs and host gene mRNAs has not been examined. RESULTS Here we report the variation in TE-derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, some individual TE loci exhibit tissue-specific expression patterns when compared across tissues. The core TE modules were negatively correlated with other gene modules that consisted of immune response genes in interferon signaling. KRAB Zinc Finger Proteins (KZFPs) were over-represented gene members of the TE modules, showing positive correlation across multiple tissues. However, we did not find overlap between TE-KZFP pairs that are co-expressed and TE-KZFP pairs that are bound in published ChIP-seq studies. CONCLUSIONS We find unexpected variation in TE-derived transcripts, within and across non-tumorous tissues. We describe a broad view of the RNA state for non-tumorous tissues exhibiting higher levels of TE transcripts. Tissues with higher levels of TE transcripts have a broad range of TEs co-expressed, with high expression of a large number of KZFPs, and lower RNA levels of immune genes.