1
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 286] [Impact Index Per Article: 95.3] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
- Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
- Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
- Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
- Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
- Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
- Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
- Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
- Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
2
Criscione SW, Theodosakis N, Micevic G, Cornish TC, Burns KH, Neretti N, Rodić N. Genome-wide characterization of human L1 antisense promoter-driven transcripts. BMC Genomics 2016; 17:463. [PMID: 27301971 PMCID: PMC4908685 DOI: 10.1186/s12864-016-2800-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Received: 01/06/2016] [Accepted: 05/26/2016] [Indexed: 11/23/2022]
Abstract
Background: Long INterspersed Element-1 (LINE-1 or L1) is the only autonomously active transposable element in the human genome. L1 sequences comprise approximately 17% of the human genome, but only the evolutionarily recent, human-specific subfamily is retrotransposition competent. The L1 promoter is bidirectional. It contains a sense promoter that drives transcription of the two proteins required for retrotransposition, and an antisense promoter. The L1 antisense promoter can drive transcription of chimeric transcripts, in which 5’ L1 antisense sequences are spliced to the exons of neighboring genes.
Results: The impact of L1 antisense promoter activity on cellular transcriptomes is poorly understood. To investigate this, we analyzed GenBank ESTs for messenger RNAs that initiate in the L1 antisense promoter. We identified 988 putative L1 antisense chimeric transcripts, 911 of which have not been previously reported. These appear to be alternative genic transcripts. They are sense-oriented with respect to the gene and initiate near, but typically downstream of, the gene's transcriptional start site. In multiple cell lines, L1 antisense promoters display enrichment for the YY1 transcription factor and for histone modifications associated with active promoters. Global run-on sequencing data support the activity of the L1 antisense promoter. We independently detected 124 L1 antisense chimeric transcripts using long-read Pacific Biosciences RNA-seq data. Furthermore, we validated four chimeric transcripts by quantitative RT-PCR and Sanger sequencing and demonstrated that they are readily detectable in many normal human tissues.
Conclusions: We present a comprehensive characterization of human L1 antisense promoter-driven transcripts and provide substantial evidence that they are transcribed in a variety of human cell types. Our findings reveal a new, wide-reaching aspect of L1 biology by identifying antisense transcripts affecting as many as 4% of all human genes.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2800-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Steven W Criscione
- Department of Molecular Biology, Cell Biology, and Biochemistry, Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
- Nicholas Theodosakis
- Department of Pathology, Yale University, New Haven, CT, 06510, USA
- Department of Dermatology, Division of Dermatopathology, Yale University, New Haven, CT, 06510, USA
- Goran Micevic
- Department of Pathology, Yale University, New Haven, CT, 06510, USA
- Department of Dermatology, Division of Dermatopathology, Yale University, New Haven, CT, 06510, USA
- Toby C Cornish
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- High Throughput (HiT) Biology Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Nicola Neretti
- Department of Molecular Biology, Cell Biology, and Biochemistry, Center for Computational Molecular Biology, Brown University, Providence, RI, 02912, USA
- Nemanja Rodić
- Department of Pathology, Yale University, New Haven, CT, 06510, USA
- Department of Dermatology, Division of Dermatopathology, Yale University, New Haven, CT, 06510, USA
3
Streva VA, Jordan VE, Linker S, Hedges DJ, Batzer MA, Deininger PL. Sequencing, identification and mapping of primed L1 elements (SIMPLE) reveals significant variation in full length L1 elements between individuals. BMC Genomics 2015; 16:220. [PMID: 25887476 PMCID: PMC4381410 DOI: 10.1186/s12864-015-1374-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 11/21/2014] [Accepted: 02/20/2015] [Indexed: 11/10/2022]
Abstract
Background: There are over half a million copies of L1 retroelements in the human genome, and these are responsible for as much as 0.5% of new human genetic diseases. Most new L1 inserts arise from young source elements that are polymorphic in the human genome. Highly active polymorphic "hot" L1 source elements have been shown to be capable of extremely high levels of mobilization and have caused numerous instances of disease. Additionally, hot polymorphic L1s have been described as highly active within numerous cancer genomes. These hot L1s cause mutagenesis by inserting new L1 copies elsewhere in the genome. They have also been shown to generate additional full-length L1 insertions that are themselves hot and able to retrotranspose further. Through this mechanism, hot L1s may amplify within a tumor and drive a continued cycle of mutagenesis.
Results and conclusions: We have developed a method to detect full-length, polymorphic L1 elements using a targeted next-generation sequencing approach, Sequencing, Identification and Mapping of Primed L1 Elements (SIMPLE). SIMPLE has 94% sensitivity and detects nearly all full-length L1 elements in a genome. SIMPLE will allow researchers to identify hot, mutagenic full-length L1s as potential drivers of genome instability. Using SIMPLE, we find that the typical individual has approximately 100 non-reference, polymorphic L1 elements in their genome. These elements occur at lower population frequencies than previously identified polymorphic L1 elements, which demonstrates the tremendous diversity of potentially active L1 elements in the human population.
Affiliation(s)
- Vincent A Streva
- Tulane Cancer Center and Department of Epidemiology, Tulane University, New Orleans, LA, USA.
- Present address: Division of Infectious Diseases, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA.
- Vallmer E Jordan
- Department of Biology, Louisiana State University, Baton Rouge, LA, USA.
- Sara Linker
- Department of Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.
- Dale J Hedges
- Department of Internal Medicine, The Ohio State University, Columbus, OH, USA.
- Mark A Batzer
- Department of Biology, Louisiana State University, Baton Rouge, LA, USA.
- Prescott L Deininger
- Tulane Cancer Center and Department of Epidemiology, Tulane University, New Orleans, LA, USA.
4
Criscione SW, Zhang Y, Thompson W, Sedivy JM, Neretti N. Transcriptional landscape of repetitive elements in normal and cancer human cells. BMC Genomics 2014; 15:583. [PMID: 25012247 PMCID: PMC4122776 DOI: 10.1186/1471-2164-15-583] [Citation(s) in RCA: 173] [Impact Index Per Article: 17.3] [Received: 03/03/2014] [Accepted: 07/03/2014] [Indexed: 12/11/2022]
Abstract
Background: Repetitive elements comprise at least 55% of the human genome, with more recent estimates as high as two-thirds. Most of these elements are retrotransposons, DNA sequences that can insert copies of themselves into new genomic locations by a “copy and paste” mechanism. These mobile genetic elements play important roles in shaping genomes during evolution and have been implicated in the etiology of many human diseases. Despite their abundance and diversity, few studies have investigated the regulation of endogenous retrotransposons at the genome-wide scale, primarily because of the technical difficulty of uniquely mapping high-throughput sequencing reads to repetitive DNA.
Results: Here we develop a new computational method called RepEnrich to study genome-wide transcriptional regulation of repetitive elements. We show that many of the Long Terminal Repeat retrotransposons in humans are transcriptionally active in a cell line-specific manner. Cancer cell lines display greater RNA Polymerase II binding to retrotransposons than cell lines derived from normal tissue. Consistent with increased transcriptional activity of retrotransposons in cancer cells, we found significantly higher levels of L1 retrotransposon RNA expression in prostate tumors than in matched normal controls.
Conclusions: Our results support increased transcription of retrotransposons in transformed cells, which may explain the somatic retrotransposition events recently reported in several types of cancer.
Electronic supplementary material: Supplementary material is available for this article at 10.1186/1471-2164-15-583 and is accessible to authorized users.
Affiliation(s)
- Nicola Neretti
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI 02912, USA.
5
O'Donnell KA, Burns KH. Mobilizing diversity: transposable element insertions in genetic variation and disease. Mob DNA 2010; 1:21. [PMID: 20813032 PMCID: PMC2941744 DOI: 10.1186/1759-8753-1-21] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 05/18/2010] [Accepted: 09/02/2010] [Indexed: 02/06/2023]
Abstract
Transposable elements (TEs) comprise a large fraction of mammalian genomes. A number of these elements are actively jumping in our genomes today. As a consequence, these insertions provide a source of genetic variation and, in rare cases, these events cause mutations that lead to disease. Yet, the extent to which these elements impact their host genomes is not completely understood. This review will summarize our current understanding of the mechanisms underlying transposon regulation and the contribution of TE insertions to genetic diversity in the germline and in somatic cells. Finally, traditional methods and emerging technologies for identifying transposon insertions will be considered.
Affiliation(s)
- Kathryn A O'Donnell
- Department of Molecular Biology and Genetics, The Johns Hopkins University School of Medicine, Baltimore, MD, USA.
6
Ewing AD, Kazazian HH. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res 2010; 20:1262-70. [PMID: 20488934 DOI: 10.1101/gr.106419.110] [Citation(s) in RCA: 224] [Impact Index Per Article: 16.0] [Indexed: 11/25/2022]
Abstract
Using high-throughput sequencing, we devised a technique to determine the insertion sites of virtually all members of the human-specific L1 retrotransposon family in any human genome. Using diagnostic nucleotides, we were able to locate the approximately 800 L1Hs copies corresponding specifically to the pre-Ta, Ta-0, and Ta-1 L1Hs subfamilies, with over 90% of sequenced reads corresponding to human-specific elements. We find that any two individual genomes differ at an average of 285 sites with respect to L1 insertion presence or absence. In total, we assayed 25 individuals, 15 of which are unrelated, at 1139 sites, including 772 shared with the reference genome and 367 nonreference L1 insertions. We show that L1Hs profiles recapitulate genetic ancestry, and determine the chromosomal distribution of these elements. Using these data, we estimate that the rate of L1 retrotransposition in humans is between 1/95 and 1/270 births, and the number of dimorphic L1 elements in the human population with gene frequencies greater than 0.05 is between 3000 and 10,000.
Affiliation(s)
- Adam D Ewing
- University of Pennsylvania Department of Genetics, Philadelphia, Pennsylvania 19104, USA
7
Crow MK. Long interspersed nuclear elements (LINE-1): potential triggers of systemic autoimmune disease. Autoimmunity 2010; 43:7-16. [PMID: 19961365 DOI: 10.3109/08916930903374865] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Indexed: 01/18/2023]
Abstract
Recent advances have identified immune complexes containing nucleic acids as stimuli for toll-like receptors and inducers of type I interferon (IFN). While a similar mechanism may serve to amplify immune system activation and production of inflammatory mediators in vivo in the context of systemic autoimmune diseases, the initial triggers of autoimmunity have not been defined. In this review, we describe a category of potential inducers of autoimmunity, the endogenous retroelements, with a particular focus on long interspersed nuclear elements (LINE-1, L1). Increased expression of L1 transcripts or decreased degradation of L1 DNA or RNA could provide potent stimuli for an innate immune response, priming of the immune system, and induction of autoimmunity and inflammation. Genomic and genetic variations among individuals, sex-related differences in L1 regulation, and environmental triggers are among the potential mechanisms that might account for increased L1 expression. Induction of type I IFN by L1-enriched nucleic acids through TLR-independent pathways could represent a first step in the complex series of events leading to systemic autoimmune disease.
Affiliation(s)
- Mary K Crow
- Mary Kirkland Center for Lupus Research, Hospital for Special Surgery, New York 10021, USA.
8
Terai G, Yoshizawa A, Okida H, Asai K, Mituyama T. Discovery of short pseudogenes derived from messenger RNAs. Nucleic Acids Res 2009; 38:1163-71. [PMID: 19965772 PMCID: PMC2831318 DOI: 10.1093/nar/gkp1098] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/22/2022]
Abstract
More than 40% of the human genome is generated by retrotransposition, a series of in vivo processes involving reverse transcription of RNA molecules and integration of the transcripts into the genomic sequence. The mechanism of retrotransposition, however, is not fully understood, and additional genomic elements generated by retrotransposition may remain to be discovered. Here, we report that the human genome contains many previously unidentified short pseudogenes generated by retrotransposition of mRNAs. Genomic elements generated by non-long terminal repeat retrotransposition have specific sequence signatures: a poly-A tract that is immediately downstream and a pair of duplicated sequences, called target site duplications (TSDs), at either end. Using a new computer program, TSDscan, that can accurately detect pseudogenes based on the presence of the poly-A tract and TSDs, we found 654 short (≤300 bp), previously unknown pseudogenes derived from mRNAs. Comprehensive analyses of the pseudogenes that we identified and their parent mRNAs revealed that the pseudogene length depends on the parent mRNA length: long mRNAs generate more short pseudogenes than do short mRNAs. To explain this phenomenon, we hypothesize that most long mRNAs are truncated before they are reverse transcribed. Truncated mRNAs would be rapidly degraded during reverse transcription, resulting in the generation of short pseudogenes.
Affiliation(s)
- Goro Terai
- INTEC Systems Institute Inc., Koto-ku 136-0075, Japan.
9
Marchani EE, Xing J, Witherspoon DJ, Jorde LB, Rogers AR. Estimating the age of retrotransposon subfamilies using maximum likelihood. Genomics 2009; 94:78-82. [PMID: 19379804 DOI: 10.1016/j.ygeno.2009.04.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 02/28/2009] [Revised: 04/10/2009] [Accepted: 04/11/2009] [Indexed: 11/29/2022]
Abstract
We present a maximum likelihood model to estimate the age of retrotransposon subfamilies. This method is designed around a master gene model, which assumes a constant retrotransposition rate. The statistical properties of this model and an ad hoc estimation procedure are compared using two simulated data sets. We also test whether each estimation procedure is robust to violation of the master gene model. According to our results, both estimation procedures are accurate under the master gene model. While both methods tend to overestimate ages under the intermediate model, the maximum likelihood estimate is significantly less inflated than the ad hoc estimate. We estimate the ages of two subfamilies of human-specific LINE-1 insertions using both estimation procedures. By calculating confidence intervals around the maximum likelihood estimate, our model can both provide an estimate of retrotransposon subfamily age and describe the range of subfamily ages consistent with the data.
Affiliation(s)
- Elizabeth E Marchani
- Division of Medical Genetics, University of Washington, BOX 357720, Seattle, WA 98195, USA.
10
Mamedov IZ, Amosova AL, Fisunov GY, Lebedev YB. A new polymorphic retroelement database (PRED) for the human genome. Mol Biol 2008. [DOI: 10.1134/s0026893308040213] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
11
Xing J, Witherspoon DJ, Ray DA, Batzer MA, Jorde LB. Mobile DNA elements in primate and human evolution. Am J Phys Anthropol 2008; Suppl 45:2-19. [PMID: 18046749 DOI: 10.1002/ajpa.20722] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.6] [Indexed: 01/24/2023]
Abstract
Roughly 50% of the primate genome consists of mobile, repetitive DNA sequences such as Alu and LINE1 elements. The causes and evolutionary consequences of mobile element insertion, which have received considerable attention during the past decade, are reviewed in this article. Because of their unique mutational mechanisms, these elements are highly useful for answering phylogenetic questions. We demonstrate how they have been used to help resolve a number of questions in primate phylogeny, including the human-chimpanzee-gorilla trichotomy and New World primate phylogeny. Alu and LINE1 element insertion polymorphisms have also been analyzed in human populations to test hypotheses about human evolution and population affinities and to address forensic issues. Finally, these elements have had impacts on the genome itself. We review how they have influenced fundamental ongoing processes like nonhomologous recombination, genomic deletion, and X chromosome inactivation.
Affiliation(s)
- Jinchuan Xing
- Department of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, UT 84112, USA
12
Kehrer-Sawatzki H, Cooper DN. Understanding the recent evolution of the human genome: insights from human-chimpanzee genome comparisons. Hum Mutat 2007; 28:99-130. [PMID: 17024666 DOI: 10.1002/humu.20420] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Indexed: 12/20/2022]
Abstract
The sequencing of the chimpanzee genome and the comparison with its human counterpart have begun to reveal the spectrum of genetic changes that has accompanied human evolution. In addition to gross karyotypic rearrangements such as the fusion that formed human chromosome 2 and the human-specific pericentric inversions of chromosomes 1 and 18, there is considerable submicroscopic structural variation involving deletions, duplications, and inversions. Lineage-specific segmental duplications, detected by array comparative genomic hybridization and direct sequence comparison, have made a very significant contribution to this structural divergence, which is at least three-fold greater than that due to nucleotide substitutions. Since structural genomic changes may have given rise to irreversible functional differences between the diverging species, their detailed analysis could help to identify the biological processes that have accompanied speciation. To this end, interspecies comparisons have revealed numerous human-specific gains and losses of genes as well as changes in gene expression. The very considerable structural diversity (polymorphism) evident within both lineages has, however, hampered the analysis of the structural divergence between the human and chimpanzee genomes. The concomitant evaluation of genetic divergence and diversity at the nucleotide level has nevertheless served to identify many genes that have evolved under positive selection and may thus have been involved in the development of human lineage-specific traits. Genes that display signs of weak negative selection have also been identified and could represent candidate loci for complex genomic disorders. Here, we review recent progress in comparing the human and chimpanzee genomes and discuss how the differences detected have improved our understanding of the evolution of the human genome.
13
Wang J, Song L, Grover D, Azrak S, Batzer MA, Liang P. dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans. Hum Mutat 2006; 27:323-9. [PMID: 16511833 PMCID: PMC1855216 DOI: 10.1002/humu.20307] [Citation(s) in RCA: 148] [Impact Index Per Article: 8.2] [Indexed: 11/12/2022]
Abstract
Retrotransposons constitute over 40% of the human genome and play important roles in the evolution of the genome. Since certain types of retrotransposons, particularly members of the Alu, L1, and SVA families, are still active, their recent and ongoing propagation generates a unique and important class of human genomic diversity/polymorphism (for the presence and absence of an insertion) with some elements known to cause genetic diseases. So far, over 2,300, 500, and 80 Alu, L1, and SVA insertions, respectively, have been reported to be polymorphic and many more are yet to be discovered. We present here the Database of Retrotransposon Insertion Polymorphisms (dbRIP; http://falcon.roswellpark.org:9090), a highly integrated and interactive database of human retrotransposon insertion polymorphisms (RIPs). dbRIP currently contains a nonredundant list of 1,625, 407, and 63 polymorphic Alu, L1, and SVA elements, respectively, or a total of 2,095 RIPs. In dbRIP, we deploy the utilities and annotated data of the genome browser developed at the University of California at Santa Cruz (UCSC) for user-friendly queries and integrative browsing of RIPs along with all other genome annotation information. Users can query the database by a variety of means and have access to the detailed information related to a RIP, including detailed insertion sequences and genotype data. dbRIP represents the first database providing comprehensive, integrative, and interactive compilation of RIP data, and it will be a useful resource for researchers working in the area of human genetics.
Affiliation(s)
- Jianxin Wang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, New York
- Lei Song
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, New York
- Deepak Grover
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-scale Systems, Louisiana State University, Baton Rouge, Louisiana
- Sami Azrak
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, New York
- Mark A. Batzer
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-scale Systems, Louisiana State University, Baton Rouge, Louisiana
- Ping Liang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, New York
- * Correspondence to: Dr. Ping Liang, Department of Cancer Genetics, Roswell Park Cancer Institute, Elm & Carlton Streets, Buffalo, NY 14263. E-mail:
14
Witherspoon DJ, Marchani EE, Watkins WS, Ostler CT, Wooding SP, Anders BA, Fowlkes JD, Boissinot S, Furano AV, Ray DA, Rogers AR, Batzer MA, Jorde LB. Human population genetic structure and diversity inferred from polymorphic L1 (LINE-1) and Alu insertions. Hum Hered 2006; 62:30-46. [PMID: 17003565 DOI: 10.1159/000095851] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 04/18/2006] [Accepted: 07/25/2006] [Indexed: 11/19/2022]
Abstract
BACKGROUND/AIMS: The L1 retrotransposable element family is the most successful self-replicating genomic parasite of the human genome. L1 elements drive replication of Alu elements, and both have had far-reaching impacts on the human genome. We use L1 and Alu insertion polymorphisms to analyze human population structure. METHODS: We genotyped 75 recent, polymorphic L1 insertions in 317 individuals from 21 populations in sub-Saharan Africa, East Asia, Europe and the Indian subcontinent. This is the first sample of L1 loci large enough to support detailed population genetic inference. We analyzed these data in parallel with a set of 100 polymorphic Alu insertion loci previously genotyped in the same individuals. RESULTS AND CONCLUSION: The data sets yield congruent results that support the recent African origin model of human ancestry. A genetic clustering algorithm detects clusters of individuals corresponding to continental regions. The number of loci sampled is critical: with fewer than 50 typical loci, structure cannot be reliably discerned in these populations. The inclusion of geographically intermediate populations (from India) reduces the distinctness of clustering. Our results indicate that human genetic variation is neither perfectly correlated with geographic distance (purely clinal) nor independent of distance (purely clustered), but a combination of both: stepped clinal.
Affiliation(s)
- D J Witherspoon
- Department of Human Genetics, University of Utah Health Sciences Center, Salt Lake City, UT 84112-5330, USA.
15
Lee J, Cordaux R, Han K, Wang J, Hedges DJ, Liang P, Batzer MA. Different evolutionary fates of recently integrated human and chimpanzee LINE-1 retrotransposons. Gene 2006; 390:18-27. [PMID: 17055192 PMCID: PMC1847406 DOI: 10.1016/j.gene.2006.08.029] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 04/24/2006] [Revised: 08/05/2006] [Accepted: 08/25/2006] [Indexed: 11/21/2022]
Abstract
The long interspersed element-1 (LINE-1 or L1) is a highly successful retrotransposon in mammals. L1 elements have continued to actively propagate subsequent to the human-chimpanzee divergence, approximately 6 million years ago, resulting in species-specific inserts. Here, we report a detailed characterization of chimpanzee-specific L1 subfamily diversity and a comparison with their human-specific counterparts. Our results indicate that L1 elements have experienced different evolutionary fates in humans and chimpanzees within the past approximately 6 million years. Although the species-specific L1 copy numbers are on the same order in both species (1200-2000 copies), the number of retrotransposition-competent elements appears to be much higher in the human genome than in the chimpanzee genome. Also, while human L1 subfamilies belong to the same lineage, we identified two lineages of recently integrated L1 subfamilies in the chimpanzee genome. The two lineages seem to have coexisted for several million years, but only one shows evidence of expansion within the past three million years. These differential evolutionary paths may be the result of random variation, or the product of competition between L1 subfamily lineages. Our results suggest that the coexistence of several L1 subfamily lineages within a species may be resolved in a very short evolutionary period of time, perhaps in just a few million years. Therefore, the chimpanzee genome constitutes an excellent model in which to analyze the evolutionary dynamics of L1 retrotransposons.
Affiliation(s)
- Jungnam Lee
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-Scale Systems, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
- Richard Cordaux
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-Scale Systems, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
- Kyudong Han
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-Scale Systems, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
- Jianxin Wang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263, USA
- Dale J. Hedges
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-Scale Systems, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
- Ping Liang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263, USA
- Mark A. Batzer
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-Scale Systems, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
- * Corresponding author. Tel.: +1 225 578 7102; fax: +1 225 578 7113. E-mail address: (M.A. Batzer)
16
Gasior SL, Preston G, Hedges DJ, Gilbert N, Moran JV, Deininger PL. Characterization of pre-insertion loci of de novo L1 insertions. Gene 2006; 390:190-8. [PMID: 17067767 PMCID: PMC1850991 DOI: 10.1016/j.gene.2006.08.024] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 07/11/2006] [Revised: 08/21/2006] [Accepted: 08/22/2006] [Indexed: 10/24/2022]
Abstract
The human Long Interspersed Element-1 (LINE-1) and the Short Interspersed Element (SINE) Alu comprise 28% of the human genome. They share the same L1-encoded endonuclease for insertion, which recognizes an A+T-rich sequence. Under a simple model of insertion distribution, this nucleotide preference would lead to the prediction that the populations of both elements would be biased towards A+T-rich regions. Genomic L1 elements do show an A+T-rich bias. In contrast, Alu is biased towards G+C-rich regions when compared to the genome average. Several analyses have demonstrated that relatively recent insertions of both elements show less G+C content bias relative to older elements. We have analyzed the repetitive element and G+C composition of more than 100 pre-insertion loci derived from de novo L1 insertions in cultured human cancer cells, which should represent an evolutionarily unbiased set of insertions. An A+T-rich bias is observed in the 50 bp flanking the endonuclease target site, consistent with the known target site for the L1 endonuclease. The L1, Alu, and G+C content of 20 kb of the de novo pre-insertion loci shows a different set of biases than that observed for fixed L1s in the human genome. In contrast to the insertion sites of genomic L1s, the de novo L1 pre-insertion loci are relatively L1-poor, Alu-rich and G+C neutral. Finally, a statistically significant cluster of de novo L1 insertions was localized in the vicinity of the c-myc gene. These results suggest that the initial insertion preference of L1, while A+T-rich in the initial vicinity of the break site, can be influenced by the broader content of the flanking genomic region and have implications for understanding the dynamics of L1 and Alu distributions in the human genome.
Affiliation(s)
- Stephen L. Gasior
- Tulane Cancer Center and Dept. of Epidemiology, Tulane University Health Sciences Center SL-66, 1430 Tulane Ave., New Orleans, LA 70112, Phone: (504) 988-6385, Fax: (504) 988-5516,
- Graeme Preston
- Tulane Cancer Center and Dept. of Epidemiology, Tulane University Health Sciences Center SL-66, 1430 Tulane Ave., New Orleans, LA 70112, Phone: (504) 988-6385, Fax: (504) 988-5516,
- Dale J. Hedges
- Tulane Cancer Center and Dept. of Epidemiology, Tulane University Health Sciences Center SL-66, 1430 Tulane Ave., New Orleans, LA 70112, Phone: (504) 988-6385, Fax: (504) 988-5516,
- Nicolas Gilbert
- Institut de Génétique Humaine, CNRS, UPR 1142, 141 rue de la Cardonille, 34396 Montpellier cedex 5, France
- John V. Moran
- Departments of Human Genetics and Internal Medicine, 1241 E. Catherine St., University of Michigan Medical School, Ann Arbor, Michigan 48109-0618
- Prescott L. Deininger
- Tulane Cancer Center and Dept. of Epidemiology, Tulane University Health Sciences Center SL-66, 1430 Tulane Ave., New Orleans, LA 70112, Phone: (504) 988-6385, Fax: (504) 988-5516,
- *Address for Correspondence: Tulane Cancer Center, SL66, Tulane University Health Sciences Center, 1430 Tulane Ave., New Orleans, LA 70112, 504-988-6385,
17
Wilson AS, Power BE, Molloy PL. DNA hypomethylation and human diseases. Biochim Biophys Acta Rev Cancer 2006; 1775:138-62. [PMID: 17045745 DOI: 10.1016/j.bbcan.2006.08.007] [Citation(s) in RCA: 324] [Impact Index Per Article: 18.0] [Received: 07/10/2006] [Revised: 08/24/2006] [Accepted: 08/27/2006] [Indexed: 12/14/2022]
Abstract
Changes in human DNA methylation patterns are an important feature of cancer development and progression, and a potential role in other conditions such as atherosclerosis and autoimmune diseases (e.g., multiple sclerosis and lupus) is being recognised. The cancer genome is frequently characterised by hypermethylation of specific genes concurrently with an overall decrease in the level of 5-methylcytosine. This hypomethylation of the genome largely affects the intergenic and intronic regions of the DNA, particularly repeat sequences and transposable elements, and is believed to result in chromosomal instability and increased mutation events. This review examines our understanding of the patterns of cancer-associated hypomethylation, and how recent advances in understanding of chromatin biology may help elucidate the mechanisms underlying repeat sequence demethylation. It also considers how global demethylation of repeat sequences including transposable elements and the site-specific hypomethylation of certain genes might contribute to the deleterious effects that ultimately result in the initiation and progression of cancer and other diseases. The use of hypomethylation of interspersed repeat sequences and genes as potential biomarkers in the early detection of tumors and their prognostic use in monitoring disease progression are also examined.
Affiliation(s)
- Ann S Wilson
- Preventative Health National Research Flagship, North Ryde, NSW, Australia
18
Konkel MK, Wang J, Liang P, Batzer MA. Identification and characterization of novel polymorphic LINE-1 insertions through comparison of two human genome sequence assemblies. Gene 2006; 390:28-38. [PMID: 17034961 DOI: 10.1016/j.gene.2006.07.040] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 06/09/2006] [Revised: 07/18/2006] [Accepted: 07/26/2006] [Indexed: 11/29/2022]
Abstract
Mobile elements represent a relatively new class of markers for the study of human evolution. Long interspersed elements (LINEs) belong to a group of retrotransposons comprising approximately 21% of the human genome. Young LINE-1 (L1) elements that have integrated recently into the human genome can be polymorphic for insertion presence/absence in different human populations at particular chromosomal locations. To identify putative novel L1 insertion polymorphisms, we computationally compared two draft assemblies of the whole human genome (Public and Celera Human Genome assemblies). We identified a total of 148 potential polymorphic L1 insertion loci, among which 73 were candidates for novel polymorphic loci. Based on additional analyses we selected 34 loci for further experimental studies. PCR-based assays and DNA sequence analysis were performed for these 34 loci in 80 unrelated individuals from four diverse human populations: African-American, Asian, Caucasian, and South American. All but two of the selected loci were confirmed as polymorphic in our human population panel. Approximately 47% of the analyzed loci integrated into other repetitive elements, most commonly older L1s. One of the insertions was accompanied by a BC200 sequence. Collectively, these mobile elements represent a valuable source of genomic polymorphism for the study of human population genetics. Our results also suggest that the exhaustive identification of L1 insertion polymorphisms is far from complete, and new whole genome sequences are valuable sources for finding novel retrotransposon insertion polymorphisms.
Affiliation(s)
- Miriam K Konkel
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-Scale Systems, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
19
Scott LA, Kuroiwa A, Matsuda Y, Wichman HA. X accumulation of LINE-1 retrotransposons in Tokudaia osimensis, a spiny rat with the karyotype XO. Cytogenet Genome Res 2006; 112:261-9. [PMID: 16484782 DOI: 10.1159/000089880] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 02/04/2005] [Accepted: 07/25/2005] [Indexed: 01/02/2023]
Abstract
The observation that LINE-1 transposable elements are enriched on the X in comparison to the autosomes led to the hypothesis that LINE-1s play a role in X chromosome inactivation. If this hypothesis is correct, loss of LINE-1 activity would be expected to result in species extinction or in an alternate pathway of dosage compensation. One such alternative pathway would be to evolve a karyotype that does not require dosage compensation between the sexes. Two of the three extant species of the Ryukyu spiny rat Tokudaia have such a karyotype; both males and females are XO. We asked whether this karyotype arose due to loss of LINE-1 activity and thus the loss of a putative component in the X inactivation pathway. Although XO Tokudaia has no need for dosage compensation, LINE-1s have been recently active in Tokudaia osimensis and show higher density on the lone X than on the autosomes.
Affiliation(s)
- L A Scott
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844-3051, USA
20
Piskareva O, Schmatchenko V. DNA polymerization by the reverse transcriptase of the human L1 retrotransposon on its own template in vitro. FEBS Lett 2006; 580:661-8. [PMID: 16412437 DOI: 10.1016/j.febslet.2005.12.077] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 09/27/2005] [Revised: 12/02/2005] [Accepted: 12/27/2005] [Indexed: 10/25/2022]
Abstract
L1 elements (LINE-1s) account for 17% of the human genome and have achieved this abundance by transposition via an RNA intermediate, or retrotransposition. Reverse transcription is a crucial event in the retrotransposition of the active human L1 element and is carried out by the L1-encoded ORF2 protein. Previously, we performed a biochemical characterization of the human L1 ORF2 protein with reverse transcriptase (RT) activity (referred to as L1 RT), expressed in baculovirus-infected insect cells. In the present study, we describe the properties of DNA- and RNA-dependent DNA synthesis catalyzed by the L1 RT on L1 templates in vitro. We found that L1 RT synthesized at least 620 nucleotides per template-binding event using L1 RNA in vitro. Under processive conditions, the L1 RT synthesized cDNA over 5 times longer than that synthesized by Moloney murine leukemia virus RT on the heteropolymeric RNA template used in these studies. These data are the first to demonstrate that the RT of the human L1 element is a highly processive polymerase among RT enzymes. This report also presents strong evidence for the lack of RNase H activity in the L1 ORF2 protein in vitro, distinguishing L1 RT from retroviral RTs. Finally, we found strong pausing of the L1 RT during DNA polymerization within the 3' untranslated region of L1 mRNA, which results from the combined contribution of the rG runs of the polypurine stretch and the immediately adjacent stem-loop structure. A mechanism facilitating minus-strand DNA synthesis during reverse transcription of the L1 element in vivo is discussed.
Affiliation(s)
- Olga Piskareva
- Institute of Biochemistry and Physiology of Microorganisms RAS, Prospekt Nauki 5, 142290 Pushchino, Moscow region, Russia
21
Babushok DV, Ostertag EM, Courtney CE, Choi JM, Kazazian HH. L1 integration in a transgenic mouse model. Genome Res 2005; 16:240-50. [PMID: 16365384 PMCID: PMC1361720 DOI: 10.1101/gr.4571606] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.7] [Indexed: 12/25/2022]
Abstract
To study integration of the human LINE-1 retrotransposon (L1) in vivo, we developed a transgenic mouse model of L1 retrotransposition that displays de novo somatic L1 insertions at a high frequency, occasionally several insertions per mouse. We mapped 3' integration sites of 51 insertions by Thermal Asymmetric Interlaced PCR (TAIL-PCR). Analysis of integration locations revealed a broad genomic distribution with a modest preference for intergenic regions. We characterized the complete structures of 33 de novo retrotransposition events. Our results highlight the large number of highly truncated L1s, as over 52% (27/51) of total integrants were <1/3 the length of a full-length element. New integrants carry all structural characteristics typical of genomic L1s, including a number with inversions, deletions, and 5'-end microhomologies to the target DNA sequence. Notably, at least 13% (7/51) of all insertions contain a short stretch of extra nucleotides at their 5' end, which we postulate result from template-jumping by the L1-encoded reverse transcriptase. We propose a unified model of L1 integration that explains all of the characteristic features of L1 retrotransposition, such as 5' truncations, inversions, extra nucleotide additions, and 5' boundary and inversion point microhomologies.
Affiliation(s)
- Daria V Babushok
- Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA
22
Wang H, Xing J, Grover D, Hedges DJ, Han K, Walker JA, Batzer MA. SVA elements: a hominid-specific retroposon family. J Mol Biol 2005; 354:994-1007. [PMID: 16288912 DOI: 10.1016/j.jmb.2005.09.085] [Citation(s) in RCA: 253] [Impact Index Per Article: 13.3] [Received: 08/22/2005] [Revised: 09/22/2005] [Accepted: 09/27/2005] [Indexed: 11/25/2022]
Abstract
SVA is a composite repetitive element named after its main components, SINE, VNTR and Alu. We have identified 2762 SVA elements from the human genome draft sequence. Genomic distribution analysis indicates that the SVA elements are enriched in G+C-rich regions but have no preferences for inter- or intragenic regions. A phylogenetic analysis of the elements resulted in the recovery of six subfamilies that were named SVA_A to SVA_F. The composition, age and genomic distribution of the subfamilies have been examined. Subfamily age estimates based upon nucleotide divergence indicate that the expansion of four SVA subfamilies (SVA_A, SVA_B, SVA_C and SVA_D) began before the divergence of human, chimpanzee and gorilla, while subfamilies SVA_E and SVA_F are restricted to the human lineage. A survey of human genomic diversity associated with SVA_E and SVA_F subfamily members showed insertion polymorphism frequencies of 37.5% and 27.6%, respectively. In addition, we examined the amplification dynamics of SVA elements throughout the primate order and traced their origin back to the beginnings of hominid primate evolution, approximately 18 to 25 million years ago. This makes SVA elements the youngest family of retroposons in the primate order.
Affiliation(s)
- Hui Wang
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for BioModular Multi-Scale Systems, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
23
Ho HJ, Ray DA, Salem AH, Myers JS, Batzer MA. Straightening out the LINEs: LINE-1 orthologous loci. Genomics 2005; 85:201-7. [PMID: 15676278 DOI: 10.1016/j.ygeno.2004.10.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 08/23/2004] [Accepted: 10/29/2004] [Indexed: 11/19/2022]
Abstract
The L1Hs preTa subfamily of long interspersed elements (LINEs) originated after the divergence of human and chimpanzee and is therefore found only in the human genome. Thirty-three of the 254 L1Hs preTa elements are polymorphic for the absence/presence of the insertion, making them useful markers for studying human population genetics. The problem of homoplasy, however, can diminish the value of LINEs as phylogenetic and population genetic markers. We examined anomalous orthologous sites in a range of nonhuman primates. Only two cases of other mobile elements inserting near the preintegration sites of L1Hs preTa elements were observed: an AluY insertion in Chlorocebus and an L1PA8 insertion in Aotus. Sequence analysis showed that both elements were clearly distinguishable from their human counterparts. We conclude that L1 elements can continue to be regarded as essentially homoplasy-free genetic characters.
Affiliation(s)
- Huei Jin Ho
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, 202 Life Sciences Building, Baton Rouge, LA 70803, USA
24
Grahn RA, Rinehart TA, Cantrell MA, Wichman HA. Extinction of LINE-1 activity coincident with a major mammalian radiation in rodents. Cytogenet Genome Res 2005; 110:407-15. [PMID: 16093693 DOI: 10.1159/000084973] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Received: 02/03/2004] [Accepted: 04/07/2004] [Indexed: 11/19/2022]
Abstract
LINE-1 transposable elements (L1s) are ubiquitous in mammals and are thought to have remained active since before the mammalian radiation. Only one L1 extinction event, in South American rodents in the genus Oryzomys, has been convincingly demonstrated. Here we examine the phylogenetic limits and evolutionary tempo of that extinction event by characterizing L1s in related rodents. Fourteen genera from five tribes within the Sigmodontinae subfamily were examined. Only the Sigmodontini, the most basal tribe in this group, demonstrate recent L1 activity. The Oryzomyini, Akodontini, Phyllotini, and Thomasomyini contain only L1s that appear to have inserted long ago; their L1s lack open reading frames, have mutations at conserved amino acid residues, and show numerous private mutations. They also lack restriction site-defined L1 subfamilies specific to any species, genus or tribe examined, and fail to form monophyletic species, genus or tribal L1 clusters. We determine here that this L1 extinction event occurred roughly 8.8 million years ago, near the divergence of Sigmodon from the remaining Sigmodontinae species. These species appear to be ideal model organisms for studying the impact of L1 inactivity on mammalian genomes.
Affiliation(s)
- R A Grahn
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844-3051, USA
25
Han K, Sen SK, Wang J, Callinan PA, Lee J, Cordaux R, Liang P, Batzer MA. Genomic rearrangements by LINE-1 insertion-mediated deletion in the human and chimpanzee lineages. Nucleic Acids Res 2005; 33:4040-52. [PMID: 16034026 PMCID: PMC1179734 DOI: 10.1093/nar/gki718] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022]
Abstract
Long INterspersed Elements (LINE-1s or L1s) are abundant non-LTR retrotransposons in mammalian genomes that are capable of insertional mutagenesis. They have been associated with target site deletions upon insertion in cell culture studies of retrotransposition. Here, we report 50 deletion events in the human and chimpanzee genomes directly linked to the insertion of L1 elements, resulting in the loss of approximately 18 kb of sequence from the human genome and approximately 15 kb from the chimpanzee genome. Our data suggest that during the primate radiation, L1 insertions may have deleted up to 7.5 Mb of target genomic sequences. While the results of our in vivo analysis differ from those of previous cell culture assays of L1 insertion-mediated deletions in terms of the size and rate of sequence deletion, evolutionary factors can reconcile the differences. We report a pattern of genomic deletion sizes similar to those created during the retrotransposition of Alu elements. Our study provides support for the existence of different mechanisms for small and large L1-mediated deletions, and we present a model for the correlation of L1 element size and the corresponding deletion size. In addition, we show that internal rearrangements can modify L1 structure during retrotransposition events associated with large deletions.
Affiliation(s)
- Jianxin Wang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263, USA
- Ping Liang
- Department of Cancer Genetics, Roswell Park Cancer Institute, Elm and Carlton Streets, Buffalo, NY 14263, USA
- Mark A. Batzer
- To whom correspondence should be addressed. Tel: +1 225 578 7102; Fax: +1 225 578 7113;
26
Wheelan SJ, Aizawa Y, Han JS, Boeke JD. Gene-breaking: a new paradigm for human retrotransposon-mediated gene evolution. Genome Res 2005; 15:1073-8. [PMID: 16024818 PMCID: PMC1182219 DOI: 10.1101/gr.3688905]
Abstract
The L1 retrotransposon is the most highly successful autonomous retrotransposon in mammals. This prolific genome parasite may on occasion benefit its host through genome rearrangements or adjustments of host gene expression. In examining possible effects of L1 elements on host gene expression, we investigated whether a full-length L1 element inserted in the antisense orientation into an intron of a cellular gene may actually split the gene's transcript into two smaller transcripts: (1) a transcript containing the upstream exons and terminating in the major antisense polyadenylation site (MAPS) of the L1, and (2) a transcript derived from the L1 antisense promoter (ASP) that includes the downstream exons of the gene. Bioinformatic analysis and experimental follow-up provide evidence for this L1 "gene-breaking" hypothesis. We identified three human genes apparently "broken" by L1 elements, as well as 12 more candidate genes. Most of the inserted L1 elements in our 15 candidate genes predate the human/chimp divergence. If indeed split, the transcripts of these genes may in at least one case encode potentially interacting proteins, and in another case may encode novel proteins. Gene-breaking represents a new mechanism through which L1 elements remodel mammalian genomes.
Affiliation(s)
- Sarah J Wheelan
- Department of Molecular Biology and Genetics, The Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA
27
Chen JM, Stenson PD, Cooper DN, Férec C. A systematic analysis of LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease. Hum Genet 2005; 117:411-27. [PMID: 15983781 DOI: 10.1007/s00439-005-1321-0] [Received: 02/15/2005] [Accepted: 04/04/2005]
Abstract
Diverse long interspersed element-1 (LINE-1 or L1)-dependent mutational mechanisms have been extensively studied with respect to L1 and Alu elements engineered for retrotransposition in cultured cells and/or in genome-wide analyses. To what extent the in vitro studies can be held to accurately reflect in vivo events in the human genome, however, remains to be clarified. We have attempted to address this question by means of a systematic analysis of recent L1-mediated retrotranspositional events that have caused human genetic disease, with a view to providing a more complete picture of how L1-mediated retrotransposition impacts upon the architecture of the human genome. A total of 48 such mutations were identified, including those described as L1-mediated retrotransposons, as well as insertions reported to contain a poly(A) tail: 26 were L1 trans-driven Alu insertions, 15 were direct L1 insertions, four were L1 trans-driven SVA insertions, and three were associated with simple poly(A) insertions. The systematic study of these lesions, when combined with previous in vitro and genome-wide analyses, has strengthened several important conclusions regarding L1-mediated retrotransposition in humans: (a) approximately 25% of L1 insertions are associated with the 3' transduction of adjacent genomic sequences, (b) approximately 25% of the new L1 inserts are full-length, (c) poly(A) tail length correlates inversely with the age of the element, and (d) the length of target site duplication in vivo is rarely longer than 20 bp. Our analysis also suggests that some 10% of L1-mediated retrotranspositional events are associated with significant genomic deletions in humans. Finally, the identification of independent retrotranspositional events that have integrated at the same genomic locations provides new insight into the L1-mediated insertional process in humans.
Affiliation(s)
- Jian-Min Chen
- INSERM U613-Génétique Moléculaire et Génétique Epidémiologique, Etablissement Français du Sang-Bretagne, Université de Bretagne Occidentale, Centre Hospitalier Universitaire, Brest, 29220, France.
28
Salem AH, Ray DA, Batzer MA. Identity by descent and DNA sequence variation of human SINE and LINE elements. Cytogenet Genome Res 2004; 108:63-72. [PMID: 15545717 DOI: 10.1159/000080803] [Received: 10/13/2003] [Accepted: 11/21/2003]
Abstract
To test the hypothesis that Alu and L1 elements are genetic characters that are essentially homoplasy-free, we sequenced a total of five human L1 elements and eleven recently integrated Alu elements from 160 chromosomes (80 individuals representing four diverse human populations). Analysis of worldwide samples at L1 loci revealed 292 segregating sites and a nucleotide diversity of 0.0050. For Ya5 Alu loci, there were 129 segregating sites and nucleotide diversity was estimated at 0.0045. The Alu and L1 sequence diversity varied element to element. No completely or partially deleted Alu or L1 alleles were identified during the analysis. These data suggest that mobile element insertions are identical by descent characters for the study of human population genetics.
Affiliation(s)
- A-H Salem
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge 70803, USA
29
Farley AH, Luning Prak ET, Kazazian HH. More active human L1 retrotransposons produce longer insertions. Nucleic Acids Res 2004; 32:502-10. [PMID: 14742665 PMCID: PMC373329 DOI: 10.1093/nar/gkh202] [Received: 08/18/2003] [Revised: 09/18/2003] [Accepted: 12/10/2003]
Abstract
The vast majority of L1 insertions are 5' truncated and thus inactive. Yet, the mechanism of 5' truncation is unknown. To examine whether the frequency of L1 retrotransposition is directly correlated with the length of genomic L1 insertions, we used a cell culture assay to measure retrotransposition frequency and a PCR-based assay to measure L1 insertion length. We tested five full-length human L1 elements that retrotranspose at different frequencies: LRE3, L1(RP), L1.3, L1.2A and L1.2B. Our data suggest that L1 insertion length correlates with L1 retrotransposition frequency for insertions >1 kb in length. For two elements, L1(RP) and L1.2A, we found that swapping the reverse transcriptase domains had little effect. Instead, we found that genomic insertion length and retrotransposition frequency are substantially affected by amino acid substitutions at positions 363, 1220 and 1259 in ORF2. We suggest that the region containing residues 1220 and 1259 may be important in the binding of ORF2p to L1 RNA to facilitate reverse transcription.
Affiliation(s)
- Alexander H Farley
- Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, PA, USA