1
Katikaneni A, Lowe CB. Novelty versus innovation of gene regulatory elements in human evolution and disease. Curr Opin Genet Dev 2025; 90:102279. [PMID: 39591813; PMCID: PMC11769741; DOI: 10.1016/j.gde.2024.102279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Revised: 10/10/2024] [Accepted: 10/22/2024] [Indexed: 11/28/2024]
Abstract
It is not currently understood how much of human evolution is due to modifying existing functional elements in the genome versus forging novel elements from nonfunctional DNA. Many early experiments that aimed to assign genetic changes on the human lineage to their resulting phenotypic change have focused on mutations that modify existing elements. However, a number of recent studies have highlighted the potential ease and importance of forging novel gene regulatory elements from nonfunctional sequences on the human lineage. In this review, we distinguish gene regulatory element novelty from innovation. We propose definitions for these terms and emphasize their importance in studying the genetic basis of human uniqueness. We discuss why the forging of novel regulatory elements may have been less emphasized during the previous decades, and why novel regulatory elements are likely to play a significant role in both human adaptation and disease.
Affiliation(s)
- Anushka Katikaneni
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA; University Program in Genetics and Genomics, Duke University, Durham, NC 27708, USA
- Craig B Lowe
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, NC 27710, USA; University Program in Genetics and Genomics, Duke University, Durham, NC 27708, USA
2
Borgsmüller N, Valecha M, Kuipers J, Beerenwinkel N, Posada D. Single-cell phylogenies reveal changes in the evolutionary rate within cancer and healthy tissues. Cell Genomics 2023; 3:100380. [PMID: 37719146; PMCID: PMC10504633; DOI: 10.1016/j.xgen.2023.100380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Revised: 05/03/2023] [Accepted: 07/18/2023] [Indexed: 09/19/2023]
Abstract
Cell lineages accumulate somatic mutations during organismal development, potentially leading to pathological states. The rate of somatic evolution within a cell population can vary due to multiple factors, including selection, a change in the mutation rate, or differences in the microenvironment. Here, we developed a statistical test called the Poisson Tree (PT) test to detect varying evolutionary rates among cell lineages, leveraging the phylogenetic signal of single-cell DNA sequencing (scDNA-seq) data. We applied the PT test to 24 healthy and cancer samples, rejecting a constant evolutionary rate in 11 out of 15 cancer and five out of nine healthy scDNA-seq datasets. In six cancer datasets, we identified subclonal mutations in known driver genes that could explain the rate accelerations of particular cancer lineages. Our findings demonstrate the efficacy of scDNA-seq for studying somatic evolution and suggest that cell lineages often evolve at different rates within cancer and healthy tissues.
Affiliation(s)
- Nico Borgsmüller
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Monica Valecha
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Jack Kuipers
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, 4058 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
- David Posada
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Vigo, Spain
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
3
Chuang TJ, Chiang TW, Chen CY. Assessing the impacts of various factors on circular RNA reliability. Life Sci Alliance 2023; 6:e202201793. [PMID: 36849251; PMCID: PMC9971162; DOI: 10.26508/lsa.202201793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 02/15/2023] [Accepted: 02/15/2023] [Indexed: 03/01/2023]
Abstract
Circular RNAs (circRNAs) are non-polyadenylated RNAs with a continuous loop structure characterized by a non-colinear back-splice junction (BSJ). Although millions of circRNA candidates have been identified, determining circRNA reliability remains a major challenge because of various types of false positives. Here, we systematically assess the impacts of numerous factors related to circRNA identification, conservation, biogenesis, and function on circRNA reliability by comparing circRNA expression between mock and the corresponding colinear/polyadenylated RNA-depleted datasets based on three different RNA treatment approaches. Eight important indicators of circRNA reliability are determined. Analyses of the relative contribution to variability explained reveal that the relative importance of these factors in affecting circRNA reliability is, in descending order: the conservation level of circRNA, full-length circular sequences, supporting BSJ read count, both BSJ donor and acceptor splice sites at the same colinear transcript isoforms, both BSJ donor and acceptor splice sites at the annotated exon boundaries, BSJs detected by multiple tools, supporting functional features, and both BSJ donor and acceptor splice sites undergoing alternative splicing. This study thus provides a useful guideline and an important resource for selecting high-confidence circRNAs for further investigations.
Affiliation(s)
- Tai-Wei Chiang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Chia-Ying Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
4
Cesca F, Bettella E, Polli R, Leonardi E, Aspromonte MC, Sicilian B, Stanzial F, Benedicenti F, Sensi A, Ciorba A, Bigoni S, Cama E, Scimemi P, Santarelli R, Murgia A. Frequency of Usher gene mutations in non-syndromic hearing loss: higher variability of the Usher phenotype. J Hum Genet 2020; 65:855-864. [PMID: 32467589; DOI: 10.1038/s10038-020-0783-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/17/2020] [Revised: 05/05/2020] [Accepted: 05/15/2020] [Indexed: 11/09/2022]
Abstract
Non-syndromic hearing loss (NSHL) is characterized by vast genetic heterogeneity; some syndromic forms, such as Usher syndrome (USH), present initially as isolated deafness and then evolve later in life. We developed an NGS targeted gene panel containing 59 genes and a customized bioinformatic pipeline for the analysis of DNA samples from clinically highly selected subjects with sensorineural hearing loss who had previously tested negative for GJB2 mutations/GJB6 deletions. Among the 217 tested subjects, 24 (11.1%) were found to carry mutations in genes involved in both NSHL and USH. For 6 out of 24 patients, a diagnosis of USH was made. Eleven subjects out of 24 had hearing loss without vestibular or ocular dysfunction and, due to their young age, it was not possible to establish whether their phenotype was NSHL or USH. Seven subjects were diagnosed with NSHL on the basis of their age and phenotype. A total of 41 likely pathogenic/pathogenic mutations were identified, 17 of which were novel. We report a high frequency of mutations in genes involved in both NSHL and USH in a cohort of individuals tested for seemingly isolated deafness. Our data also highlight a wider than expected phenotypic variability in the USH phenotype.
Affiliation(s)
- Federica Cesca
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padua, Padua, Italy; Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padua, Italy
- Elisa Bettella
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padua, Padua, Italy; Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padua, Italy
- Roberta Polli
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padua, Padua, Italy; Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padua, Italy
- Emanuela Leonardi
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padua, Padua, Italy; Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padua, Italy
- Maria Cristina Aspromonte
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padua, Padua, Italy; Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padua, Italy
- Barbara Sicilian
- Medical Center of Phoniatrics, Casa di Cura Trieste, Padua, Italy
- Franco Stanzial
- Genetic Counseling Service, Regional Hospital of Bolzano, Bolzano, Italy
- Alberto Sensi
- U.O. Medical Genetics Romagna, AULS Romagna, Cesena, Italy
- Andrea Ciorba
- ENT and Audiology Department, University Hospital of Ferrara, Ferrara, Italy
- Stefania Bigoni
- Medical Genetics Unit, University Hospital of Ferrara, Ferrara, Italy
- Elona Cama
- Department of Neurosciences, University of Padua, Padua, Italy; Audiology Service, Santi Giovanni e Paolo Hospital, ULSS3 Serenissima, Venice, Italy
- Pietro Scimemi
- Department of Neurosciences, University of Padua, Padua, Italy; Audiology Service, Santi Giovanni e Paolo Hospital, ULSS3 Serenissima, Venice, Italy
- Rosamaria Santarelli
- Department of Neurosciences, University of Padua, Padua, Italy; Audiology Service, Santi Giovanni e Paolo Hospital, ULSS3 Serenissima, Venice, Italy
- Alessandra Murgia
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padua, Padua, Italy; Fondazione Istituto di Ricerca Pediatrica (IRP), Città della Speranza, Padua, Italy
5
Mai TL, Chuang TJ. A-to-I RNA editing contributes to the persistence of predicted damaging mutations in populations. Genome Res 2019; 29:1766-1776. [PMID: 31515285; PMCID: PMC6836733; DOI: 10.1101/gr.246033.118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/05/2018] [Accepted: 09/04/2019] [Indexed: 12/13/2022]
Abstract
Adenosine-to-inosine (A-to-I) RNA editing is a very common co-/posttranscriptional modification that can lead to A-to-G changes at the RNA level and compensate for G-to-A genomic changes to a certain extent. It has been shown that each healthy individual can carry dozens of missense variants predicted to be severely deleterious. Why strongly detrimental variants are preserved in a population and not eliminated by negative natural selection remains mostly unclear. Here, we ask if RNA editing correlates with the burden of deleterious A/G polymorphisms in a population. Integrating genome and transcriptome sequencing data from 447 human lymphoblastoid cell lines, we show that nonsynonymous editing activities (prevalence/level) are negatively correlated with the deleteriousness of A-to-G genomic changes and positively correlated with that of G-to-A genomic changes within the population. We find a significantly negative correlation between nonsynonymous editing activities and allele frequency of A within the population. This negative editing-allele frequency correlation is particularly strong when editing sites are located in highly important genes/loci. Examinations of deleterious missense variants from the 1000 Genomes Project further show a significantly higher proportion of rare missense mutations for G-to-A changes than for other types of changes. The proportion for G-to-A changes increases with increasing deleterious effects of the changes. Moreover, the deleteriousness of G-to-A changes is significantly positively correlated with the percentage of editing enzyme binding motifs at the variants. Overall, we show that nonsynonymous editing is associated with the increased burden of G-to-A missense mutations in healthy individuals, expanding RNA editing in pathogenomics studies.
Affiliation(s)
- Te-Lun Mai
- Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
6
Jing R, Liu Y, Guo P, Ni T, Gao X, Mei R, He X, Zhang J. Evaluation of Common Variants in Matrix Metalloproteinase-9 Gene with Lumbar Disc Herniation in Han Chinese Population. Genet Test Mol Biomarkers 2018; 22:622-629. [PMID: 30289281; DOI: 10.1089/gtmb.2018.0080] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/18/2022]
Abstract
OBJECTIVE Lumbar disc herniation (LDH) is a common orthopedic disease with strong genetic determinants. Disruption of the intervertebral disc extracellular matrix has been found to play a key role in the development of LDH, suggesting that abnormal matrix metalloproteinases (MMPs) may promote the degradation of the disc matrix. MMP-9, an important member of the MMP family, is a good candidate LDH susceptibility gene. The present study aimed to investigate the association of common variants in the MMP-9 gene with the risk, severity, and clinical characteristic variables of LDH. MATERIALS AND METHODS Fourteen tag single nucleotide polymorphisms (SNPs) entirely covering the region of the MMP-9 gene were analyzed in a sample of 845 patients and 1751 healthy controls. RESULTS The SNP rs17576 was found to be significantly associated with susceptibility to LDH (OR = 0.77, p = 0.0002), which was also confirmed by haplotype-based analyses (rs79845319-rs17576-rs45437897, global p < 0.001). Our results indicated that the A allele of rs17576 reduced the risk of LDH by ∼23% on average. Furthermore, the G allele of rs17576 was found to correlate with more severe grades of disc degeneration. CONCLUSION Our results provide additional evidence supporting an important role of the MMP-9 gene in the pathogenesis of LDH.
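As a rough illustration of the arithmetic in the abstract above (not the authors' analysis), an odds ratio of 0.77 corresponds to about a 23% reduction in the odds of LDH. The sketch below is plain Python; the function names and the 2x2 counts in the usage lines are hypothetical and are not data from the study.

```python
def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Odds ratio from a 2x2 table of allele (or genotype) counts."""
    return (case_exposed * control_unexposed) / (case_unexposed * control_exposed)


def percent_odds_reduction(or_value):
    """Percent reduction in odds implied by an odds ratio below 1."""
    return round((1.0 - or_value) * 100.0, 1)


# Value reported in the abstract: OR = 0.77
print(percent_odds_reduction(0.77))  # 23.0

# Hypothetical counts, for illustration only (not data from the study)
print(odds_ratio(10, 20, 20, 10))  # 0.25
```

Note that a reduction in odds approximates a reduction in risk only when the outcome is rare in the population sampled.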
Affiliation(s)
- Rong Jing
- Department of Orthopedics, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, China
- Yunlei Liu
- Department of Traditional Chinese Medicine, Affiliated Hospital of Yan'an University, Yan'an, China
- Peng Guo
- Department of Joint Surgery, Yan'an People's Hospital, Yan'an, China
- Tong Ni
- Key Laboratory of National Ministry of Health for Forensic Sciences, School of Medicine and Forensics, Xi'an Jiaotong University, Xi'an, China
- Xiang Gao
- Department of Rehabilitation Medicine, Affiliated Hospital of Yan'an University, Yan'an, China
- Rong Mei
- Department of Rehabilitation Medicine, Affiliated Hospital of Yan'an University, Yan'an, China
- Xijing He
- Department of Orthopedics, The Second Affiliated Hospital, Xi'an Jiaotong University, Xi'an, China
- Jianlin Zhang
- Department of Joint Surgery, Yan'an People's Hospital, Yan'an, China
7
Cesca F, Bettella E, Polli R, Cama E, Scimemi P, Santarelli R, Murgia A. A novel mutation of the EYA4 gene associated with post-lingual hearing loss in a proband is co-segregating with a novel PAX3 mutation in two congenitally deaf family members. Int J Pediatr Otorhinolaryngol 2018; 104:88-93. [PMID: 29287889; DOI: 10.1016/j.ijporl.2017.10.042] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/01/2017] [Revised: 10/25/2017] [Accepted: 10/27/2017] [Indexed: 01/02/2023]
Abstract
OBJECTIVES This work aimed to establish the molecular etiology of hearing loss in a 9-year-old girl with post-lingual non-syndromic mild sensorineural hearing loss and a complex family history of clinically heterogeneous deafness. METHODS The proband's DNA was subjected to NGS analysis of a targeted 59-gene panel on the Ion Torrent PGM platform. Conventional Sanger sequencing was used for segregation analysis in all the affected relatives. The proband and all the other hearing-impaired members of the family underwent a thorough clinical and audiological evaluation. RESULTS A new likely pathogenic mutation in the EYA4 gene (c.1154C>T; p.Ser385Leu) was identified in the proband and in her 42-year-old father, who has post-lingual non-syndromic profound sensorineural hearing loss. The EYA4 mutation was also found in the proband's grandfather and uncle, both showing clinical features of Waardenburg syndrome type 1. A novel pathogenic splice-site mutation (c.321+1G>A) of the PAX3 gene was found to co-segregate with the EYA4 mutation in these two subjects. CONCLUSION The identified novel EYA4 mutation can be considered responsible for the hearing loss observed in the proband and her father, while a dual molecular diagnosis was reached in the relatives carrying both the EYA4 and the PAX3 mutations. In these two subjects the DFNA10 phenotype was masked by Waardenburg syndrome. The use of an NGS targeted gene panel, in combination with an extensive clinical and audiological examination, led us to identify the genetic cause of the hearing loss in members of a family in which different forms of autosomal dominant deafness segregate. These results provide precise and especially important prognostic and follow-up information for the future audiologic management of the youngest affected member.
Affiliation(s)
- Federica Cesca
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padova, Italy
- Elisa Bettella
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padova, Italy
- Roberta Polli
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padova, Italy
- Elona Cama
- Audiology and Phoniatrics Service, Treviso Regional Hospital, Italy; Neuroscience Department, University of Padova, Italy
- Pietro Scimemi
- Audiology and Phoniatrics Service, Treviso Regional Hospital, Italy; Neuroscience Department, University of Padova, Italy
- Rosamaria Santarelli
- Audiology and Phoniatrics Service, Treviso Regional Hospital, Italy; Neuroscience Department, University of Padova, Italy
- Alessandra Murgia
- Laboratory of Molecular Genetics of Neurodevelopment, Department of Women's and Children's Health, University of Padova, Italy; Neuroscience Department, University of Padova, Italy
8
Chuang TJ, Tseng YH, Chen CY, Wang YD. Assessment of imprinting- and genetic variation-dependent monoallelic expression using reciprocal allele descendants between human family trios. Sci Rep 2017; 7:7038. [PMID: 28765567; PMCID: PMC5539102; DOI: 10.1038/s41598-017-07514-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 04/21/2017] [Accepted: 06/23/2017] [Indexed: 11/23/2022]
Abstract
Genomic imprinting is an important epigenetic process that silences one of the parentally inherited alleles of a gene and thereby produces allele-specific expression (ASE). Detection of human imprinting events is hampered by the infeasibility of a reciprocal mating system in humans and by the need to remove ASE events arising from non-imprinting factors. Here, we describe a pipeline that uses the pattern of reciprocal allele descendants (RADs), derived from genotyping and transcriptome sequencing data across independent parent-offspring trios, to discriminate between varied types of ASE (e.g., imprinting, genetic variation-dependent ASE, and random monoallelic expression (RME)). We show that the vast majority of ASE events are due to sequence-dependent genetic variants, which are evolutionarily conserved and may themselves play a cis-regulatory role. In particular, 74% of non-RAD ASE events, even though they exhibit ASE biases toward the same parentally inherited allele across different individuals, are derived from genetic variation rather than imprinting. We further show that the RME effect may reduce the effectiveness of the population-based method for detecting imprinting events and that our pipeline can help to distinguish between these two ASE types. Taken together, this study provides a good indicator for categorizing different types of ASE, opening up this widespread and complex mechanism for comprehensive characterization.
Affiliation(s)
- Chia-Ying Chen
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yi-Da Wang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
9
Purifying selection shapes the coincident SNP distribution of primate coding sequences. Sci Rep 2016; 6:27272. [PMID: 27255481; PMCID: PMC4891680; DOI: 10.1038/srep27272] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/04/2016] [Accepted: 05/17/2016] [Indexed: 12/13/2022]
Abstract
Genome-wide analysis has observed an excess of coincident single nucleotide polymorphisms (coSNPs) at human-chimpanzee orthologous positions, and suggested that this is due to cryptic variation in the mutation rate. While this phenomenon has primarily been described for non-coding coSNPs, the situation in coding sequences remains unclear. Here we calculate the observed-to-expected ratio of coSNPs (coSNPO/E) to estimate the prevalence of human-chimpanzee coSNPs, and show that the excess of coSNPs is also present in coding regions. Intriguingly, coSNPO/E is much higher at zero-fold than at nonzero-fold degenerate sites; this difference is due to an elevation of coSNPO/E at zero-fold degenerate sites, rather than a reduction at nonzero-fold degenerate ones. These trends are independent of chimpanzee subpopulation, population size, and sequencing technique, and hold broadly across primates. We find that this discrepancy cannot be fully explained by sequence context, shared ancestral polymorphisms, SNP density, or recombination rate, and that coSNPO/E in coding sequences is significantly influenced by purifying selection. We also show that selection and mutation rate affect coSNPO/E independently, and that coSNPs tend to be less damaging and more correlated with human diseases than non-coSNPs. These findings suggest that coSNPs may represent a "signature" of primate protein evolution.
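The abstract above does not spell out how the expected number of coSNPs is obtained. A minimal sketch of one common way to form an observed-to-expected ratio, assuming that SNPs in the two species fall independently and uniformly across the orthologous sites, is shown below. The function name and all counts are hypothetical; the study itself may use a different expectation model (for example, one that conditions on sequence context).

```python
def cosnp_observed_to_expected(n_sites, n_human_snps, n_chimp_snps, n_cosnps):
    """Observed-to-expected ratio of coincident SNPs.

    Assumes (hypothetically) that SNP positions in the two species are
    independent and uniform over n_sites orthologous positions, so the
    expected number of coincident sites is n_human * n_chimp / n_sites.
    """
    expected = n_human_snps * n_chimp_snps / n_sites
    return n_cosnps / expected


# Hypothetical counts, for illustration only (not data from the study)
ratio = cosnp_observed_to_expected(
    n_sites=1_000_000, n_human_snps=10_000, n_chimp_snps=20_000, n_cosnps=300
)
print(ratio)  # 1.5
```

A ratio above 1 indicates more coincident SNPs than this simple independence model predicts.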
10
Abstract
The world of primate genomics is expanding rapidly in new and exciting ways owing to lowered costs and new technologies in molecular methods and bioinformatics. The primate order is composed of 78 genera and 478 species, including human. Taxonomic inferences are complex and likely a consequence of ongoing hybridization, introgression, and reticulate evolution among closely related taxa. Recently, we applied large-scale sequencing methods and extensive taxon sampling to generate a highly resolved phylogeny that affirms, reforms, and extends previous depictions of primate speciation. The next stage of research uses this phylogeny as a foundation for investigating genome content, structure, and evolution across primates. Ongoing and future applications of a robust primate phylogeny are discussed, highlighting advancements in adaptive evolution of genes and genomes, taxonomy and conservation management of endangered species, next-generation genomic technologies, and biomedicine.
Affiliation(s)
- Jill Pecon-Slattery
- Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland 21702; Current Affiliation: Smithsonian Conservation Biology Institute, National Zoological Park, Front Royal, Virginia 22630
11
Gittelman RM, Hun E, Ay F, Madeoy J, Pennacchio L, Noble WS, Hawkins RD, Akey JM. Comprehensive identification and analysis of human accelerated regulatory DNA. Genome Res 2015; 25:1245-55. [PMID: 26104583; PMCID: PMC4561485; DOI: 10.1101/gr.192591.115] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.7] [Received: 03/27/2015] [Accepted: 06/15/2015] [Indexed: 01/19/2023]
Abstract
It has long been hypothesized that changes in gene regulation have played an important role in human evolution, but regulatory DNA has been much more difficult to study compared with protein-coding regions. Recent large-scale studies have created genome-scale catalogs of DNase I hypersensitive sites (DHSs), which demark potentially functional regulatory DNA. To better define regulatory DNA that has been subject to human-specific adaptive evolution, we performed comprehensive evolutionary and population genetics analyses on over 18 million DHSs discovered in 130 cell types. We identified 524 DHSs that are conserved in nonhuman primates but accelerated in the human lineage (haDHS), and estimate that 70% of substitutions in haDHSs are attributable to positive selection. Through extensive computational and experimental analyses, we demonstrate that haDHSs are often active in brain or neuronal cell types; play an important role in regulating the expression of developmentally important genes, including many transcription factors such as SOX6, POU3F2, and HOX genes; and identify striking examples of adaptive regulatory evolution that may have contributed to human-specific phenotypes. More generally, our results reveal new insights into conserved and adaptive regulatory DNA in humans and refine the set of genomic substrates that distinguish humans from their closest living primate relatives.
Affiliation(s)
- Rachel M Gittelman
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Enna Hun
- Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA
- Ferhat Ay
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Jennifer Madeoy
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Len Pennacchio
- Lawrence Berkeley National Laboratory, Genomics Division, Berkeley, California 94701, USA
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- R David Hawkins
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA; Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA
- Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
12
Popovic D, Sifrim A, Davis J, Moreau Y, De Moor B. Problems with the nested granularity of feature domains in bioinformatics: the eXtasy case. BMC Bioinformatics 2015; 16 Suppl 4:S2. [PMID: 25734591; PMCID: PMC4347616; DOI: 10.1186/1471-2105-16-s4-s2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/16/2023]
Abstract
Background: Data from biomedical domains often have an inherent hierarchical structure. As this structure is usually implicit, its existence can be overlooked by practitioners interested in constructing and evaluating predictive models from such data. Ignoring this structure leads to potentially problematic and routinely unrecognized bias in the models and results. In this work, we discuss this bias in detail and propose a simple, sampling-based solution for it. Next, we explore its sources and extent on synthetic data. Finally, we demonstrate how the state-of-the-art variant prioritization framework, eXtasy, benefits from using the described approach in its Random forest-based core classification model. Results and conclusions: The conducted simulations clearly indicate that the heterogeneous granularity of feature domains poses significant problems for both the standard Random forest classifier and a modification that relies on stratified bootstrapping. Conversely, using the proposed sampling scheme when training the classifier mitigates the described bias. Furthermore, when applied to the eXtasy data under a realistic class distribution scenario, a Random forest learned using the proposed sampling scheme displays much better precision than its standard version, without degrading recall. Moreover, the largest performance gains are achieved in the most important part of the operating range: the top of the prioritized gene list.
13
Menezes MJ, Guo Y, Zhang J, Riley LG, Cooper ST, Thorburn DR, Li J, Dong D, Li Z, Glessner J, Davis RL, Sue CM, Alexander SI, Arbuckle S, Kirwan P, Keating BJ, Xu X, Hakonarson H, Christodoulou J. Mutation in mitochondrial ribosomal protein S7 (MRPS7) causes congenital sensorineural deafness, progressive hepatic and renal failure and lactic acidemia. Hum Mol Genet 2015; 24:2297-307. [DOI: 10.1093/hmg/ddu747] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Indexed: 12/28/2022]
14
Inherited bone marrow failure associated with germline mutation of ACD, the gene encoding telomere protein TPP1. Blood 2014; 124:2767-74. [PMID: 25205116; DOI: 10.1182/blood-2014-08-596445] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Indexed: 01/03/2023]
Abstract
Telomerase is a ribonucleoprotein enzyme that is necessary for overcoming telomere shortening in human germ and stem cells. Mutations in telomerase or other telomere-maintenance proteins can lead to diseases characterized by depletion of hematopoietic stem cells and bone marrow failure (BMF). Telomerase localization to telomeres requires an interaction with a region on the surface of the telomere-binding protein TPP1 known as the TEL patch. Here, we identify a family with aplastic anemia and other related hematopoietic disorders in which a 1-amino-acid deletion in the TEL patch of TPP1 (ΔK170) segregates with disease. All family members carrying this mutation, but not those with wild-type TPP1, have short telomeres. When introduced into 293T cells, TPP1 with the ΔK170 mutation is able to localize to telomeres but fails to recruit telomerase to telomeres, supporting a causal relationship between this TPP1 mutation and bone marrow disorders. ACD/TPP1 is thus a newly identified telomere-related gene in which mutations cause aplastic anemia and related BMF disorders.
|
15
|
Guo Y, Prokudin I, Yu C, Liang J, Xie Y, Flaherty M, Tian L, Crofts S, Wang F, Snyder J, Donaldson C, Abdel-Magid N, Vazquez L, Keating B, Hakonarson H, Wang J, Jamieson RV. Advantage of Whole Exome Sequencing over Allele-Specific and Targeted Segment Sequencing in Detection of Novel TULP1 Mutation in Leber Congenital Amaurosis. Ophthalmic Genet 2014; 36:333-8. [PMID: 24547928 DOI: 10.3109/13816810.2014.886269] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 11/13/2022]
Abstract
BACKGROUND: Leber congenital amaurosis (LCA) is a severe form of retinal dystrophy with marked underlying genetic heterogeneity. Until recently, allele-specific assays and Sanger sequencing of targeted segments were the only available approaches for attempted genetic diagnosis in this condition. A broader next-generation sequencing (NGS) strategy, such as whole exome sequencing, provides an improved molecular genetic diagnostic capacity for patients with these conditions.
MATERIALS AND METHODS: In a child with LCA, an allele-specific assay analyzing 135 known LCA-causing variations, followed by targeted segment sequencing of 61 regions in 14 causative genes, was performed. Subsequently, exome sequencing was undertaken in the proband, unaffected consanguineous parents and two unaffected siblings. Bioinformatic analysis used two independent pipelines, BWA-GATK and SOAP, followed by Annovar and SnpEff to annotate the variants.
RESULTS: No disease-causing variants were found using the allele-specific or targeted segment Sanger sequencing assays. Analysis of variants in the exome sequence data revealed a novel homozygous nonsense mutation (c.1081C>T, p.Arg361*) in TULP1, a gene with roles in photoreceptor function where mutations were previously shown to cause LCA and retinitis pigmentosa. The identified homozygous variant was the top candidate using both bioinformatic pipelines.
CONCLUSIONS: This study highlights the value of the broad sequencing strategy of exome sequencing for disease gene identification in LCA, over other existing methods. NGS is particularly beneficial in LCA where there are a large number of causative disease genes, few distinguishing clinical features for precise candidate disease gene selection, and few mutation hotspots in any of the known disease genes.
Affiliation(s)
- Yiran Guo
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ivan Prokudin
  - Eye and Developmental Genetics Research Group, Western Sydney Genetics Program, The Children's Hospital at Westmead, Sydney, NSW, Australia
  - Children's Medical Research Institute, Westmead, Sydney, NSW, Australia
- Cong Yu
  - College of Life Sciences, Sichuan University, Key Laboratory for Bio-resources and Eco-environment of Ministry of Education, Sichuan Key Laboratory of Molecular Biology and Biotechnology, Chengdu, PR China
  - BGI-Shenzhen, Shenzhen, China
- Yi Xie
  - BGI-Shenzhen, Shenzhen, China
- Maree Flaherty
  - Department of Ophthalmology, The Children's Hospital at Westmead, Sydney, NSW, Australia
- Lifeng Tian
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Stephanie Crofts
  - Department of Ophthalmology, The Children's Hospital at Westmead, Sydney, NSW, Australia
- Fengxiang Wang
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- James Snyder
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Craig Donaldson
  - Department of Ophthalmology, The Children's Hospital at Westmead, Sydney, NSW, Australia
- Nada Abdel-Magid
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Lyam Vazquez
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Brendan Keating
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Hakon Hakonarson
  - Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jun Wang
  - BGI-Shenzhen, Shenzhen, China
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
  - King Abdulaziz University, Jeddah, Saudi Arabia
- Robyn V Jamieson
  - Eye and Developmental Genetics Research Group, Western Sydney Genetics Program, The Children's Hospital at Westmead, Sydney, NSW, Australia
  - Children's Medical Research Institute, Westmead, Sydney, NSW, Australia
  - Discipline of Ophthalmology & Save Sight Institute, University of Sydney, Sydney, Australia
  - Disciplines of Paediatrics and Child Health & Genetic Medicine, University of Sydney, Sydney, NSW, Australia
|
16
|
Chuang TJ, Chen FC. DNA methylation is associated with an increased level of conservation at nondegenerate nucleotides in mammals. Mol Biol Evol 2014; 31:387-396. [PMID: 24157417 PMCID: PMC3907051 DOI: 10.1093/molbev/mst208] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/27/2022]
Abstract
DNA methylation at CpG dinucleotides can significantly increase the rate of cytosine-to-thymine mutations and the level of sequence divergence. Although the correlations between DNA methylation and genomic sequence evolution have been widely studied, an unaddressed yet fundamental question is how DNA methylation is associated with the conservation of individual nucleotides in different sequence contexts. Here, we demonstrate that in mammalian exons, the correlations between DNA methylation and the conservation of individual nucleotides are dependent on the type of exonic sequence (coding or untranslated), the degeneracy of coding nucleotides, background selection pressure, and the relative position (first or nonfirst exon in the transcript) where the nucleotides are located. For untranslated and nonzero-fold degenerate nucleotides, methylated sites are less conserved than unmethylated sites regardless of background selection pressure and the relative position of the exon. For zero-fold degenerate (or nondegenerate) nucleotides, however, the reverse trend is observed in nonfirst coding exons and first coding exons that are under stringent background selection pressure. Furthermore, cytosine-to-thymine mutations at methylated zero-fold degenerate nucleotides are predicted to be more detrimental than those that occur at unmethylated nucleotides. As zero-fold and nonzero-fold degenerate nucleotides are very close to each other, our results suggest that the "functional resolution" of DNA methylation may be finer than previously recognized. In addition, the positive correlation between CpG methylation and the level of conservation at zero-fold degenerate nucleotides implies that CpG methylation may serve as an "indicator" of functional importance of these nucleotides.
Affiliation(s)
- Trees-Juen Chuang
  - Physical and Computational Genomics Division, Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Feng-Chi Chen
  - Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Miaoli County, Taiwan
  - Department of Life Science, National Chiao-Tung University, Hsinchu, Taiwan
  - Department of Dentistry, China Medical University, Taichung, Taiwan
|
17
|
eXtasy: variant prioritization by genomic data fusion. Nat Methods 2013; 10:1083-4. [PMID: 24076761 DOI: 10.1038/nmeth.2656] [Citation(s) in RCA: 123] [Impact Index Per Article: 10.3] [Received: 03/06/2013] [Accepted: 08/26/2013] [Indexed: 01/01/2023]
Abstract
Massively parallel sequencing greatly facilitates the discovery of novel disease genes causing Mendelian and oligogenic disorders. However, many mutations are present in any individual genome, and identifying which ones are disease causing remains a largely open problem. We introduce eXtasy, an approach to prioritize nonsynonymous single-nucleotide variants (nSNVs) that substantially improves prediction of disease-causing variants in exome sequencing data by integrating variant impact prediction, haploinsufficiency prediction and phenotype-specific gene prioritization.
|