1
Argov CM, Shneyour A, Jubran J, Sabag E, Mansbach A, Sepunaru Y, Filtzer E, Gruber G, Volozhinsky M, Yogev Y, Birk O, Chalifa-Caspi V, Rokach L, Yeger-Lotem E. Tissue-aware interpretation of genetic variants advances the etiology of rare diseases. Mol Syst Biol 2024; 20:1187-1206. [PMID: 39285047 PMCID: PMC11535248 DOI: 10.1038/s44320-024-00061-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 08/08/2024] [Accepted: 08/09/2024] [Indexed: 09/19/2024] Open
Abstract
Pathogenic variants underlying Mendelian diseases often disrupt the normal physiology of a few tissues and organs. However, variant effect prediction tools that aim to identify pathogenic variants are typically oblivious to tissue contexts. Here we report a machine-learning framework, denoted "Tissue Risk Assessment of Causality by Expression for variants" (TRACEvar, https://netbio.bgu.ac.il/TRACEvar/ ), that offers two advancements. First, TRACEvar predicts pathogenic variants that disrupt the normal physiology of specific tissues. This was achieved by creating 14 tissue-specific models that were trained on over 14,000 variants and combined 84 attributes of genetic variants with 495 attributes derived from tissue omics. TRACEvar outperformed 10 well-established and tissue-oblivious variant effect prediction tools. Second, the resulting models are interpretable, thereby illuminating variants' mode of action. Application of TRACEvar to variants of 52 rare-disease patients highlighted pathogenicity mechanisms and relevant disease processes. Lastly, the interpretation of all tissue models revealed that top-ranking determinants of pathogenicity included attributes of disease-affected tissues, particularly cellular process activities. Collectively, these results show that tissue contexts and interpretable machine-learning models can greatly enhance the etiology of rare diseases.
Affiliation(s)
- Chanan M Argov
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Ariel Shneyour
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Juman Jubran
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Eric Sabag
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Avigdor Mansbach
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Yair Sepunaru
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Emmi Filtzer
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Gil Gruber
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Miri Volozhinsky
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Yuval Yogev
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Ohad Birk
  - Morris Kahn Laboratory of Human Genetics and the Genetics Institute at Soroka Medical Center, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
  - The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Vered Chalifa-Caspi
  - Ilse Katz Institute for Nanoscale Science & Technology, Ben-Gurion University of the Negev, Beer-Sheva, 84105, Israel
- Lior Rokach
  - Department of Software & Information Systems Engineering, Faculty of Engineering Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
- Esti Yeger-Lotem
  - Department of Clinical Biochemistry and Pharmacology, Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
  - The National Institute for Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva, 84105, Israel
2
Roberts M, Josephs EB. Previously unmeasured genetic diversity explains part of Lewontin's paradox in a k-mer-based meta-analysis of 112 plant species. bioRxiv 2024:2024.05.17.594778. [PMID: 38798362 PMCID: PMC11118579 DOI: 10.1101/2024.05.17.594778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
At the molecular level, most evolution is expected to be neutral. A key prediction of this expectation is that the level of genetic diversity in a population should scale with population size. However, as was noted by Richard Lewontin in 1974 and reaffirmed by later studies, the slope of the population size-diversity relationship in nature is much weaker than expected under neutral theory. We hypothesize that one contributor to this paradox is that current methods relying on single nucleotide polymorphisms (SNPs) called from aligning short reads to a reference genome underestimate levels of genetic diversity in many species. To test this idea, we calculated nucleotide diversity (π) and k-mer-based metrics of genetic diversity across 112 plant species, amounting to over 205 terabases of DNA sequencing data from 27,488 individual plants. We then compared how these different metrics correlated with proxies of population size that account for both range size and population density variation across species. We found that our population size proxies scaled anywhere from about 3 to over 20 times faster with k-mer diversity than nucleotide diversity after adjusting for evolutionary history, mating system, life cycle habit, cultivation status, and invasiveness. The relationship between k-mer diversity and population size proxies also remains significant after correcting for genome size, whereas the analogous relationship for nucleotide diversity does not. These results suggest that variation not captured by common SNP-based analyses explains part of Lewontin's paradox in plants.
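To make the two kinds of metric in this abstract concrete, the following minimal sketch computes nucleotide diversity (π) as the mean number of pairwise differences per site among aligned sequences, alongside one possible alignment-free k-mer measure. The abstract does not specify which k-mer metrics the authors used, so the Jaccard distance between k-mer sets, the function names, and the toy sequences are all assumptions made here for illustration only.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site among equal-length aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


def kmer_set(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mean_kmer_jaccard_distance(seqs, k):
    """Illustrative stand-in for a k-mer diversity metric (an assumption,
    not the paper's definition): mean pairwise Jaccard distance between k-mer sets."""
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for s1, s2 in pairs:
        a, b = kmer_set(s1, k), kmer_set(s2, k)
        total += 1 - len(a & b) / len(a | b)
    return total / len(pairs)


# Toy, made-up sequences; real analyses work on sequencing reads, not short strings.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(toy)
kmer_div = mean_kmer_jaccard_distance(toy, k=3)
```

The k-mer measure needs no reference alignment, which is the property the abstract relies on when it argues that reference-based SNP calling can miss variation.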
Affiliation(s)
- Miles Roberts
  - Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI
- Emily B. Josephs
  - Department of Plant Biology, Michigan State University, East Lansing, MI
  - Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI
  - Plant Resilience Institute, Michigan State University, East Lansing, MI
3
Buffalo V, Kern AD. A quantitative genetic model of background selection in humans. PLoS Genet 2024; 20:e1011144. [PMID: 38507461 PMCID: PMC10984650 DOI: 10.1371/journal.pgen.1011144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 04/01/2024] [Accepted: 01/19/2024] [Indexed: 03/22/2024] Open
Abstract
Across the human genome, there are large-scale fluctuations in genetic diversity caused by the indirect effects of selection. This "linked selection signal" reflects the impact of selection according to the physical placement of functional regions and recombination rates along chromosomes. Previous work has shown that purifying selection acting against the steady influx of new deleterious mutations at functional portions of the genome shapes patterns of genomic variation. To date, statistical efforts to estimate purifying selection parameters from linked selection models have relied on classic Background Selection theory, which is only applicable when new mutations are so deleterious that they cannot fix in the population. Here, we develop a statistical method based on a quantitative genetics view of linked selection, that models how polygenic additive fitness variance distributed along the genome increases the rate of stochastic allele frequency change. By jointly predicting the equilibrium fitness variance and substitution rate due to both strong and weakly deleterious mutations, we estimate the distribution of fitness effects (DFE) and mutation rate across three geographically distinct human samples. While our model can accommodate weaker selection, we find evidence of strong selection operating similarly across all human samples. Although our quantitative genetic model of linked selection fits better than previous models, substitution rates of the most constrained sites disagree with observed divergence levels. We find that a model incorporating selective interference better predicts observed divergence in conserved regions, but overall our results suggest uncertainty remains about the processes generating fitness variation in humans.
Affiliation(s)
- Vince Buffalo
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
  - Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
- Andrew D. Kern
  - Institute of Ecology and Evolution and Department of Biology, University of Oregon, Eugene, Oregon, United States of America
4
Omori Y, Burgess SM. The Goldfish Genome and Its Utility for Understanding Gene Regulation and Vertebrate Body Morphology. Methods Mol Biol 2024; 2707:335-355. [PMID: 37668923 DOI: 10.1007/978-1-0716-3401-1_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/06/2023]
Abstract
The goldfish, widely viewed as an ornamental fish, is a member of the Cyprinidae family and has a long history in both genetics and physiology research. Among Cyprinidae, the chromosomal locations of orthologs and the amino acid sequences are usually highly conserved. Adult goldfish are 1000 times larger than adult zebrafish (which belong to the same family), which can make several types of experiments easier to perform than in zebrafish. Comparing mutant phenotypes in orthologous genes between goldfish and zebrafish can often be very informative and provide deeper insight into gene function than studying the gene in either species alone. Comparative genomics and phenotypic comparisons between goldfish and zebrafish will provide new opportunities for understanding the development and evolution of body forms in the vertebrate lineage.
Affiliation(s)
- Yoshihiro Omori
  - Laboratory of Functional Genomics, Graduate School of Bioscience, Nagahama Institute of Bioscience and Technology, Nagahama, Japan
- Shawn M Burgess
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
5
Guo Q, Wu S, Geschwind DH. Characterization of Gene Regulatory Elements in Human Fetal Cortical Development: Enhancing Our Understanding of Neurodevelopmental Disorders and Evolution. Dev Neurosci 2023; 46:69-83. [PMID: 37231806 DOI: 10.1159/000530929] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2023] [Accepted: 04/24/2023] [Indexed: 05/27/2023] Open
Abstract
The neocortex is the region that most distinguishes human brain from other mammals and primates [Annu Rev Genet. 2021 Nov;55(1):555-81]. Studying the development of human cortex is important in understanding the evolutionary changes occurring in humans relative to other primates, as well as in elucidating mechanisms underlying neurodevelopmental disorders. Cortical development is a highly regulated process, spatially and temporally coordinated by expression of essential transcriptional factors in response to signaling pathways [Neuron. 2019 Sep;103(6):980-1004]. Enhancers are the most well-understood cis-acting, non-protein-coding regulatory elements that regulate gene expression [Nat Rev Genet. 2014 Apr;15(4):272-86]. Importantly, given the conservation of both DNA sequence and molecular function of the majority of proteins across mammals [Genome Res. 2003 Dec;13(12):2507-18], enhancers [Science. 2015 Mar;347(6226):1155-9], which are far more divergent at the sequence level, likely account for the phenotypes that distinguish the human brain by changing the regulation of gene expression. In this review, we will revisit the conceptual framework of gene regulation during human brain development, as well as the evolution of technologies to study transcriptional regulation, with recent advances in genome biology that open a window allowing us to systematically characterize cis-regulatory elements in developing human brain [Hum Mol Genet. 2022 Oct;31(R1):R84-96]. We provide an update on work to characterize the suite of all enhancers in the developing human brain and the implications for understanding neuropsychiatric disorders. Finally, we discuss emerging therapeutic ideas that utilize our emerging knowledge of enhancer function.
Affiliation(s)
- Qiuyu Guo
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, California, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
- Sarah Wu
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
- Daniel H Geschwind
  - Center for Neurobehavioral Genetics, Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, California, USA
  - Center for Autism Research and Treatment, Semel Institute, University of California Los Angeles, Los Angeles, California, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, USA
  - Institute of Precision Health, University of California Los Angeles, Los Angeles, California, USA
6
Song H, Wang Q, Zhang Z, Lin K, Pang E. Identification of clade-wide putative cis-regulatory elements from conserved non-coding sequences in Cucurbitaceae genomes. Horticulture Research 2023; 10:uhad038. [PMID: 37799630 PMCID: PMC10548412 DOI: 10.1093/hr/uhad038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2022] [Accepted: 02/20/2023] [Indexed: 10/07/2023]
Abstract
Cis-regulatory elements regulate gene expression and play an essential role in the development and physiology of organisms. Many conserved non-coding sequences (CNSs) function as cis-regulatory elements. They control the development of various lineages. However, predicting clade-wide cis-regulatory elements across several closely related species remains challenging. Based on the relationship between CNSs and cis-regulatory elements, we present a computational approach that predicts the clade-wide putative cis-regulatory elements in 12 Cucurbitaceae genomes. Using 12-way whole-genome alignment, we first obtained 632 112 CNSs in Cucurbitaceae. Next, we identified 16 552 Cucurbitaceae-wide cis-regulatory elements based on collinearity among all 12 Cucurbitaceae plants. Furthermore, we predicted 3 271 potential regulatory pairs in the cucumber genome, of which 98 were verified using integrative RNA sequencing and ChIP sequencing datasets from samples collected during various fruit development stages. The CNSs, Cucurbitaceae-wide cis-regulatory elements, and their target genes are accessible at http://cmb.bnu.edu.cn/cisRCNEs_cucurbit/. These elements are valuable resources for functionally annotating CNSs and their regulatory roles in Cucurbitaceae genomes.
Affiliation(s)
- Hongtao Song
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Qi Wang
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Zhonghua Zhang
  - College of Horticulture, Qingdao Agricultural University, Qingdao 266109, China
- Kui Lin
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
- Erli Pang
  - MOE Key Laboratory for Biodiversity Science and Ecological Engineering and Beijing Key Laboratory of Gene Resource and Molecular Development, College of Life Sciences, Beijing Normal University, Beijing 100875, China
7
Chey YCJ, Arudkumar J, Aartsma-Rus A, Adikusuma F, Thomas PQ. CRISPR applications for Duchenne muscular dystrophy: From animal models to potential therapies. WIREs Mech Dis 2023; 15:e1580. [PMID: 35909075 PMCID: PMC10078488 DOI: 10.1002/wsbm.1580] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 02/02/2022] [Revised: 04/28/2022] [Accepted: 06/30/2022] [Indexed: 01/31/2023]
Abstract
CRISPR gene-editing technology creates precise and permanent modifications to DNA. It has significantly advanced our ability to generate animal disease models for use in biomedical research and also has the potential to revolutionize the treatment of genetic disorders. Duchenne muscular dystrophy (DMD) is a monogenic muscle-wasting disease that could potentially benefit from the development of CRISPR therapy. It is commonly associated with mutations that disrupt the reading frame of the DMD gene that encodes dystrophin, an essential scaffolding protein that stabilizes striated muscles and protects them from contractile-induced damage. CRISPR enables the rapid generation of various animal models harboring mutations that closely simulate the wide variety of mutations observed in DMD patients. These models provide a platform for the testing of sequence-specific interventions like CRISPR therapy that aim to reframe or skip DMD mutations to restore functional dystrophin expression. This article is categorized under: Congenital Diseases > Genetics/Genomics/Epigenetics.
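The reading-frame logic behind "reframe or skip" can be illustrated with simple arithmetic: a deletion preserves the reading frame only if the number of removed coding nucleotides is a multiple of 3, and skipping an additional exon can restore the frame when the combined length becomes divisible by 3. The sketch below is a simplified illustration under that assumption; the exon labels and lengths are hypothetical toy values, not actual DMD exon sizes, and the function names are invented here.

```python
def is_in_frame(removed_lengths):
    """True if the total number of removed coding nucleotides keeps the reading frame."""
    return sum(removed_lengths) % 3 == 0


def exons_restoring_frame(exon_lengths, deleted, candidates):
    """Candidate exons whose additional skipping makes the total removed
    length divisible by 3 (hypothetical helper for illustration)."""
    removed = sum(exon_lengths[e] for e in deleted)
    return [e for e in candidates if (removed + exon_lengths[e]) % 3 == 0]


# Hypothetical exon lengths in nucleotides (toy values, not real DMD data).
toy_exons = {"A": 120, "B": 109, "C": 233, "D": 118}

frame_kept = is_in_frame([toy_exons["B"]])                      # 109 % 3 == 1
rescue = exons_restoring_frame(toy_exons, ["B"], ["A", "C", "D"])  # 109 + 233 == 342
```

In this toy case, deleting exon B alone shifts the frame, while additionally skipping exon C removes 342 nucleotides in total and keeps it.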
Affiliation(s)
- Yu C J Chey
  - School of Biomedicine and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
  - Genome Editing Program, South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
- Jayshen Arudkumar
  - School of Biomedicine and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
  - Genome Editing Program, South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
- Annemieke Aartsma-Rus
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Fatwa Adikusuma
  - School of Biomedicine and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
  - Genome Editing Program, South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
  - CSIRO Synthetic Biology Future Science Platform, Canberra, Australia
- Paul Q Thomas
  - School of Biomedicine and Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
  - Genome Editing Program, South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
  - South Australian Genome Editing (SAGE), South Australian Health and Medical Research Institute (SAHMRI), Adelaide, South Australia, Australia
8
Smeds L, Ellegren H. From high masked to high realized genetic load in inbred Scandinavian wolves. Mol Ecol 2022; 32:1567-1580. [PMID: 36458895 DOI: 10.1111/mec.16802] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 07/28/2022] [Revised: 11/17/2022] [Accepted: 11/28/2022] [Indexed: 12/03/2022]
Abstract
When new mutations arise at functional sites they are more likely to impair than improve fitness. If not removed by purifying selection, such deleterious mutations will generate a genetic load that can have negative fitness effects in small populations and increase the risk of extinction. This is relevant for the highly inbred Scandinavian wolf (Canis lupus) population, founded by only three wolves in the 1980s and suffering from inbreeding depression. We used functional annotation and evolutionary conservation scores to study deleterious variation in a total of 209 genomes from both the Scandinavian and neighbouring wolf populations in northern Europe. The masked load (deleterious mutations in heterozygote state) was highest in Russia and Finland with deleterious alleles segregating at lower frequency than neutral variation. Genetic drift in the Scandinavian population led to the loss of ancestral alleles, fixation of deleterious variants and a significant increase in the per-individual realized load (deleterious mutations in homozygote state; an increase by 45% in protein-coding genes) over five generations of inbreeding. Arrival of immigrants gave a temporary genetic rescue effect with ancestral alleles re-entering the population and thereby shifting deleterious alleles from homozygous into heterozygote genotypes. However, in the absence of permanent connectivity to Finnish and Russian populations, inbreeding has then again led to the exposure of deleterious mutations. These observations provide genome-wide insight into the magnitude of genetic load and genetic rescue at the molecular level, and in relation to population history. They emphasize the importance of securing gene flow in the management of endangered populations.
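The abstract's two load definitions translate directly into genotype counts: masked load counts deleterious variants carried in the heterozygous state, and realized load counts those in the homozygous state. The following is a minimal sketch under the assumption that each site's genotype is coded as the number of deleterious alleles (0, 1, or 2); the function name and the toy genotype vector are hypothetical, and the study itself additionally relied on functional annotation and conservation scores to decide which variants count as deleterious.

```python
def genetic_load(genotypes):
    """Return (masked, realized) load for one individual.

    genotypes: iterable of 0/1/2 = number of deleterious alleles per site.
    masked   = sites heterozygous for the deleterious allele
    realized = sites homozygous for the deleterious allele
    """
    genotypes = list(genotypes)
    masked = sum(1 for g in genotypes if g == 1)
    realized = sum(1 for g in genotypes if g == 2)
    return masked, realized


# Toy genotype vector for a single individual (made-up values).
masked, realized = genetic_load([0, 1, 2, 1, 2, 2])
```

Under this coding, inbreeding shifts sites from 1 to 2 (masked to realized), and immigration of ancestral alleles shifts them back, which is the pattern the abstract describes.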
Affiliation(s)
- Linnéa Smeds
  - Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Hans Ellegren
  - Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
9
Zheng M, Li RG, Song J, Zhao X, Tang L, Erhardt S, Chen W, Nguyen BH, Li X, Li M, Wang J, Evans SM, Christoffels VM, Li N, Wang J. Hippo-Yap Signaling Maintains Sinoatrial Node Homeostasis. Circulation 2022; 146:1694-1711. [PMID: 36317529 PMCID: PMC9897204 DOI: 10.1161/circulationaha.121.058777] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/15/2021] [Accepted: 09/20/2022] [Indexed: 11/06/2022]
Abstract
BACKGROUND: The sinoatrial node (SAN) functions as the pacemaker of the heart, initiating rhythmic heartbeats. Despite its importance, the SAN is one of the most poorly understood cardiac entities because of its small size and complex composition and function. The Hippo signaling pathway is a molecular signaling pathway fundamental to heart development and regeneration. Although abnormalities of the Hippo pathway are associated with cardiac arrhythmias in human patients, the role of this pathway in the SAN is unknown.
METHODS: We investigated key regulators of the Hippo pathway in SAN pacemaker cells by conditionally inactivating the Hippo signaling kinases Lats1 and Lats2 using the tamoxifen-inducible, cardiac conduction system-specific Cre driver Hcn4CreERT2 with Lats1 and Lats2 conditional knockout alleles. In addition, the Hippo-signaling effectors Yap and Taz were conditionally inactivated in the SAN. To determine the function of Hippo signaling in the SAN and other cardiac conduction system components, we conducted a series of physiological and molecular experiments, including telemetry ECG recording, echocardiography, Masson Trichrome staining, calcium imaging, immunostaining, RNAscope, cleavage under targets and tagmentation sequencing using antibodies against Yap1 or H3K4me3, quantitative real-time polymerase chain reaction, and Western blotting. We also performed comprehensive bioinformatics analyses of various datasets.
RESULTS: We found that Lats1/2 inactivation caused severe sinus node dysfunction. Compared with the controls, Lats1/2 conditional knockout mutants exhibited dysregulated calcium handling and increased fibrosis in the SAN, indicating that Lats1/2 function through both cell-autonomous and non-cell-autonomous mechanisms. It is notable that the Lats1/2 conditional knockout phenotype was rescued by genetic deletion of Yap and Taz in the cardiac conduction system. These rescued mice had normal sinus rhythm and reduced fibrosis of the SAN, indicating that Lats1/2 function through Yap and Taz. Cleavage Under Targets and Tagmentation sequencing data showed that Yap potentially regulates genes critical for calcium homeostasis such as Ryr2 and genes encoding paracrine factors important in intercellular communication and fibrosis induction such as Tgfb1 and Tgfb3. Consistent with this, Lats1/2 conditional knockout mutants had decreased Ryr2 expression and increased Tgfb1 and Tgfb3 expression compared with control mice.
CONCLUSIONS: We reveal, for the first time to our knowledge, that the canonical Hippo-Yap pathway plays a pivotal role in maintaining SAN homeostasis.
Affiliation(s)
- Mingjie Zheng
  - Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston (M.Z., X.Z., S.E., W.C., Jun Wang)
- Rich G Li
  - Texas Heart Institute, Houston (R.G.L., X.L.)
- Jia Song
  - Department of Medicine (Section of Cardiovascular Research), Cardiovascular Research Institute, Baylor College of Medicine, Houston, TX (J.S., N.L.)
- Xiaolei Zhao
  - Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston (M.Z., X.Z., S.E., W.C., Jun Wang)
- Li Tang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan, China (L.T., M.L., Jianxin Wang)
- Shannon Erhardt
  - Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston (M.Z., X.Z., S.E., W.C., Jun Wang)
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, The University of Texas, Houston (S.E., Jun Wang)
- Wen Chen
  - Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston (M.Z., X.Z., S.E., W.C., Jun Wang)
- Bao H Nguyen
  - Department of Molecular Physiology and Biophysics (B.H.N.)
- Xiao Li
  - Texas Heart Institute, Houston (R.G.L., X.L.)
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan, China (L.T., M.L., Jianxin Wang)
- Jianxin Wang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan, China (L.T., M.L., Jianxin Wang)
- Sylvia M Evans
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, Departments of Pharmacology and Medicine, University of California at San Diego, La Jolla (S.M.E.)
- Vincent M Christoffels
  - Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam UMC, University of Amsterdam, The Netherlands (V.M.C.)
- Na Li
  - Department of Medicine (Section of Cardiovascular Research), Cardiovascular Research Institute, Baylor College of Medicine, Houston, TX (J.S., N.L.)
- Jun Wang
  - Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston (M.Z., X.Z., S.E., W.C., Jun Wang)
  - MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, The University of Texas, Houston (S.E., Jun Wang)
10
Bae J, Choi YS, Cho G, Jang SJ. The Patient-Derived Cancer Organoids: Promises and Challenges as Platforms for Cancer Discovery. Cancers (Basel) 2022; 14:2144. [PMID: 35565273 PMCID: PMC9105149 DOI: 10.3390/cancers14092144] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/04/2022] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 02/01/2023] Open
Abstract
The cancer burden is rapidly increasing in most countries, and thus, new anticancer drugs for effective cancer therapy must be developed. Cancer model systems that recapitulate the biological processes of human cancers are a core part of the drug development process. The patient-derived cancer organoid (PDCO) has emerged as a unique model that preserves the genetic, physiological, and histologic characteristics of the original cancer, including inter- and intratumoral heterogeneities. Due to these advantages, the PDCO model is increasingly investigated for anticancer drug screening and efficacy testing, preclinical patient stratification, and precision medicine for selecting the most effective anticancer therapy for patients. Here, we review the prospects and limitations of PDCO compared to the conventional cancer models. With advances in culture success rates, co-culture systems with the tumor microenvironment, organoid-on-a-chip technology, and automation technology, PDCO will become the most promising model to develop anticancer drugs and precision medicine.
Affiliation(s)
- JuneSung Bae
  - Department of Research and Development, OncoClew Co., Ltd., Seoul 04778, Korea; (J.B.); (Y.S.C.); (G.C.)
- Yun Sik Choi
  - Department of Research and Development, OncoClew Co., Ltd., Seoul 04778, Korea; (J.B.); (Y.S.C.); (G.C.)
- Gunsik Cho
  - Department of Research and Development, OncoClew Co., Ltd., Seoul 04778, Korea; (J.B.); (Y.S.C.); (G.C.)
- Se Jin Jang
  - Department of Research and Development, OncoClew Co., Ltd., Seoul 04778, Korea; (J.B.); (Y.S.C.); (G.C.)
  - Department of Pathology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
  - Asan Center for Cancer Genome Discovery, Asan Institute for Life Sciences, Seoul 05505, Korea
  - Correspondence: ; Tel.: +82-2-498-2644; Fax: +82-2-498-2655
11
Perera DDBD, Perera KML, Peiris DC. A Novel In Silico Benchmarked Pipeline Capable of Complete Protein Analysis: A Possible Tool for Potential Drug Discovery. Biology 2021; 10:1113. [PMID: 34827106 PMCID: PMC8615085 DOI: 10.3390/biology10111113] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/17/2021] [Revised: 10/16/2021] [Accepted: 10/25/2021] [Indexed: 01/11/2023]
Abstract
Simple Summary Protein interactions govern the majority of an organism’s biological processes. Therefore, to fully understand the functionality of an organism, we must know how proteins work at a molecular level. This study assembled a protocol that enables scientists to construct a protein’s tertiary structure easily and subsequently to investigate its mechanism and function. Each step involved in prediction, validation, and functional analysis of a protein is crucial to obtain an accurate result. We have dubbed this the trifecta analysis. It was clear early in our research that no single study in the literature had previously encompassed the complete trifecta analysis. In particular, studies that recommend free, open-source tools that have been benchmarked for each step are lacking. The present study ensures that predictions are accurate and validated and will greatly benefit new and experienced scientists alike in obtaining a strong understanding of the trifecta analysis, resulting in a domino effect that could lead to drug development. Abstract Current in silico proteomics require the trifecta analysis, namely, prediction, validation, and functional assessment of a modeled protein. The main drawback of this endeavor is the lack of a single protocol that utilizes a proper set of benchmarked open-source tools to predict a protein’s structure and function accurately. The present study rectifies this drawback through the design and development of such a protocol. The protocol begins with the characterization of a novel coding sequence to identify the expressed protein. It then recognizes and isolates evolutionarily conserved sequence motifs through phylogenetics. The next step is to predict the protein’s secondary structure, followed by the prediction, refinement, and validation of its three-dimensional tertiary structure. 
These steps enable the functional analysis of the macromolecule through protein docking, which facilitates the identification of the protein’s active site. Each of these steps is crucial for the complete characterization of the protein under study. We have dubbed this process the trifecta analysis. In this study, we have proven the effectiveness of our protocol using the cystatin C and AChE proteins. Beginning with just their sequences, we have characterized both proteins’ structures and functions, including identifying the cystatin C protein’s seven-residue active site and the AChE protein’s active-site gorge via protein–protein and protein–ligand docking, respectively. This process will greatly benefit new and experienced scientists alike in obtaining a strong understanding of the trifecta analysis, resulting in a domino effect that could expand drug development.
Affiliation(s)
- D. D. B. D. Perera
- Department of Zoology, Faculty of Applied Sciences, University of Sri Jayewardenepura, Nugegoda 10250, Sri Lanka;
- Correspondence: (D.D.B.D.P.); (D.C.P.); Tel.: +94-714-018-537 (D.C.P.)
| | - K. Minoli L. Perera
- Department of Zoology, Faculty of Applied Sciences, University of Sri Jayewardenepura, Nugegoda 10250, Sri Lanka;
| | - Dinithi C. Peiris
- Genetics & Molecular Biology Unit (Center for Biotechnology), Department of Zoology, Faculty of Applied Sciences, University of Sri Jayewardenepura, Nugegoda 10250, Sri Lanka
- Correspondence: (D.D.B.D.P.); (D.C.P.); Tel.: +94-714-018-537 (D.C.P.)
| |
|
12
|
Yang TH, Wang CY, Tsai HC, Liu CT. Human IRES Atlas: an integrative platform for studying IRES-driven translational regulation in humans. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021; 2021:6263636. [PMID: 33942874 PMCID: PMC8094437 DOI: 10.1093/database/baab025] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/18/2020] [Revised: 04/16/2021] [Accepted: 04/23/2021] [Indexed: 11/13/2022]
Abstract
It is now known that cap-independent translation initiation facilitated by internal ribosome entry sites (IRESs) is vital in selective cellular protein synthesis under stress and different physiological conditions. However, three problems make it hard to understand transcriptome-wide cellular IRES-mediated translation initiation mechanisms: (i) complex interplay between IRESs and other translation initiation–related information, (ii) reliability issue of in silico cellular IRES investigation and (iii) labor-intensive in vivo IRES identification. In this research, we constructed the Human IRES Atlas database for a comprehensive understanding of cellular IRESs in humans. First, currently available and suitable IRES prediction tools (IRESfinder, PatSearch and IRESpy) were used to obtain transcriptome-wide human IRESs. Then, we collected eight genres of translation initiation–related features to help study the potential molecular mechanisms of each of the putative IRESs. Three functional tests (conservation, structural RNA–protein scores and conditional translation efficiency) were devised to evaluate the functionality of the identified putative IRESs. Moreover, an easy-to-use interface and an IRES–translation initiation interaction map for each gene transcript were implemented to help understand the interactions between IRESs and translation initiation–related features. Researchers can easily search/browse an IRES of interest using the web interface and deduce testable mechanism hypotheses of human IRES-driven translation initiation based on the integrated results. In summary, Human IRES Atlas integrates putative IRES elements and translation initiation–related experiments for better usage of these data and deduction of mechanism hypotheses. Database URL: http://cobishss0.im.nuk.edu.tw/Human_IRES_Atlas/
Affiliation(s)
- Tzu-Hsien Yang
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
| | - Chung-Yu Wang
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
| | - Hsiu-Chun Tsai
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
| | - Cheng-Tse Liu
- Department of Information Management, National University of Kaohsiung, 700, Kaohsiung University Rd., Nanzih District, Kaohsiung, Taiwan 811, Republic of China
| |
|
13
|
Conserved long-range base pairings are associated with pre-mRNA processing of human genes. Nat Commun 2021; 12:2300. [PMID: 33863890 PMCID: PMC8052449 DOI: 10.1038/s41467-021-22549-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 03/20/2021] [Indexed: 02/07/2023] Open
Abstract
The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge on functional RNA structures is focused on locally-occurring base pairs. However, crosslinking and proximity ligation experiments demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete to-date catalog of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. Double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, high abundance of RNA editing sites, and frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3'-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.
|
14
|
Crtc modulates fasting programs associated with 1-C metabolism and inhibition of insulin signaling. Proc Natl Acad Sci U S A 2021; 118:2024865118. [PMID: 33723074 DOI: 10.1073/pnas.2024865118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Fasting in mammals promotes increases in circulating glucagon and decreases in circulating insulin that stimulate catabolic programs and facilitate a transition from glucose to lipid burning. The second messenger cAMP mediates effects of glucagon on fasting metabolism, in part by promoting the phosphorylation of CREB and the dephosphorylation of the cAMP-regulated transcriptional coactivators (CRTCs) in hepatocytes. In Drosophila, fasting also triggers activation of the single Crtc homolog in neurons, via the PKA-mediated phosphorylation and inhibition of salt-inducible kinases. Crtc mutant flies are more sensitive to starvation and oxidative stress, although the underlying mechanism remains unclear. Here we use RNA sequencing to identify Crtc target genes that are up-regulated in response to starvation. We found that Crtc stimulates a subset of fasting-inducible genes that have conserved CREB binding sites. In keeping with its role in the starvation response, Crtc was found to induce the expression of genes that inhibit insulin secretion (Lst) and insulin signaling (Impl2). In parallel, Crtc also promoted the expression of genes involved in one-carbon (1-C) metabolism. Within the 1-C pathway, Crtc stimulated the expression of enzymes that encode modulators of S-adenosyl-methionine metabolism (Gnmt and Sardh) and purine synthesis (ade2 and AdSl). Collectively, our results point to an important role for the CREB/CRTC pathway in promoting energy balance in the context of nutrient stress.
|
15
|
Denes CE, Newsome TP, Miranda-Saksena M, Cunningham AL, Diefenbach RJ. A putative WAVE regulatory complex (WRC) interacting receptor sequence (WIRS) in the cytoplasmic tail of HSV-1 gE does not function in WRC recruitment or neuronal transport. Access Microbiol 2021; 3:000206. [PMID: 34151161 PMCID: PMC8209697 DOI: 10.1099/acmi.0.000206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Accepted: 02/04/2021] [Indexed: 11/18/2022] Open
Abstract
HSV-1 envelope glycoprotein E (gE) is important for viral egress and cell-to-cell spread but the host protein(s) involved in these functions have yet to be determined. We aimed to investigate a role for the Arp2/3 complex and actin regulation in viral egress based on the identification of a WAVE Regulatory Complex (WRC) Interacting Receptor Sequence (WIRS) in the cytoplasmic tail (CT) of gE. A WIRS-dependent interaction between the gE(CT) and subunits of the WRC was demonstrated by GST-pulldown assay and a role for the Arp2/3 complex in cell-to-cell spread was also observed by plaque assay. Subsequent study of a recombinant HSV-1 gE WIRS-mutant found no significant changes to viral production and release based on growth kinetics studies, or changes to plaque and comet size in various cell types, suggesting no function for the motif in cell-to-cell spread. GFP-Trap pulldown and proximity ligation assays were unable to confirm a WIRS-dependent interaction between gE and the WRC in human cell lines though the WIRS-independent interaction observed in situ warrants further study. Confocal microscopy of infected cells of neuronal origin identified no impairment of gE WIRS-mutant HSV-1 anterograde transport along axons. We propose that the identified gE WIRS motif does not function directly in recruitment of the WRC in human cells, in cell-to-cell spread of virus or in anterograde transport along axons. Further studies are needed to understand how HSV-1 manipulates and traverses the actin cytoskeleton and how gE may contribute to these processes in a WIRS-independent manner.
Affiliation(s)
- Christopher E Denes
- Centre for Virus Research, The Westmead Institute for Medical Research, The University of Sydney, Westmead, NSW 2145, Australia.,School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW 2006, Australia
| | - Timothy P Newsome
- School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, NSW 2006, Australia
| | - Monica Miranda-Saksena
- Centre for Virus Research, The Westmead Institute for Medical Research, The University of Sydney, Westmead, NSW 2145, Australia
| | - Anthony L Cunningham
- Centre for Virus Research, The Westmead Institute for Medical Research, The University of Sydney, Westmead, NSW 2145, Australia
| | - Russell J Diefenbach
- Centre for Virus Research, The Westmead Institute for Medical Research, The University of Sydney, Westmead, NSW 2145, Australia.,Department of Biomedical Sciences, Faculty of Medicine, Health and Human Sciences, Macquarie University, Sydney, NSW 2109, Australia
| |
|
16
|
Millet-Boureima C, Selber-Hnatiw S, Gamberi C. Drug discovery and chemical probing in Drosophila. Genome 2020; 64:147-159. [PMID: 32551911 DOI: 10.1139/gen-2020-0037] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Flies are increasingly utilized in drug discovery and chemical probing in vivo, which are novel technologies complementary to genetic probing in fundamental biological studies. Excellent genetic conservation, small size, short generation time, and over one hundred years of genetics make Drosophila an attractive model for rapid assay readout and use of analytical amounts of compound, enabling the experimental iterations needed in early drug development at a fraction of the time and cost. Here, we describe an effective drug-testing pipeline using adult flies that can be easily implemented to study several disease models and different genotypes to discover novel molecular insight, probes, and quality lead compounds, and to develop novel prototype drugs.
Affiliation(s)
- Cassandra Millet-Boureima
- Biology Department, Concordia University, Montreal, QC H4B 1R6, Canada
| | - Susannah Selber-Hnatiw
- Biology Department, Concordia University, Montreal, QC H4B 1R6, Canada
| | - Chiara Gamberi
- Biology Department, Concordia University, Montreal, QC H4B 1R6, Canada
| |
|
17
|
Huber CD, Kim BY, Lohmueller KE. Population genetic models of GERP scores suggest pervasive turnover of constrained sites across mammalian evolution. PLoS Genet 2020; 16:e1008827. [PMID: 32469868 PMCID: PMC7286533 DOI: 10.1371/journal.pgen.1008827] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2020] [Revised: 06/10/2020] [Accepted: 05/05/2020] [Indexed: 01/20/2023] Open
Abstract
Comparative genomic approaches have been used to identify sites where mutations are under purifying selection and of functional consequence by searching for sequences that are conserved across distantly related species. However, the performance of these approaches has not been rigorously evaluated under population genetic models. Further, short-lived functional elements may not leave a footprint of sequence conservation across many species. We use simulations to study how one measure of conservation, the Genomic Evolutionary Rate Profiling (GERP) score, relates to the strength of selection (Nes). We show that the GERP score is related to the strength of purifying selection. However, changes in selection coefficients or functional elements over time (i.e. functional turnover) can strongly affect the GERP distribution, leading to unexpected relationships between GERP and Nes. Further, we show that for functional elements that have a high turnover rate, adding more species to the analysis does not necessarily increase statistical power. Finally, we use the distribution of GERP scores across the human genome to compare models with and without turnover of sites where mutations are under purifying selection. We show that mutations in 4.51% of the noncoding human genome are under purifying selection and that most of this sequence has likely experienced changes in selection coefficients throughout mammalian evolution. Our work reveals limitations to using comparative genomic approaches to identify deleterious mutations. Commonly used GERP score thresholds miss over half of the noncoding sites in the human genome where mutations are under purifying selection. One of the most significant and challenging tasks in modern genomics is to assess the functional consequences of a particular nucleotide change in a genome. 
A common approach to address this challenge prioritizes sequences that share similar nucleotides across distantly related species, with the rationale that mutations at such positions were deleterious and removed from the population by purifying natural selection. Our manuscript shows that one popular measure of sequence conservation, the GERP score, performs well at identifying selected mutations if mutations at a site were under selection across all of mammalian evolution. Changes in selection at a given site dramatically reduce the power of GERP to detect selected mutations in humans. We also combine population genetic models with the distribution of GERP scores at noncoding sites across the human genome to show that the degree of selection at individual sites has changed throughout mammalian evolution. Importantly, we demonstrate that at least 80 Mb of noncoding sequence under purifying selection in humans will not have extreme GERP scores and will likely be missed by modern comparative genomic approaches. Our work argues that new approaches, potentially based on genetic variation within species, will be required to identify deleterious mutations.
Affiliation(s)
- Christian D. Huber
- School of Biological Sciences, University of Adelaide, Adelaide, South Australia, Australia
| | - Bernard Y. Kim
- Department of Biology, Stanford University, Stanford, California, United States of America
| | - Kirk E. Lohmueller
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
| |
|
18
|
Abstract
BACKGROUND RNA-binding proteins (RBPs) are crucial in modulating RNA metabolism in eukaryotes thereby controlling an extensive network of RBP-RNA interactions. Although previous studies on the conservation of RBP targets have been carried out in lower eukaryotes such as yeast, relatively little is known about the extent of conservation of the binding sites of RBPs across mammalian species. RESULTS In this study, we employ CLIP-seq datasets for 60 human RBPs and demonstrate that most binding sites for a third of these RBPs are conserved in at least 50% of the studied vertebrate species. Across the studied RBPs, binding sites were found to exhibit a median conservation of 58%, ~ 20% higher than random genomic locations, suggesting a significantly higher preservation of RBP-RNA interaction networks across vertebrates. RBP binding sites were highly conserved across primates with weak conservation profiles in birds and fishes. We also note that phylogenetic relationship between members of an RBP family does not explain the extent of conservation of their binding sites across species. Multivariate analysis to uncover features contributing to differences in the extents of conservation of binding sites across RBPs revealed RBP expression level and number of post-transcriptional targets to be the most prominent factors. Examination of the location of binding sites at the gene level confirmed that binding sites occurring on the 3' region of a gene are highly conserved across species with 90% of the RBPs exhibiting a significantly higher conservation of binding sites in 3' regions of a gene than those occurring in the 5'. Gene set enrichment analysis on the extent of conservation of binding sites to identify significantly associated human phenotypes revealed an enrichment for multiple developmental abnormalities. 
CONCLUSIONS Our results suggest that binding sites of human RBPs are highly conserved across primates with weak conservation profiles in lower vertebrates and evolutionary relationship between members of an RBP family does not explain the extent of conservation of their binding sites. Expression level and number of targets of an RBP are important factors contributing to the differences in the extent of conservation of binding sites. RBP binding sites on 3' ends of a gene are the most conserved across species. Phenotypic analysis on the extent of conservation of binding sites revealed the importance of lineage-specific developmental events in post-transcriptional regulatory network evolution.
Affiliation(s)
- Aarthi Ramakrishnan
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, 46202, USA
| | - Sarath Chandra Janga
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, 46202, USA.,Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA.,Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| |
|
19
|
Jiang Y, Wu C, Zhang Y, Zhang S, Yu S, Lei P, Lu Q, Xi Y, Wang H, Song Z. GTX.Digest.VCF: an online NGS data interpretation system based on intelligent gene ranking and large-scale text mining. BMC Med Genomics 2019; 12:193. [PMID: 31856831 PMCID: PMC6923899 DOI: 10.1186/s12920-019-0637-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2019] [Accepted: 11/26/2019] [Indexed: 02/07/2023] Open
Abstract
Background An important task in the interpretation of sequencing data is to highlight pathogenic genes (or detrimental variants) in the field of Mendelian diseases. It is still challenging despite the recent rapid development of genomics and bioinformatics. A typical interpretation workflow includes annotation, filtration, manual inspection and literature review. Those steps are time-consuming and error-prone in the absence of systematic support. Therefore, we developed GTX.Digest.VCF, an online DNA sequencing interpretation system, which prioritizes genes and variants for novel disease-gene relation discovery and integrates text mining results to provide literature evidence for the discovery. Its phenotype-driven ranking and biological data mining approach significantly speed up the whole interpretation process. Results The GTX.Digest.VCF system is freely available as a web portal at http://vcf.gtxlab.com for academic research. Evaluation on the DDD project dataset demonstrates an accuracy of 77% (235 out of 305 cases) for top-50 genes and an accuracy of 41.6% (127 out of 305 cases) for top-5 genes. Conclusions GTX.Digest.VCF provides an intelligent web portal for genomics data interpretation via the integration of bioinformatics tools, distributed parallel computing, and biomedical text mining. It can facilitate the application of genomic analytics in clinical research and practices.
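The top-k accuracies quoted in this abstract can be reproduced from the reported counts. The sketch below is illustrative only and is not taken from GTX.Digest.VCF: the function name `top_k_accuracy` and the convention that each case is summarized by the 1-based rank of its causal gene (or `None` if the gene was not ranked) are assumptions made for this example.

```python
from typing import Optional, Sequence


def top_k_accuracy(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose causal gene is ranked within the top k.

    `ranks` holds one 1-based rank per case; None means the causal gene
    was not ranked at all (hypothetical convention for this sketch).
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Counts reported in the abstract: hits out of 305 cases.
TOTAL_CASES = 305
REPORTED_HITS = {50: 235, 5: 127}

for k, hits in REPORTED_HITS.items():
    print(f"top-{k} accuracy: {hits / TOTAL_CASES:.1%}")
```

With the reported counts, 235/305 rounds to 77.0% and 127/305 to 41.6%, matching the figures in the abstract.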
Affiliation(s)
| | - Chengkun Wu
- State Key Laboratory of High-Performance Computing, College of Computer, National University of Defense Technology, Changsha, 410073, China
| | - Yanghui Zhang
- NHC key laboratory of birth defects research, prevention and treatment (Hunan Provincial Maternal and Child Health Care Hospital), NO.53 Xiangchun Road, Changsha, 410008, Hunan, China
| | - Shaowei Zhang
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
| | - Shuojun Yu
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
| | - Peng Lei
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
| | - Qin Lu
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China
| | - Yanwei Xi
- Cytogenetics and Human Molecular Genetics Laboratories, Royal University Hospital, Saskatoon, SK, Canada
| | - Hua Wang
- NHC key laboratory of birth defects research, prevention and treatment (Hunan Provincial Maternal and Child Health Care Hospital), NO.53 Xiangchun Road, Changsha, 410008, Hunan, China.,Hunan Provincial Maternal and Child Health Care Hospital, Changsha, 410073, China
| | - Zhuo Song
- Genetalks Biotech. Co., Ltd., Changsha, 410000, China.
| |
|
20
|
Choi H, Joe S, Nam H. Development of Tissue-Specific Age Predictors Using DNA Methylation Data. Genes (Basel) 2019; 10:genes10110888. [PMID: 31690030 PMCID: PMC6896025 DOI: 10.3390/genes10110888] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2019] [Revised: 11/01/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022] Open
Abstract
DNA methylation patterns have been shown to change throughout the normal aging process. Several studies have found epigenetic aging markers using age predictors, but these studies only focused on blood-specific or tissue-common methylation patterns. Here, we constructed nine tissue-specific age prediction models using methylation array data from normal samples. The constructed models predict the chronological age with good performance (mean absolute error of 5.11 years on average) and show better performance in the independent test than previous multi-tissue age predictors. We also compared tissue-common and tissue-specific aging markers and found that they had different characteristics. Firstly, the tissue-common group tended to contain more positive aging markers with methylation values that increased during the aging process, whereas the tissue-specific group tended to contain more negative aging markers. Secondly, many of the tissue-common markers were located in Cytosine-phosphate-Guanine (CpG) island regions, whereas the tissue-specific markers were located in CpG shore regions. Lastly, the tissue-common CpG markers tended to be located in more evolutionarily conserved regions. In conclusion, our prediction models identified CpG markers that capture both tissue-common and tissue-specific characteristics during the aging process.
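The performance figure above (a mean absolute error averaged over the tissue-specific models) can be illustrated with a small sketch. This is not the authors' code: the tissue names, the ages, and the function name `mean_absolute_error` are hypothetical values chosen only to show how a per-tissue MAE and its average across models would be computed.

```python
from statistics import mean
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between chronological and predicted age."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical (chronological ages, predicted ages) per tissue model.
ages_by_tissue = {
    "blood": ([30, 50, 70], [33, 46, 75]),
    "liver": ([40, 60], [45, 52]),
}

# One MAE per tissue-specific model, then the average across models.
mae_by_tissue = {
    tissue: mean_absolute_error(actual, predicted)
    for tissue, (actual, predicted) in ages_by_tissue.items()
}
average_mae = mean(mae_by_tissue.values())
print(mae_by_tissue, average_mae)
```

Averaging per-model MAEs weights each tissue equally regardless of sample count; pooling all samples before averaging would be an alternative, and the abstract does not say which was used.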
Affiliation(s)
- Heeyeon Choi
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea.
| | - Soobok Joe
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea.
| | - Hojung Nam
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, Korea.
| |
|
21
|
Lenzini L, Di Patti F, Livi R, Fondi M, Fani R, Mengoni A. A Method for the Structure-Based, Genome-Wide Analysis of Bacterial Intergenic Sequences Identifies Shared Compositional and Functional Features. Genes (Basel) 2019; 10:genes10100834. [PMID: 31652625 PMCID: PMC6826451 DOI: 10.3390/genes10100834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2019] [Revised: 10/07/2019] [Accepted: 10/16/2019] [Indexed: 11/16/2022] Open
Abstract
In this paper, we propose a computational strategy for performing genome-wide analyses of intergenic sequences in bacterial genomes. Following the approach of a previous paper, in which a method for genome-wide analysis of eukaryotic intergenic sequences was proposed, we developed a tool that implements similar concepts for bacterial genomes. This allows us to (i) classify intergenic sequences into clusters characterized by specific global structural features and (ii) draw possible relations with their functional features.
Affiliation(s)
- Leonardo Lenzini
- Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy.
| | - Francesca Di Patti
- Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy.
| | - Roberto Livi
- Dipartimento di Fisica e Astronomia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
- Istituto Nazionale di Fisica Nucleare, Sesto Fiorentino, 50019, Italy.
- Centro Interdipartimentale per lo Studio delle Dinamiche Complesse, Sesto Fiorentino, 50019, Italy.
- Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy.
| | - Marco Fondi
- Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
| | - Renato Fani
- Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Sesto Fiorentino, 50019, Italy.
- Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
| | - Alessio Mengoni
- Dipartimento di Biologia, Università degli Studi di Firenze, Sesto Fiorentino, 50019, Italy.
| |
|
22
|
Hoff K, Lemme M, Kahlert AK, Runde K, Audain E, Schuster D, Scheewe J, Attmann T, Pickardt T, Caliebe A, Siebert R, Kramer HH, Milting H, Hansen A, Ammerpohl O, Hitz MP. DNA methylation profiling allows for characterization of atrial and ventricular cardiac tissues and hiPSC-CMs. Clin Epigenetics 2019; 11:89. [PMID: 31186048 PMCID: PMC6560887 DOI: 10.1186/s13148-019-0679-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2018] [Accepted: 05/03/2019] [Indexed: 02/07/2023] Open
Abstract
Background Cardiac disease modelling using human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CM) requires thorough insight into cardiac cell type differentiation processes. However, current methods to discriminate different cardiac cell types are mostly time-consuming, are costly and often provide imprecise phenotypic evaluation. DNA methylation plays a critical role during early heart development and cardiac cellular specification. We therefore investigated the DNA methylation pattern in different cardiac tissues to identify CpG loci for further cardiac cell type characterization. Results An array-based genome-wide DNA methylation analysis using Illumina Infinium HumanMethylation450 BeadChips led to the identification of 168 differentially methylated CpG loci in atrial and ventricular human heart tissue samples (n = 49) from different patients with congenital heart defects (CHD). Systematic evaluation of atrial-ventricular DNA methylation pattern in cardiac tissues in an independent sample cohort of non-failing donor hearts and cardiac patients using bisulfite pyrosequencing helped us to define a subset of 16 differentially methylated CpG loci enabling precise characterization of human atrial and ventricular cardiac tissue samples. This defined set of reproducible cardiac tissue-specific DNA methylation sites allowed us to consistently detect the cellular identity of hiPSC-CM subtypes. Conclusion Testing DNA methylation of only a small set of defined CpG sites thus makes it possible to distinguish atrial and ventricular cardiac tissues and cardiac atrial and ventricular subtypes of hiPSC-CMs. This method represents a rapid and reliable system for phenotypic characterization of in vitro-generated cardiomyocytes and opens new opportunities for cardiovascular research and patient-specific therapy. 
Electronic supplementary material The online version of this article (10.1186/s13148-019-0679-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kirstin Hoff
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Marta Lemme
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany.,Department of Experimental Pharmacology and Toxicology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Anne-Karin Kahlert
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany.,Institute for Clinical Genetics, Carl Gustav Carus Faculty of Medicine, Dresden, Germany
| | - Kerstin Runde
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
| | - Enrique Audain
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Dorit Schuster
- Institute of Human Genetics, Christian-Albrechts-University Kiel & University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
| | - Jens Scheewe
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
| | - Tim Attmann
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
| | - Thomas Pickardt
- National Register for Congenital Heart Defects, DZHK (German Centre for Cardiovascular Research), Berlin, Germany.,Competence Network for Congenital Heart Defects, DZHK (German Centre for Cardiovascular Research), Berlin, Germany
| | - Almuth Caliebe
- Institute of Human Genetics, Christian-Albrechts-University Kiel & University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
| | - Reiner Siebert
- Institute of Human Genetics, University Hospital Ulm, Ulm, Germany
| | - Hans-Heiner Kramer
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany
| | - Hendrik Milting
- Erich and Hanna Klessmann Institute for Cardiovascular Research & Development (EHKI), Heart and Diabetes Center NRW, Ruhr University Bochum, Bad Oeynhausen, Germany
| | - Arne Hansen
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany.,Department of Experimental Pharmacology and Toxicology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Ole Ammerpohl
- Institute of Human Genetics, University Hospital Ulm, Ulm, Germany
| | - Marc-Phillip Hitz
- Department of Congenital Heart Disease and Pediatric Cardiology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.,DZHK (German Centre for Cardiovascular Research), partner site Hamburg/Kiel/Lübeck, Hamburg, Germany.,Institute of Human Genetics, Christian-Albrechts-University Kiel & University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.,Wellcome Trust Sanger Institute, Cambridge, UK.
|
23
|
Chen Z, Omori Y, Koren S, Shirokiya T, Kuroda T, Miyamoto A, Wada H, Fujiyama A, Toyoda A, Zhang S, Wolfsberg TG, Kawakami K, Phillippy AM, NISC Comparative Sequencing Program, Mullikin JC, Burgess SM. De novo assembly of the goldfish (Carassius auratus) genome and the evolution of genes after whole-genome duplication. SCIENCE ADVANCES 2019; 5:eaav0547. [PMID: 31249862 PMCID: PMC6594761 DOI: 10.1126/sciadv.aav0547] [Citation(s) in RCA: 123] [Impact Index Per Article: 20.5] [Received: 08/09/2018] [Accepted: 05/21/2019] [Indexed: 05/20/2023]
Abstract
For over a thousand years, the common goldfish (Carassius auratus) was raised throughout Asia for food and as an ornamental pet. As a very close relative of the common carp (Cyprinus carpio), goldfish share the recent genome duplication that occurred approximately 14 million years ago in their common ancestor. The combination of centuries of breeding and a wide array of interesting body morphologies provides an exciting opportunity to link genotype to phenotype and to understand the dynamics of genome evolution and speciation. We generated a high-quality draft sequence and gene annotations of a "Wakin" goldfish using 71X PacBio long reads. The two subgenomes in goldfish retained extensive synteny and collinearity between goldfish and zebrafish. However, genes were lost quickly after the carp whole-genome duplication, and the expression of 30% of the retained duplicated genes diverged substantially across seven tissues sampled. Loss of sequence identity and/or exons determined the divergence of the expression levels across all tissues, while loss of conserved noncoding elements determined expression variance between different tissues. This assembly provides an important resource for comparative genomics and understanding the causes of goldfish variants.
Affiliation(s)
- Zelin Chen
- Translational and Functional Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Yoshihiro Omori
- Laboratory for Molecular and Developmental Biology, Institute for Protein Research, Osaka University, Suita, Osaka, Japan
| | - Sergey Koren
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Takuya Shirokiya
- Yatomi Station, Aichi Fisheries Research Institute, Yatomi, Aichi, Japan
| | - Takuo Kuroda
- Yatomi Station, Aichi Fisheries Research Institute, Yatomi, Aichi, Japan
| | - Atsushi Miyamoto
- Yatomi Station, Aichi Fisheries Research Institute, Yatomi, Aichi, Japan
| | - Hironori Wada
- Laboratory of Molecular and Developmental Biology, National Institute of Genetics, and Department of Genetics, SOKENDAI (The Graduate University for Advanced Studies), Mishima, Shizuoka, Japan
| | - Asao Fujiyama
- Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, Japan
| | - Atsushi Toyoda
- Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka, Japan
- Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, Japan
| | - Suiyuan Zhang
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Tyra G. Wolfsberg
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Koichi Kawakami
- Laboratory of Molecular and Developmental Biology, National Institute of Genetics, and Department of Genetics, SOKENDAI (The Graduate University for Advanced Studies), Mishima, Shizuoka, Japan
| | - Adam M. Phillippy
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | | | - James C. Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Shawn M. Burgess
- Translational and Functional Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Corresponding author.
|
24
|
Savel D, Koyutürk M. Characterizing human genomic coevolution in locus-gene regulatory interactions. BioData Min 2019; 12:8. [PMID: 30923571 PMCID: PMC6419833 DOI: 10.1186/s13040-019-0195-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2018] [Accepted: 02/19/2019] [Indexed: 11/10/2022] Open
Abstract
Background Coevolution has been used to identify and predict interactions and functional relationships between proteins of many different organisms including humans. Current efforts in annotating the human genome increasingly show that non-coding DNA sequence has important functional and regulatory interactions. Furthermore, regulatory elements do not necessarily reside in close proximity to the coding region of their target genes. Results We characterize coevolution as it appears in locus-gene interactions in the human genome, focusing on expression quantitative trait locus (eQTL) interactions. Our results show that in these interactions the conservation status of the loci is predictive of the conservation status of their target genes. Furthermore, comparing the phylogenetic histories of intra-chromosomal pairs of loci and transcription start sites, we find that pairs that appear coevolved are enriched for cis-eQTL interactions. Exploring this property, we found that coevolution might be useful in prioritizing association tests in cis-eQTL detection. Conclusions The relationship between the conservation status of pairs of loci and protein-coding transcription start sites reveals correlations with regulatory interactions. Pairs that appear coevolved are enriched for intra-chromosomal regulatory interactions; thus, our results suggest that measures of coevolution can be useful for prediction and detection of new interactions. Measures of coevolution are genome-wide and could potentially be used to prioritize the detection of distant or inter-chromosomal interactions such as trans-eQTL interactions in the human genome.
Affiliation(s)
- Daniel Savel
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106 OH USA
| | - Mehmet Koyutürk
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106 OH USA.,Center for Proteomics and Bioinformatics, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, 44106 OH USA
|
25
|
You Z, Zhang Q, Liu C, Song J, Yang N, Lian L. Integrated analysis of lncRNA and mRNA repertoires in Marek's disease infected spleens identifies genes relevant to resistance. BMC Genomics 2019; 20:245. [PMID: 30922224 PMCID: PMC6438004 DOI: 10.1186/s12864-019-5625-1] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 08/18/2018] [Accepted: 03/20/2019] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Marek's disease virus (MDV) is an oncogenic herpesvirus that can cause T-cell lymphomas in chicken. Long noncoding RNA (lncRNA) is strongly associated with various cancers and many other diseases. In chickens, lncRNAs have not been comprehensively identified. Here, we profiled mRNA and lncRNA repertoires in three groups of spleens from MDV-infected and non-infected chickens, including seven tumorous spleens (TS) from MDV-infected chickens, five spleens from the survivors (SS) without lesions after MDV infection, and five spleens from noninfected chickens (NS), to explore the underlying mechanism of host resistance in Marek's disease (MD). RESULTS By using a precise lncRNA identification pipeline, we identified 1315 putative lncRNAs and 1166 known lncRNAs in spleen tissue. Genomic features of putative lncRNAs were characterized. Differentially expressed (DE) mRNAs, putative lncRNAs, and known lncRNAs were profiled among three groups. We found that several specific intergroup differentially expressed genes were involved in important biological processes and pathways, including B cell activation and the Wnt signaling pathway; some of these genes were also found to be the hub genes in the co-expression network analyzed by WGCNA. Network analysis depicted both intergenic correlation and correlation between genes and MD traits. Five DE lncRNAs including MSTRG.360.1, MSTRG.6725.1, MSTRG.6754.1, MSTRG.15539.1, and MSTRG.7747.5 strongly correlated with MD-resistant candidate genes, such as IGF-I, CTLA4, HDAC9, SWAP70, CD72, JCHAIN, CXCL12, and CD8B, suggesting that lncRNAs may affect MD resistance and tumorigenesis in chicken spleens through their target genes. CONCLUSIONS Our results provide both transcriptomic and epigenetic insights on MD resistance and its pathological mechanism. The comprehensive lncRNA and mRNA transcriptomes in MDV-infected chicken spleens were profiled. 
Co-expression analysis identified integrated lncRNA-mRNA and gene-gene interaction networks, implying that hub genes or lncRNAs exert critical influence on MD resistance and tumorigenesis.
Affiliation(s)
- Zhen You
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
| | - Qinghe Zhang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
| | - Changjun Liu
- Division of Avian Infectious Diseases, Harbin Veterinary Research Institute of Chinese Academy of Agricultural Sciences, Harbin, 150001 China
| | - Jiuzhou Song
- Department of Animal & Avian Sciences, University of Maryland, College Park, MD 20742 USA
| | - Ning Yang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
| | - Ling Lian
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
|
26
|
Abstract
Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make most effective use of our rapidly growing databases of whole genomes.
Affiliation(s)
- Colin N Dewey
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
|
27
|
Hofstetter S, Seefried F, Häfliger IM, Jagannathan V, Leeb T, Drögemüller C. A non-coding regulatory variant in the 5'-region of the MITF gene is associated with white-spotted coat in Brown Swiss cattle. Anim Genet 2018; 50:27-32. [PMID: 30506810 DOI: 10.1111/age.12751] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Accepted: 10/19/2018] [Indexed: 01/29/2023]
Abstract
Recently, the Swiss breeding association reported an increasing number of white-spotted cattle in the Brown Swiss breed, which is normally solid brown coloured. A total of 60 Brown Swiss cattle with variably sized white abdominal spots, facial markings and depigmented claws were collected for this study. A genome-wide association study using 40k SNP genotypes of 20 cases and 1619 controls enabled us to identify an associated genome region on chromosome 22 containing the MITF gene, encoding the melanogenesis associated transcription factor. Variants at the MITF locus have been reported before to be associated with white or white-spotted phenotypes in other species such as horses, dogs and mice. Whole-genome sequencing of a single white-spotted cow and subsequent genotyping of 172 Brown Swiss cattle revealed two significantly associated, completely linked single nucleotide variants (rs722765315 and rs719139527). Both variants are located in the 5'-regulatory region of the bovine MITF gene, and comparative sequence analysis showed that the variant rs722765315, located 139 kb upstream of the transcription start site of the bovine melanocyte-specific MITF transcript, is situated in a multi-species conserved sequence element that is presumed to be of regulatory importance. Therefore, we hypothesize that rs722765315 represents the most likely causative variant for the white-spotting phenotype observed in Brown Swiss cattle. Presence of the mutant allele in a heterozygous or homozygous state supports a dominant mode of inheritance with incomplete penetrance and results in a variable extent of coat colour depigmentation.
Affiliation(s)
- S Hofstetter
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | | | - I M Häfliger
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - V Jagannathan
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - T Leeb
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
| | - C Drögemüller
- Institute of Genetics, Vetsuisse Faculty, University of Bern, 3001, Bern, Switzerland
|
28
|
Song H, Lin K, Hu J, Pang E. An Updated Functional Annotation of Protein-Coding Genes in the Cucumber Genome. FRONTIERS IN PLANT SCIENCE 2018; 9:325. [PMID: 29599790 PMCID: PMC5863696 DOI: 10.3389/fpls.2018.00325] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2017] [Accepted: 02/27/2018] [Indexed: 06/08/2023]
Abstract
Background: Although the cucumber reference genome and its annotation were published several years ago, the functional annotation of predicted genes, particularly protein-coding genes, still requires further improvement. In general, accurately determining orthologous relationships between genes allows for better and more robust functional assignments of predicted genes. As one of the most reliable strategies, the determination of collinearity information may facilitate reliable orthology inferences among genes from multiple related genomes. Currently, the identification of collinear segments has mainly been based on conservation of gene order and orientation. Over the course of plant genome evolution, various evolutionary events have disrupted or distorted the order of genes along chromosomes, making it difficult to use those genes as genome-wide markers for plant genome comparisons. Results: Using the localized LASTZ/MULTIZ analysis pipeline, we aligned 15 genomes, including cucumber and other related angiosperm plants, and identified a set of genomic segments that are short in length, stable in structure, uniform in distribution and highly conserved across all 15 plants. Compared with protein-coding genes, these conserved segments were more suitable for use as genomic markers for detecting collinear segments among distantly divergent plants. Guided by this set of identified collinear genomic segments, we inferred 94,486 orthologous protein-coding gene pairs (OPPs) between cucumber and 14 other angiosperm species, which were used as proxies for transferring functional terms to cucumber genes from the annotations of the other 14 genomes. In total, 10,885 protein-coding genes were assigned Gene Ontology (GO) terms, which was nearly 1,300 more than the results collected in the UniProt proteomic database. Our results showed that annotation accuracy would be improved compared with other existing approaches. 
Conclusions: In this study, we provided an alternative resource for the functional annotation of predicted cucumber protein-coding genes, which we expect will be beneficial for biological studies of cucumber, accessible from http://cmb.bnu.edu.cn/functional_annotation. Meanwhile, using the cucumber reference genome as a case study, we presented an efficient strategy for transferring gene functional information from previously well-characterized protein-coding genes in model species to newly sequenced or "non-model" plant species.
Affiliation(s)
- Hongtao Song
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
| | - Kui Lin
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
| | - Jinglu Hu
- Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, Japan
| | - Erli Pang
- MOE Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, China
|
29
|
Chen CK, Yu CP, Li SC, Wu SM, Lu MYJ, Chen YH, Chen DR, Ng CS, Ting CT, Li WH. Identification and evolutionary analysis of long non-coding RNAs in zebra finch. BMC Genomics 2017; 18:117. [PMID: 28143393 PMCID: PMC5282891 DOI: 10.1186/s12864-017-3506-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/12/2016] [Accepted: 01/14/2017] [Indexed: 02/06/2023] Open
Abstract
Background Long non-coding RNAs (lncRNAs) are important in various biological processes, but very few studies on lncRNA have been conducted in birds. To identify lncRNAs expressed during feather development, we analyzed single-stranded RNA-seq (ssRNA-seq) data from the anterior and posterior dorsal regions during zebra finch (Taeniopygia guttata) embryonic development. Using published transcriptomic data, we further analyzed the evolutionary conservation of lncRNAs in birds and amniotes. Results A total of 1,081 lncRNAs, including 965 intergenic lncRNAs (lincRNAs), 59 intronic lncRNAs, and 57 antisense lncRNAs (lncNATs), were identified using our newly developed pipeline. These avian lncRNAs share similar characteristics with lncRNAs in mammals, such as shorter transcript length, lower exon number, lower average expression level and less sequence conservation than mRNAs. However, the proportion of lncRNAs overlapping with transposable elements in birds is much lower than that in mammals. We predicted the functions of lncRNAs based on the enriched functions of co-expressed protein-coding genes. Clusters of lncRNAs associated with natal down development were identified. The sequences and expression levels of candidate lncRNAs that shared conserved sequences among birds were validated by qPCR in both zebra finch and chicken. Finally, we identified three highly conserved lncRNAs that may be associated with natal down development. Conclusions Our study provides the first systematic identification of avian lncRNAs using ssRNA-seq analysis and offers a resource of embryonically expressed lncRNAs in zebra finch. We also predicted the biological function of identified lncRNAs. Electronic supplementary material The online version of this article (doi:10.1186/s12864-017-3506-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Chih-Kuan Chen
- Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, 10617, Taiwan.,Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
| | - Chun-Ping Yu
- Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
| | - Sung-Chou Li
- Department of Medical Research, Genomics and Proteomics Core Laboratory, Kaohsiung Chang Gung Memorial Hospital and Chang Gung University College of Medicine, Kaohsiung, 83301, Taiwan
| | - Siao-Man Wu
- Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
| | - Mei-Yeh Jade Lu
- Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
| | - Yi-Hua Chen
- Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
| | - Di-Rong Chen
- Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan
| | - Chen Siang Ng
- Institute of Molecular and Cellular Biology & Department of Life Science, National Tsing Hua University, Hsinchu, 30013, Taiwan.
| | - Chau-Ti Ting
- Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, 10617, Taiwan.,Department of Life Science & Genome and Systems Biology Degree Program, National Taiwan University, Taipei, 10617, Taiwan.,Research Center for Developmental Biology and Regenerative Medicine, National Taiwan University, Taipei, 10617, Taiwan.
| | - Wen-Hsiung Li
- Biodiversity Research Center, Academia Sinica, Taipei, 11529, Taiwan.,Center for the Integrative and Evolutionary Galliformes Genomics (iEGG Center), National Chung Hsing University, Taichung, 40227, Taiwan.,Department of Ecology and Evolution, University of Chicago, Chicago, IL, 60637, USA.
|
30
|
Clustering and evolutionary analysis of small RNAs identify regulatory siRNA clusters induced under drought stress in rice. BMC SYSTEMS BIOLOGY 2016; 10:115. [PMID: 28155667 PMCID: PMC5260113 DOI: 10.1186/s12918-016-0355-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Motivation Drought tolerance is an important trait related to growth and yield in crops. Until now, drought-related research has focused on coding genes. However, non-coding RNAs also respond significantly to environmental stimuli such as drought stress. Unfortunately, characterizing the role of siRNAs under drought stress is difficult since a large number of heterogeneous siRNA species are expressed under drought stress and non-coding RNAs have very weak evolutionary conservation. Thus, to characterize the role of siRNAs, we need a well-designed biological and bioinformatics strategy. In this paper, to characterize the function of siRNAs, we developed and used a bioinformatics pipeline that includes a genomic-location-based clustering technique and an evolutionary conservation tool. Results By comparing the wild type Nipponbare and two drought-resistant rice varieties, we found that 21 nt and 24 nt siRNAs are significantly expressed in the three rice plants but at different time points under a short-term (0, 1, and 6 hrs) drought treatment. siRNAs were up-regulated in the wild type at an early stage while the up-regulation was delayed in the two drought-tolerant plants. Genes targeted by up-regulated siRNAs were related to oxidation reduction and proteolysis, which are well known to be associated with water deficit phenotypes. More interestingly, we found that siRNAs were located in intronic regions as clusters and were of high evolutionary conservation among monocot grass plants. In summary, we show that siRNAs are important respondents to drought stress and regulate genes related to drought tolerance in water deficit conditions. Electronic supplementary material The online version of this article (doi:10.1186/s12918-016-0355-3) contains supplementary material, which is available to authorized users.
|
31
|
Hoffmann TJ, Keats BJ, Yoshikawa N, Schaefer C, Risch N, Lustig LR. A Large Genome-Wide Association Study of Age-Related Hearing Impairment Using Electronic Health Records. PLoS Genet 2016; 12:e1006371. [PMID: 27764096 PMCID: PMC5072625 DOI: 10.1371/journal.pgen.1006371] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Received: 06/27/2016] [Accepted: 09/16/2016] [Indexed: 01/22/2023] Open
Abstract
Age-related hearing impairment (ARHI), one of the most common sensory disorders, can be mitigated, but not cured or eliminated. To identify genetic influences underlying ARHI, we conducted a genome-wide association study of ARHI in 6,527 cases and 45,882 controls among the non-Hispanic whites from the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. We identified two novel genome-wide significant SNPs: rs4932196 (odds ratio = 1.185, p = 4.0 × 10⁻¹¹), 52 kb 3' of ISG20, which replicated in a meta-analysis of the other GERA race/ethnicity groups (1,025 cases, 12,388 controls, p = 0.00094) and in a UK Biobank case-control analysis (30,802 self-reported cases, 78,586 controls, p = 0.015); and rs58389158 (odds ratio = 1.132, p = 1.8 × 10⁻⁹), which replicated in the UK Biobank (p = 0.00021). The latter SNP lies just outside exon 8 and is highly correlated (r² = 0.96) with the missense SNP rs5756795 in exon 7 of TRIOBP, a gene previously associated with prelingual nonsyndromic hearing loss. We further tested these SNPs in phenotypes from audiologist notes available on a subset of GERA (4,903 individuals), stratified by case/control status, to construct an independent replication test, and found a significant effect of rs58389158 on speech reception threshold (SRT; overall GERA meta-analysis p = 1.9 × 10⁻⁶). We also tested variants within exons of 132 other previously-identified hearing loss genes, and identified two common additional significant SNPs: rs2877561 (synonymous change in ILDR1, p = 6.2 × 10⁻⁵), which replicated in the UK Biobank (p = 0.00057), and had a significant GERA SRT (p = 0.00019) and speech discrimination score (SDS; p = 0.0019); and rs9493627 (missense change in EYA4, p = 0.00011) which replicated in the UK Biobank (p = 0.0095), other GERA groups (p = 0.0080), and had a consistent significant result for SRT (p = 0.041) and suggestive result for SDS (p = 0.081). 
Large cohorts with GWAS data and electronic health records may be a useful method to characterize the genetic architecture of ARHI. Age-related hearing impairment (ARHI) is one of the most common sensory disorders. While ARHI effects can be mitigated with current technologies, it cannot be cured or eliminated. It is thus hoped that identification of genetic influences on ARHI may one day lead to curative therapies. Towards this goal, the current study utilized electronic health record data from non-Hispanic whites in the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort to conduct a genome-wide association study of ARHI, and tested the significant variants for replication in other GERA race/ethnicity groups, independent GERA phenotypes, and self-reported ARHI from the UK Biobank. We discovered two genome-wide significant SNPs. The first was novel and near ISG20. The second was in TRIOBP, a gene previously associated with prelingual nonsyndromic hearing loss. Motivated by our TRIOBP results, we also looked at exons in known hearing loss genes, and identified two additional SNPs, rs2877561 in ILDR1 and rs9493627 in EYA4 (at a significance threshold adjusted for number of SNPs in those regions). These results suggest that large cohorts with GWAS data and electronic health records may be a useful method to characterize the genetic architecture of ARHI.
Affiliation(s)
- Thomas J. Hoffmann
- Department of Epidemiology and Biostatistics, and Institute for Human Genetics, University of California San Francisco, San Francisco, United States of America
| | - Bronya J. Keats
- Department of Genetics, Louisiana State University Health Sciences Center, New Orleans, United States of America
| | - Noriko Yoshikawa
- Department of Head and Neck Surgery, Kaiser Permanente Medical Center, Oakland, United States of America
| | - Catherine Schaefer
- Kaiser Permanente Northern California Division of Research, Oakland, United States of America
| | - Neil Risch
- Department of Epidemiology and Biostatistics, and Institute for Human Genetics, University of California San Francisco, San Francisco, United States of America
- Kaiser Permanente Northern California Division of Research, Oakland, United States of America
| | - Lawrence R. Lustig
- Department of Otolaryngology—Head and Neck Surgery, Columbia University Medical Center, Columbia, United States of America
- New York Presbyterian Hospital, New York, United States of America
| |
Collapse
|
32
|
Binet M, Gascuel O, Scornavacca C, Douzery EJP, Pardi F. Fast and accurate branch lengths estimation for phylogenomic trees. BMC Bioinformatics 2016; 17:23. [PMID: 26744021 PMCID: PMC4705742 DOI: 10.1186/s12859-015-0821-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/19/2015] [Accepted: 11/02/2015] [Indexed: 01/26/2023] Open
Abstract
Background: Branch lengths are an important attribute of phylogenetic trees, providing essential information for many studies in evolutionary biology. Yet part of the current methodology to reconstruct a phylogeny from genomic information (namely supertree methods) focuses on the topology or structure of the phylogenetic tree, rather than the evolutionary divergences associated with it. Moreover, accurate methods to estimate branch lengths, typically based on probabilistic analysis of a concatenated alignment, are limited by large demands in memory and computing time, and may become impractical when the data sets are too large.
Results: Here, we present a novel phylogenomic distance-based method, named ERaBLE (Evolutionary Rates and Branch Length Estimation), to estimate the branch lengths of a given reference topology, and the relative evolutionary rates of the genes employed in the analysis. ERaBLE uses as input data a potentially very large collection of distance matrices, where each matrix is obtained from a different genomic region, either directly from its sequence alignment or indirectly from a gene tree inferred from the alignment. Our experiments show that ERaBLE is very fast and fairly accurate when compared to other possible approaches for the same tasks. Specifically, it efficiently and accurately deals with large data sets, such as the OrthoMaM v8 database, composed of 6,953 exons from up to 40 mammals.
Conclusions: ERaBLE may be used as a complement to supertree methods, or as an efficient alternative to maximum likelihood analysis of concatenated alignments, to estimate branch lengths from phylogenomic data sets.
Affiliation(s)
- Manuel Binet
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France; Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
- Olivier Gascuel
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.
- Celine Scornavacca
- Institut de Biologie Computationnelle, Montpellier, France; Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
- Emmanuel J P Douzery
- Institut des Sciences de l'Evolution de Montpellier, CNRS, IRD, EPHE, Université de Montpellier, France.
- Fabio Pardi
- Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), CNRS, Université de Montpellier, Montpellier, France; Institut de Biologie Computationnelle, Montpellier, France.
|
33
|
Bahamonde MI, Serra SA, Drechsel O, Rahman R, Marcé-Grau A, Prieto M, Ossowski S, Macaya A, Fernández-Fernández JM. A Single Amino Acid Deletion (ΔF1502) in the S6 Segment of CaV2.1 Domain III Associated with Congenital Ataxia Increases Channel Activity and Promotes Ca²⁺ Influx. PLoS One 2015; 10:e0146035. [PMID: 26716990 PMCID: PMC4696675 DOI: 10.1371/journal.pone.0146035] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 10/15/2015] [Accepted: 12/11/2015] [Indexed: 02/07/2023] Open
Abstract
Mutations in the CACNA1A gene, encoding the pore-forming CaV2.1 (P/Q-type) channel α1A subunit, result in heterogeneous human neurological disorders, including familial and sporadic hemiplegic migraine along with episodic and progressive forms of ataxia. Hemiplegic Migraine (HM) mutations induce gain-of-channel function, mainly by shifting channel activation to lower voltages, whereas ataxia mutations mostly produce loss-of-channel function. However, some HM-linked gain-of-function mutations are also associated with congenital ataxia and/or cerebellar atrophy, including the deletion of a highly conserved phenylalanine located at the S6 pore region of α1A domain III (ΔF1502). Functional studies of ΔF1502 CaV2.1 channels, expressed in Xenopus oocytes, using the non-physiological Ba²⁺ as the charge carrier have only revealed discrete alterations in channel function of unclear pathophysiological relevance. Here, we report a second case of congenital ataxia linked to the ΔF1502 α1A mutation, detected by whole-exome sequencing, and analyze its functional consequences on CaV2.1 human channels heterologously expressed in mammalian tsA-201 HEK cells, using the physiological permeant ion Ca²⁺. ΔF1502 strongly decreases the voltage threshold for channel activation (by ~21 mV), allowing significantly higher Ca²⁺ current densities in a range of depolarized voltages with physiological relevance in neurons, even though maximal Ca²⁺ current density through ΔF1502 CaV2.1 channels is 60% lower than through wild-type channels. ΔF1502 accelerates activation kinetics and slows deactivation kinetics of CaV2.1 within a wide range of voltage depolarization. ΔF1502 also slows CaV2.1 inactivation kinetics and shifts the inactivation curve to hyperpolarized potentials (by ~28 mV). ΔF1502 effects on CaV2.1 activation and deactivation properties seem to be of high physiological relevance. Thus, ΔF1502 strongly promotes Ca²⁺ influx in response to either single or trains of action potential-like waveforms of different durations. Our observations support a causative role of gain-of-function CaV2.1 mutations in congenital ataxia, a neurodevelopmental disorder at the most severe end of the CACNA1A-associated phenotypic spectrum.
Affiliation(s)
- Maria Isabel Bahamonde
- Laboratori de Fisiologia Molecular i Canalopaties, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- Selma Angèlica Serra
- Laboratori de Fisiologia Molecular i Canalopaties, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- Oliver Drechsel
- Genomic and Epigenomic Variation in Disease Group, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Rubayte Rahman
- Genomic and Epigenomic Variation in Disease Group, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Anna Marcé-Grau
- Pediatric Neurology Research Group, Vall d’Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- Marta Prieto
- Laboratori de Fisiologia Molecular i Canalopaties, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- Stephan Ossowski
- Genomic and Epigenomic Variation in Disease Group, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Alfons Macaya
- Pediatric Neurology Research Group, Vall d’Hebron Research Institute, Universitat Autònoma de Barcelona, Barcelona, Spain
- José M. Fernández-Fernández
- Laboratori de Fisiologia Molecular i Canalopaties, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
|
34
|
Dillman AR, Macchietto M, Porter CF, Rogers A, Williams B, Antoshechkin I, Lee MM, Goodwin Z, Lu X, Lewis EE, Goodrich-Blair H, Stock SP, Adams BJ, Sternberg PW, Mortazavi A. Comparative genomics of Steinernema reveals deeply conserved gene regulatory networks. Genome Biol 2015; 16:200. [PMID: 26392177 PMCID: PMC4578762 DOI: 10.1186/s13059-015-0746-6] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Received: 06/01/2015] [Accepted: 08/10/2015] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND: Parasitism is a major ecological niche for a variety of nematodes. Multiple nematode lineages have specialized as pathogens, including deadly parasites of insects that are used in biological control. We have sequenced and analyzed the draft genomes and transcriptomes of the entomopathogenic nematode Steinernema carpocapsae and four congeners (S. scapterisci, S. monticolum, S. feltiae, and S. glaseri).
RESULTS: We used these genomes to establish phylogenetic relationships, explore gene conservation across species, and identify genes uniquely expanded in insect parasites. Protein domain analysis in Steinernema revealed a striking expansion of numerous putative parasitism genes, including certain protease and protease inhibitor families, as well as fatty acid- and retinol-binding proteins. Stage-specific gene expression of some of these expanded families further supports the notion that they are involved in insect parasitism by Steinernema. We show that sets of novel conserved non-coding regulatory motifs are associated with orthologous genes in Steinernema and Caenorhabditis.
CONCLUSIONS: We have identified a set of expanded gene families that are likely to be involved in parasitism. We have also identified a set of non-coding motifs associated with groups of orthologous genes in Steinernema and Caenorhabditis involved in neurogenesis and embryonic development that are likely part of conserved protein-DNA relationships shared between these two genera.
Affiliation(s)
- Adler R Dillman
- Department of Nematology, University of California, Riverside, CA, 92521, USA.
- Marissa Macchietto
- Department of Developmental and Cell Biology, University of California, Irvine, CA, 92697, USA.
- Camille F Porter
- Department of Biology and Evolutionary Ecology Laboratories, Brigham Young University, Provo, UT, 84602, USA.
- Alicia Rogers
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
- Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
- Igor Antoshechkin
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
- Ming-Min Lee
- Department of Entomology, University of Arizona, Tucson, AZ, 85721, USA.
- Zane Goodwin
- Division of Biology and Biomedical Sciences, Washington University, St Louis, MO, 63110, USA.
- Xiaojun Lu
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- Edwin E Lewis
- Department of Entomology and Nematology, University of California, Davis, CA, 95616, USA.
- Heidi Goodrich-Blair
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53706, USA.
- S Patricia Stock
- Department of Entomology, University of Arizona, Tucson, AZ, 85721, USA.
- Byron J Adams
- Department of Biology and Evolutionary Ecology Laboratories, Brigham Young University, Provo, UT, 84602, USA.
- Paul W Sternberg
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, 91125, USA.
- Howard Hughes Medical Institute, Pasadena, CA, 91125, USA.
- Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, CA, 92697, USA.
|
35
|
Thompson D, Regev A, Roy S. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu Rev Cell Dev Biol 2015; 31:399-428. [PMID: 26355593 DOI: 10.1146/annurev-cellbio-100913-012908] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Indexed: 11/09/2022]
Abstract
Regulation of gene expression is central to many biological processes. Although reconstruction of regulatory circuits from genomic data alone is therefore desirable, this remains a major computational challenge. Comparative approaches that examine the conservation and divergence of circuits and their components across strains and species can help reconstruct circuits as well as provide insights into the evolution of gene regulatory processes and their adaptive contribution. In recent years, advances in genomic and computational tools have led to a wealth of methods for such analysis at the sequence, expression, pathway, module, and entire network level. Here, we review computational methods developed to study transcriptional regulatory networks using comparative genomics, from sequence to functional data. We highlight how these methods use evolutionary conservation and divergence to reliably detect regulatory components as well as estimate the extent and rate of divergence. Finally, we discuss the promise and open challenges in linking regulatory divergence to phenotypic divergence and adaptation.
Affiliation(s)
- Dawn Thompson
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
|
36
|
Trends in genome dynamics among major orders of insects revealed through variations in protein families. BMC Genomics 2015; 16:583. [PMID: 26251035 PMCID: PMC4528696 DOI: 10.1186/s12864-015-1771-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/02/2014] [Accepted: 07/13/2015] [Indexed: 01/22/2023] Open
Abstract
Background: Insects belong to a class that accounts for the majority of animals on earth. With over one million identified species, insects display a huge diversity and occupy extreme environments. At present, there are dozens of fully sequenced insect genomes that cover a range of habitats, social behavior and morphologies. In view of such diverse collection of genomes, revealing evolutionary trends and charting functional relationships of proteins remain challenging.
Results: We analyzed the relatedness of 17 complete proteomes representative of proteomes from insects including louse, bee, beetle, ants, flies and mosquitoes, as well as an out-group from the crustaceans. The analyzed proteomes mostly represented the orders of Hymenoptera and Diptera. The 287,405 protein sequences from the 18 proteomes were automatically clustered into 20,933 families, including 799 singletons. A comprehensive analysis based on statistical considerations identified the families that were significantly expanded or reduced in any of the studied organisms. Among all the tested species, ants are characterized by an exceptionally high rate of family gain and loss. By assigning annotations to hundreds of species-specific families, the functional diversity among species and between the major clades (Diptera and Hymenoptera) is revealed. We found that many species-specific families are associated with receptor signaling, stress-related functions and proteases. The highest variability among insects associates with the function of transposition and nucleic acids processes (collectively coined TNAP). Specifically, the wasp and ants have an order of magnitude more TNAP families and proteins relative to species that belong to Diptera (mosquitoes and flies).
Conclusions: An unsupervised clustering methodology combined with a comparative functional analysis unveiled proteomic signatures in the major clades of winged insects. We propose that the expansion of TNAP families in Hymenoptera potentially contributes to the accelerated genome dynamics that characterize the wasp and ants.
|
37
|
Chatagnon A, Veber P, Morin V, Bedo J, Triqueneaux G, Sémon M, Laudet V, d'Alché-Buc F, Benoit G. RAR/RXR binding dynamics distinguish pluripotency from differentiation associated cis-regulatory elements. Nucleic Acids Res 2015; 43:4833-54. [PMID: 25897113 PMCID: PMC4446430 DOI: 10.1093/nar/gkv370] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 08/11/2014] [Revised: 03/09/2015] [Accepted: 04/08/2015] [Indexed: 12/15/2022] Open
Abstract
In mouse embryonic cells, ligand-activated retinoic acid receptors (RARs) play a key role in inhibiting pluripotency-maintaining genes and activating some major actors of cell differentiation. To investigate the mechanism underlying this dual regulation, we performed joint RAR/RXR ChIP-seq and mRNA-seq time series during the first 48 h of the RA-induced Primitive Endoderm (PrE) differentiation process in F9 embryonal carcinoma (EC) cells. We show here that this dual regulation is associated with RAR/RXR genomic redistribution during the differentiation process. In-depth analysis of RAR/RXR binding sites occupancy dynamics and composition show that in undifferentiated cells, RAR/RXR interact with genomic regions characterized by binding of pluripotency-associated factors and high prevalence of the non-canonical DR0-containing RA response element. By contrast, in differentiated cells, RAR/RXR bound regions are enriched in functional Sox17 binding sites and are characterized with a higher frequency of the canonical DR5 motif. Our data offer an unprecedentedly detailed view on the action of RA in triggering pluripotent cell differentiation and demonstrate that RAR/RXR action is mediated via two different sets of regulatory regions tightly associated with cell differentiation status.
Affiliation(s)
- Amandine Chatagnon
- Université de Lyon, Université Claude Bernard Lyon1, CGphiMC UMR CNRS 5534, 69622 Villeurbanne, France
- Philippe Veber
- Université de Lyon, Université Claude Bernard Lyon1, LBBE UMR CNRS 5558, 69622 Villeurbanne, France
- Valérie Morin
- Université de Lyon, Université Claude Bernard Lyon1, CGphiMC UMR CNRS 5534, 69622 Villeurbanne, France
- Justin Bedo
- Université d'Evry-Val d'Essonne, IBISC EA 4526, 91037 Evry, France
- Gérard Triqueneaux
- Université de Lyon, Université Claude Bernard Lyon1, CGphiMC UMR CNRS 5534, 69622 Villeurbanne, France
- Marie Sémon
- IGFL, Université de Lyon, Université Lyon 1, CNRS, INRA; Ecole Normale Supérieure de Lyon, 69007 Lyon, France
- Vincent Laudet
- IGFL, Université de Lyon, Université Lyon 1, CNRS, INRA; Ecole Normale Supérieure de Lyon, 69007 Lyon, France
- Gérard Benoit
- Université de Lyon, Université Claude Bernard Lyon1, CGphiMC UMR CNRS 5534, 69622 Villeurbanne, France
|
38
|
Tetreault M, Bareke E, Nadaf J, Alirezaie N, Majewski J. Whole-exome sequencing as a diagnostic tool: current challenges and future opportunities. Expert Rev Mol Diagn 2015; 15:749-60. [PMID: 25959410 DOI: 10.1586/14737159.2015.1039516] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Indexed: 12/26/2022]
Abstract
Whole-exome sequencing (WES) represents a significant breakthrough in the field of human genetics. This technology has largely contributed to the identification of new disease-causing genes and is now entering clinical laboratories. WES represents a powerful tool for diagnosis and could reduce the 'diagnostic odyssey' for many patients. In this review, we present a technical overview of WES analysis, variants annotation and interpretation in a clinical setting. We evaluate the usefulness of clinical WES in different clinical indications, such as rare diseases, cancer and complex diseases. Finally, we discuss the efficacy of WES as a diagnostic tool and the impact on patient management.
Affiliation(s)
- Martine Tetreault
- Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
|
39
|
Naval-Sánchez M, Potier D, Hulselmans G, Christiaens V, Aerts S. Identification of Lineage-Specific Cis-Regulatory Modules Associated with Variation in Transcription Factor Binding and Chromatin Activity Using Ornstein-Uhlenbeck Models. Mol Biol Evol 2015; 32:2441-55. [PMID: 25944915 PMCID: PMC4540964 DOI: 10.1093/molbev/msv107] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 02/06/2023] Open
Abstract
Scoring the impact of noncoding variation on the function of cis-regulatory regions, on their chromatin state, and on the qualitative and quantitative expression levels of target genes is a fundamental problem in evolutionary genomics. A particular challenge is how to model the divergence of quantitative traits and to identify relationships between the changes across the different levels of the genome, the chromatin activity landscape, and the transcriptome. Here, we examine the use of the Ornstein-Uhlenbeck (OU) model to infer selection at the level of predicted cis-regulatory modules (CRMs), and link these with changes in transcription factor binding and chromatin activity. Using publicly available cross-species ChIP-Seq and STARR-Seq data we show how OU can be applied genome-wide to identify candidate transcription factors for which binding site and CRM turnover is correlated with changes in regulatory activity. Next, we profile open chromatin in the developing eye across three Drosophila species. We identify the recognition motifs of the chromatin remodelers, Trithorax-like and Grainyhead as mostly correlating with species-specific changes in open chromatin. In conclusion, we show in this study that CRM scores can be used as quantitative traits and that motif discovery approaches can be extended towards more complex models of divergence.
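For readers unfamiliar with the Ornstein-Uhlenbeck (OU) process named in this abstract, the following is a minimal sketch of how a quantitative trait (for example a CRM score) evolves under a generic OU model: it drifts randomly while being pulled back toward an optimum. This is not the authors' inference procedure; the function name `simulate_ou`, its parameters, and the use of the exact one-step transition are assumptions chosen purely for illustration.

```python
import math
import random


def simulate_ou(x0, optimum, strength, sigma, dt, n_steps, rng=None):
    """Simulate a trait under a generic OU process (illustrative only).

    dX = strength * (optimum - X) dt + sigma dW, sampled with the exact
    one-step transition, so the result does not depend on dt being small.
    All parameter names are hypothetical, not taken from the cited paper.
    """
    if strength <= 0:
        raise ValueError("strength must be positive")
    rng = rng or random.Random(0)
    # Fraction of the distance to the optimum that survives one step.
    decay = math.exp(-strength * dt)
    # Standard deviation of the random part accumulated over one step.
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * strength))
    path = [x0]
    for _ in range(n_steps):
        mean = optimum + (path[-1] - optimum) * decay
        path.append(mean + step_sd * rng.gauss(0.0, 1.0))
    return path


if __name__ == "__main__":
    # A trait starting far from its optimum is pulled back toward it.
    trajectory = simulate_ou(x0=5.0, optimum=0.0, strength=1.0,
                             sigma=0.3, dt=0.1, n_steps=50)
    print(trajectory[0], trajectory[-1])
```

With `sigma` set to zero the path decays deterministically toward the optimum, which makes the pull-back term easy to inspect in isolation.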
Affiliation(s)
- Marina Naval-Sánchez
- Laboratory of Computational Biology, Department of Human Genetics, University of Leuven, Leuven, Belgium
- Delphine Potier
- Laboratory of Computational Biology, Department of Human Genetics, University of Leuven, Leuven, Belgium
- Gert Hulselmans
- Laboratory of Computational Biology, Department of Human Genetics, University of Leuven, Leuven, Belgium
- Valerie Christiaens
- Laboratory of Computational Biology, Department of Human Genetics, University of Leuven, Leuven, Belgium
- Stein Aerts
- Laboratory of Computational Biology, Department of Human Genetics, University of Leuven, Leuven, Belgium
|
40
|
Gulko B, Hubisz MJ, Gronau I, Siepel A. A method for calculating probabilities of fitness consequences for point mutations across the human genome. Nat Genet 2015; 47:276-83. [PMID: 25599402 PMCID: PMC4342276 DOI: 10.1038/ng.3196] [Citation(s) in RCA: 182] [Impact Index Per Article: 18.2] [Received: 08/13/2014] [Accepted: 12/19/2014] [Indexed: 12/17/2022]
Abstract
We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct “fingerprints” based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2–7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.
Affiliation(s)
- Brad Gulko
- Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Melissa J Hubisz
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Ilan Gronau
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
- Adam Siepel
- Graduate Field of Computer Science, Cornell University, Ithaca, New York, USA
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, USA
|
41
|
Taher L, Narlikar L, Ovcharenko I. Identification and computational analysis of gene regulatory elements. Cold Spring Harb Protoc 2015; 2015:pdb.top083642. [PMID: 25561628 PMCID: PMC5885252 DOI: 10.1101/pdb.top083642] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 06/04/2023]
Abstract
Over the last two decades, advances in experimental and computational technologies have greatly facilitated genomic research. Next-generation sequencing technologies have made de novo sequencing of large genomes affordable, and powerful computational approaches have enabled accurate annotations of genomic DNA sequences. Charting functional regions in genomes must account for not only the coding sequences, but also noncoding RNAs, repetitive elements, chromatin states, epigenetic modifications, and gene regulatory elements. A mix of comparative genomics, high-throughput biological experiments, and machine learning approaches has played a major role in this truly global effort. Here we describe some of these approaches and provide an account of our current understanding of the complex landscape of the human genome. We also present overviews of different publicly available, large-scale experimental data sets and computational tools, which we hope will prove beneficial for researchers working with large and complex genomes.
Affiliation(s)
- Leila Taher
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- Institute for Biostatistics and Informatics in Medicine and Ageing Research, University of Rostock, 18051 Rostock, Germany
- Leelavati Narlikar
- Chemical Engineering and Process Development Division, National Chemical Laboratory, CSIR, Pune 411008, India
- Ivan Ovcharenko
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
|
42
|
Modolo L, Picard F, Lerat E. A new genome-wide method to track horizontally transferred sequences: application to Drosophila. Genome Biol Evol 2015; 6:416-32. [PMID: 24497602 PMCID: PMC3942030 DOI: 10.1093/gbe/evu026] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022] Open
Abstract
Because of methodological breakthroughs and the availability of an increasing amount of whole-genome sequence data, horizontal transfers (HTs) in eukaryotes have received much attention recently. Contrary to similar analyses in prokaryotes, most studies in eukaryotes usually investigate particular sequences corresponding to transposable elements (TEs), neglecting the other components of the genome. We present a new methodological framework for the genome-wide detection of all putative horizontally transferred sequences between two species that requires no prior knowledge of the transferred sequences. This method provides a broader picture of HTs in eukaryotes by fully exploiting complete-genome sequence data. In contrast to previous genome-wide approaches, we used a well-defined statistical framework to control for the number of false positives in the results, and we propose two new validation procedures to control for confounding factors. The first validation procedure relies on a comparative analysis with other species of the phylogeny to validate HTs for the nonrepeated sequences detected, whereas the second one built upon the study of the dynamics of the detected TEs. We applied our method to two closely related Drosophila species, Drosophila melanogaster and D. simulans, in which we discovered 10 new HTs in addition to all the HTs previously detected in different studies, which underscores our method’s high sensitivity and specificity. Our results favor the hypothesis of multiple independent HTs of TEs while unraveling a small portion of the network of HTs in the Drosophila phylogeny.
Affiliation(s)
- Laurent Modolo
- Université de Lyon, France, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne, France
|
43
|
Lim E, Liu Y, Chan Y, Tiinamaija T, Käräjämäki A, Madsen E, Altshuler D, Raychaudhuri S, Groop L, Flannick J, Hirschhorn J, Katsanis N, Daly M, Daly MJ. A novel test for recessive contributions to complex diseases implicates Bardet-Biedl syndrome gene BBS10 in idiopathic type 2 diabetes and obesity. Am J Hum Genet 2014; 95:509-20. [PMID: 25439097 DOI: 10.1016/j.ajhg.2014.09.015] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 07/10/2014] [Accepted: 09/22/2014] [Indexed: 12/22/2022] Open
Abstract
Rare-variant association studies in common, complex diseases are customarily conducted under an additive risk model in both single-variant and burden testing. Here, we describe a method to improve detection of rare recessive variants in complex diseases termed RAFT (recessive-allele-frequency-based test). We found that RAFT outperforms existing approaches when the variant influences disease risk in a recessive manner on simulated data. We then applied our method to 1,791 Finnish individuals with type 2 diabetes (T2D) and 2,657 matched control subjects. In BBS10, we discovered a rare variant (c.1189A>G [p.Ile397Val]; rs202042386) that confers risk of T2D in a recessive state (p = 1.38 × 10⁻⁶) and would be missed by conventional methods. Testing of this variant in an established in vivo zebrafish model confirmed the variant to be pathogenic. Taken together, these data suggest that RAFT can effectively reveal rare recessive contributions to complex diseases overlooked by conventional association tests.
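The abstract does not spell out the RAFT statistic, so the sketch below is not that method. It only illustrates the general idea behind an allele-frequency-based recessive test: under Hardy-Weinberg proportions an allele of frequency q yields homozygotes at rate q², and an excess of homozygous cases beyond that expectation hints at a recessive effect. The function name, its inputs, and the choice of a one-sided binomial tail are all assumptions made for illustration.

```python
import math


def homozygote_excess_pvalue(n_cases, n_homozygous, allele_freq):
    """One-sided binomial tail P(X >= n_homozygous), X ~ Bin(n_cases, q**2).

    Illustrative stand-in for a recessive test, not the published RAFT
    statistic. allele_freq (q) is a hypothetical population frequency;
    q**2 is the homozygote rate expected under Hardy-Weinberg proportions.
    """
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    if not 0 <= n_homozygous <= n_cases:
        raise ValueError("n_homozygous must lie between 0 and n_cases")
    p_hom = allele_freq ** 2
    log_p, log_not_p = math.log(p_hom), math.log1p(-p_hom)
    tail = 0.0
    # Sum the upper tail in log space so large cohorts do not overflow.
    for k in range(n_homozygous, n_cases + 1):
        log_term = (math.lgamma(n_cases + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_cases - k + 1)
                    + k * log_p + (n_cases - k) * log_not_p)
        tail += math.exp(log_term)
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical numbers: 2,000 cases, 4 homozygotes, allele frequency 1%.
    print(homozygote_excess_pvalue(2000, 4, 0.01))
```

A real analysis would also need to account for population structure and relatedness, both of which inflate homozygosity independently of disease status.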
Collapse
Affiliation(s)
- Mark J Daly
- Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA.
|
44
|
Yokoyama KD, Zhang Y, Ma J. Tracing the evolution of lineage-specific transcription factor binding sites in a birth-death framework. PLoS Comput Biol 2014; 10:e1003771. [PMID: 25144359 PMCID: PMC4140645 DOI: 10.1371/journal.pcbi.1003771] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/23/2014] [Accepted: 06/27/2014] [Indexed: 11/24/2022] Open
Abstract
Changes in cis-regulatory element composition that result in novel patterns of gene expression are thought to be a major contributor to the evolution of lineage-specific traits. Although transcription factor binding events show substantial variation across species, most computational approaches to study regulatory elements focus primarily upon highly conserved sites, and rely heavily upon multiple sequence alignments. However, sequence conservation based approaches have limited ability to detect lineage-specific elements that could contribute to species-specific traits. In this paper, we describe a novel framework that utilizes a birth-death model to trace the evolution of lineage-specific binding sites without relying on detailed base-by-base cross-species alignments. Our model was applied to analyze the evolution of binding sites based on the ChIP-seq data for six transcription factors (GATA1, SOX2, CTCF, MYC, MAX, ETS1) along the lineage toward human after the human-mouse common ancestor. We estimate that a substantial fraction of binding sites (∼58–79% for each factor) in humans have origins since the divergence with mouse. Over 15% of all binding sites are unique to hominids. Such elements are often enriched near genes associated with specific pathways, and harbor more common SNPs than older binding sites in the human genome. These results support the ability of our method to identify lineage-specific regulatory elements and help understand their roles in shaping variation in gene regulation across species.
Recent experimental studies showed that the evolution of transcription factor binding sites (TFBS) is highly dynamic, with sites differing a great deal even between closely related mammalian species. Despite the substantial experimental evidence for rapid divergence of regulatory protein-binding events across species, computational methods designed to analyze the evolution of regulatory elements have focused primarily on phylogenetic footprinting approaches, in which putative functional regulatory elements are identified according to strong sequence conservation. Cross-species comparisons of non-coding sequences are limited in their ability to fully capture the evolution of regulatory sequences, particularly in cases where the elements are selected for novelty or are species-specific. We have developed a novel framework to reconstruct the history of lineage-specific TFBS and showed that a large number of TFBS in human were born after human-mouse divergence. These elements also have distinct biological implications as compared to more ancient ones. This method can help understand the roles of lineage-specific TFBS in shaping gene regulation across different species.
Affiliation(s)
- Ken Daigoro Yokoyama
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Yang Zhang
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Jian Ma
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
|
45
|
Macossay-Castillo M, Kosol S, Tompa P, Pancsa R. Synonymous constraint elements show a tendency to encode intrinsically disordered protein segments. PLoS Comput Biol 2014; 10:e1003607. [PMID: 24809503 PMCID: PMC4014394 DOI: 10.1371/journal.pcbi.1003607] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/14/2014] [Accepted: 03/17/2014] [Indexed: 01/22/2023] Open
Abstract
Synonymous constraint elements (SCEs) are protein-coding genomic regions with very low synonymous mutation rates believed to carry additional, overlapping functions. Thousands of such potentially multi-functional elements were recently discovered by analyzing the levels and patterns of evolutionary conservation in human coding exons. These elements provide a good opportunity to improve our understanding of how the redundant nature of the genetic code is exploited in the cell. Our premise is that the protein segments encoded by such elements might better comply with the increased functional demands if they are structurally less constrained (i.e. intrinsically disordered). To test this idea, we investigated the protein segments encoded by SCEs with computational tools to describe the underlying structural properties. In addition to SCEs, we examined the level of disorder, secondary structure, and sequence complexity of protein regions overlapping with experimentally validated splice regulatory sites. We show that multi-functional gene regions translate into protein segments that are significantly enriched in structural disorder and compositional bias, while they are depleted in secondary structure and domain annotations compared to reference segments of similar lengths. This tendency suggests that relaxed protein structural constraints provide an advantage when accommodating multiple overlapping functions in coding regions.
Affiliation(s)
- Mauricio Macossay-Castillo
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Simone Kosol
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Peter Tompa
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Rita Pancsa
- Vlaams Instituut voor Biotechnologie (VIB) Department of Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
|
46
|
Genome-wide analysis of promoters: clustering by alignment and analysis of regular patterns. PLoS One 2014; 9:e85260. [PMID: 24465517 PMCID: PMC3898993 DOI: 10.1371/journal.pone.0085260] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/19/2013] [Accepted: 11/26/2013] [Indexed: 01/08/2023] Open
Abstract
In this paper we perform a genome-wide analysis of H. sapiens promoters. To this end, we developed and combined two mathematical methods that allow us to (i) classify promoters into groups characterized by specific global structural features, and (ii) recover, in full generality, any regular sequence in the different classes of promoters. One of the main findings of this analysis is that H. sapiens promoters can be classified into three main groups. Two of them are distinguished by the prevalence of weak or strong nucleotides and are characterized by short compositionally biased sequences, while the most frequent regular sequences in the third group are strongly correlated with transposons. Taking advantage of the generality of these mathematical procedures, we have compared the promoter database of H. sapiens with those of other species. We have found that the above-mentioned features also characterize the evolutionary content appearing in mammalian promoters, in contrast to ancestral species in the phylogenetic tree, which exhibit a markedly lower level of differentiation among promoters.
|
47
|
Eren AM, Maignien L, Sul WJ, Murphy LG, Grim SL, Morrison HG, Sogin ML. Oligotyping: Differentiating between closely related microbial taxa using 16S rRNA gene data. Methods Ecol Evol 2013; 4. [PMID: 24358444 PMCID: PMC3864673 DOI: 10.1111/2041-210x.12114] [Citation(s) in RCA: 444] [Impact Index Per Article: 37.0] [Indexed: 11/27/2022]
Abstract
Bacteria comprise the most diverse domain of life on Earth, where they occupy nearly every possible ecological niche and play key roles in biological and chemical processes. Studying the composition and ecology of bacterial ecosystems and understanding their function are of prime importance. High-throughput sequencing technologies enable nearly comprehensive descriptions of bacterial diversity through 16S ribosomal RNA gene amplicons. Analyses of these communities generally rely upon taxonomic assignments through reference databases or clustering approaches using de facto sequence similarity thresholds to identify operational taxonomic units. However, these methods often fail to resolve ecologically meaningful differences between closely related organisms in complex microbial data sets. In this paper, we describe oligotyping, a novel supervised computational method that allows researchers to investigate the diversity of closely related but distinct bacterial organisms in final operational taxonomic units identified in environmental data sets through 16S ribosomal RNA gene data by the canonical approaches. Our analysis of two data sets from two different environments demonstrates the capacity of oligotyping to discriminate distinct microbial populations of ecological importance. Oligotyping can resolve the distribution of closely related organisms across environments and unveil previously overlooked ecological patterns for microbial communities. The URL http://oligotyping.org offers an open-source software pipeline for oligotyping.
Affiliation(s)
- A Murat Eren
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
- Loïs Maignien
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
- Woo Jun Sul
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
- Leslie G Murphy
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
- Sharon L Grim
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
- Hilary G Morrison
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
- Mitchell L Sogin
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA 02543 USA
|
48
|
Haudry A, Platts AE, Vello E, Hoen DR, Leclercq M, Williamson RJ, Forczek E, Joly-Lopez Z, Steffen JG, Hazzouri KM, Dewar K, Stinchcombe JR, Schoen DJ, Wang X, Schmutz J, Town CD, Edger PP, Pires JC, Schumaker KS, Jarvis DE, Mandáková T, Lysak MA, van den Bergh E, Schranz ME, Harrison PM, Moses AM, Bureau TE, Wright SI, Blanchette M. An atlas of over 90,000 conserved noncoding sequences provides insight into crucifer regulatory regions. Nat Genet 2013; 45:891-8. [PMID: 23817568 DOI: 10.1038/ng.2684] [Citation(s) in RCA: 227] [Impact Index Per Article: 18.9] [Received: 10/13/2012] [Accepted: 06/04/2013] [Indexed: 12/17/2022]
Abstract
Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.
Affiliation(s)
- Annabelle Haudry
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
|
49
|
Effect of genetic regions on the correlation between single point mutation variability and morbidity. Comput Biol Med 2013; 43:594-9. [DOI: 10.1016/j.compbiomed.2013.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2011] [Revised: 07/27/2012] [Accepted: 01/19/2013] [Indexed: 11/19/2022]
|
50
|
Oct4 switches partnering from Sox2 to Sox17 to reinterpret the enhancer code and specify endoderm. EMBO J 2013; 32:938-53. [PMID: 23474895 DOI: 10.1038/emboj.2013.31] [Citation(s) in RCA: 146] [Impact Index Per Article: 12.2] [Received: 11/16/2012] [Accepted: 01/24/2013] [Indexed: 01/04/2023] Open
Abstract
How regulatory information is encoded in the genome is poorly understood and poses a challenge when studying biological processes. We demonstrate here that genomic redistribution of Oct4 by alternative partnering with Sox2 and Sox17 is a fundamental regulatory event of endodermal specification. We show that Sox17 partners with Oct4 and binds to a unique 'compressed' Sox/Oct motif that earmarks endodermal genes. This is in contrast to the pluripotent state where Oct4 selectively partners with Sox2 at 'canonical' binding sites. The distinct selection of binding sites by alternative Sox/Oct partnering is underscored by our demonstration that rationally point-mutated Sox17 partners with Oct4 on pluripotency genes earmarked by the canonical Sox/Oct motif. In an endodermal differentiation assay, we demonstrate that the compressed motif is required for proper expression of endodermal genes. Evidently, Oct4 drives alternative developmental programs by switching Sox partners, which affects enhancer selection and leads to either an endodermal or pluripotent cell fate. This work provides insights into cell fate transcriptional regulation by highlighting the direct link between the DNA sequence of an enhancer and a developmental outcome.
|