1
Sturgill D, Wang L, Arda HE. PancrESS - a meta-analysis resource for understanding cell-type specific expression in the human pancreas. BMC Genomics 2024; 25:76. [PMID: 38238687 PMCID: PMC10797729 DOI: 10.1186/s12864-024-09964-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 01/03/2024] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND: The human pancreas is composed of specialized cell types producing hormones and enzymes critical to human health. These specialized functions are the result of cell type-specific transcriptional programs, which manifest in cell-specific gene expression. Understanding these programs is essential to developing therapies for pancreatic disorders. Transcription in the human pancreas has been widely studied by single-cell RNA technologies; however, the diversity of protocols and analysis methods hinders their interpretability in the aggregate.
RESULTS: In this work, we perform a meta-analysis of pancreatic single-cell RNA sequencing data. We present a database for reference transcriptome abundances and cell-type specificity metrics. This database facilitates the identification and definition of marker genes within the pancreas. Additionally, we introduce a versatile tool that is freely available as an R package and should permit integration into existing workflows. Our tool accepts count data files generated by widely used single-cell gene expression platforms in their original format, eliminating an additional pre-formatting step. Although we designed it to calculate expression specificity of pancreas cell types, our tool is agnostic to the biological source of count data, extending its applicability to other biological systems.
CONCLUSIONS: Our findings enhance the current understanding of expression specificity within the pancreas, surpassing previous work in terms of scope and detail. Furthermore, our database and tool enable researchers to perform similar calculations in diverse biological systems, expanding the applicability of marker gene identification and facilitating comparative analyses.
Affiliation(s)
- David Sturgill
- Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
| | - Li Wang
- Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
| | - H Efsun Arda
- Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA.
2
Petrusca DN, Mulcrone PL, Macar DA, Bishop RT, Berdyshev E, Suvannasankha A, Anderson JL, Sun Q, Auron PE, Galson DL, Roodman GD. GFI1-Dependent Repression of SGPP1 Increases Multiple Myeloma Cell Survival. Cancers (Basel) 2022; 14:cancers14030772. [PMID: 35159039 PMCID: PMC8833953 DOI: 10.3390/cancers14030772] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Revised: 01/27/2022] [Accepted: 01/31/2022] [Indexed: 11/16/2022] Open
Abstract
Simple Summary: New therapies have greatly improved the progression-free and overall survival for patients with “standard risk” multiple myeloma (MM). However, patients with “high risk” MM, in particular patients whose MM cells harbor non-functional p53, have very short survival times because of the early relapse and rapid development of highly therapy-resistant MM. In this report, we identify a novel mechanism responsible for Growth Factor Independence-1 (GFI1) regulation of the growth and survival of MM cells through its modulation of sphingolipid metabolism, regardless of their p53 status. We identify the Sphingosine-1-Phosphate Phosphatase (SGPP1) gene as a novel direct target of GFI1 transcriptional repression in MM cells; this repression increases intracellular sphingosine-1-phosphate levels, which stabilizes c-Myc. Our results support GFI1 as an attractive therapeutic target for all types of MM, including the “high risk” patient population with non-functional p53, as well as a possible therapeutic approach for other types of cancers expressing high levels of c-Myc.
Abstract: Multiple myeloma (MM) remains incurable for most patients due to the emergence of drug-resistant clones. Here we report a p53-independent mechanism responsible for Growth Factor Independence-1 (GFI1) support of MM cell survival by its modulation of sphingolipid metabolism to increase the sphingosine-1-phosphate (S1P) level regardless of the p53 status. We found that expression of the enzymes that control S1P biosynthesis (SphK1) and dephosphorylation (SGPP1) was differentially correlated with GFI1 levels in MM cells. We detected GFI1 occupancy on the SGPP1 gene in MM cells in a predicted enhancer region at the 5’ end of intron 1, which correlated with decreased SGPP1 expression and increased S1P levels in GFI1-overexpressing cells, regardless of their p53 status. The high S1P:ceramide intracellular ratio in MM cells protected c-Myc protein stability in a PP2A-dependent manner. The decreased MM viability by SphK1 inhibition was dependent on the induction of autophagy in both p53WT and p53mut MM. An autophagic blockade prevented GFI1 support for viability only in p53mut MM, demonstrating that GFI1 increases MM cell survival via both p53WT inhibition and upregulation of S1P independently. Therefore, GFI1 may be a key therapeutic target for all types of MM that may significantly benefit patients who are highly resistant to current therapies.
Affiliation(s)
- Daniela N. Petrusca
- Department of Medicine, Hematology/Oncology Division, Indiana University School of Medicine, 980 Walnut St., Indianapolis, IN 46202, USA; (P.L.M.); (A.S.); (J.L.A.); (G.D.R.)
- Correspondence: ; Tel.: +1-(317)-278-5548
| | - Patrick L. Mulcrone
- Department of Medicine, Hematology/Oncology Division, Indiana University School of Medicine, 980 Walnut St., Indianapolis, IN 46202, USA; (P.L.M.); (A.S.); (J.L.A.); (G.D.R.)
| | - David A. Macar
- Department of Biological Sciences, Duquesne University, 600 Forbes Ave., Pittsburgh, PA 15219, USA; (D.A.M.); (P.E.A.)
| | - Ryan T. Bishop
- Department of Tumor Biology, H. Lee Moffitt Cancer Research Center and Institute, 12902 USF Magnolia Drive, Tampa, FL 33612, USA;
| | - Evgeny Berdyshev
- Department of Medicine, National Jewish Health, 1400 Jackson Street, Denver, CO 80206, USA;
| | - Attaya Suvannasankha
- Department of Medicine, Hematology/Oncology Division, Indiana University School of Medicine, 980 Walnut St., Indianapolis, IN 46202, USA; (P.L.M.); (A.S.); (J.L.A.); (G.D.R.)
- Richard L. Rodebush Veterans Affairs Medical Center, 1481 W 10th St., Indianapolis, IN 46202, USA
| | - Judith L. Anderson
- Department of Medicine, Hematology/Oncology Division, Indiana University School of Medicine, 980 Walnut St., Indianapolis, IN 46202, USA; (P.L.M.); (A.S.); (J.L.A.); (G.D.R.)
| | - Quanhong Sun
- Department of Medicine, Division of Hematology/Oncology, McGowan Institute for Regenerative Medicine, University of Pittsburgh, UPMC Hillman Cancer Center Research Pavilion, 5117 Centre Ave, Pittsburgh, PA 15213, USA; (Q.S.); (D.L.G.)
| | - Philip E. Auron
- Department of Biological Sciences, Duquesne University, 600 Forbes Ave., Pittsburgh, PA 15219, USA; (D.A.M.); (P.E.A.)
| | - Deborah L. Galson
- Department of Medicine, Division of Hematology/Oncology, McGowan Institute for Regenerative Medicine, University of Pittsburgh, UPMC Hillman Cancer Center Research Pavilion, 5117 Centre Ave, Pittsburgh, PA 15213, USA; (Q.S.); (D.L.G.)
| | - G. David Roodman
- Department of Medicine, Hematology/Oncology Division, Indiana University School of Medicine, 980 Walnut St., Indianapolis, IN 46202, USA; (P.L.M.); (A.S.); (J.L.A.); (G.D.R.)
- Richard L. Rodebush Veterans Affairs Medical Center, 1481 W 10th St., Indianapolis, IN 46202, USA
3
Lin LH, Chou CH, Cheng HW, Chang KW, Liu CJ. Precise Identification of Recurrent Somatic Mutations in Oral Cancer Through Whole-Exome Sequencing Using Multiple Mutation Calling Pipelines. Front Oncol 2021; 11:741626. [PMID: 34912705 PMCID: PMC8666431 DOI: 10.3389/fonc.2021.741626] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/15/2021] [Accepted: 11/11/2021] [Indexed: 01/18/2023] Open
Abstract
Understanding the genomic alterations in oral carcinogenesis remains crucial for the appropriate diagnosis and treatment of oral squamous cell carcinoma (OSCC). To unveil the mutational spectrum, in this study, we conducted whole-exome sequencing (WES), using six mutation calling pipelines and multiple filtering criteria applied to 50 paired OSCC samples. The tumor mutation burden extracted from the data set of somatic variations was significantly associated with age, tumor staging, and survival. Several genes (MUC16, MUC19, KMT2D, TTN, HERC2) with a high frequency of false positive mutations were identified. Moreover, known (TP53, FAT1, EPHA2, NOTCH1, CASP8, and PIK3CA) and novel (HYDIN, ALPK3, ASXL1, USP9X, SKOR2, CPLANE1, STARD9, and NSD2) genes have been found to be significantly and frequently mutated in OSCC. Further analysis of gene alteration status with clinical parameters revealed that canonical pathways, including clathrin-mediated endocytotic signaling, NFκB signaling, PEDF signaling, and calcium signaling were associated with OSCC prognosis. Defining a catalog of targetable genomic alterations showed that 58% of the tumors carried at least one aberrant event that may potentially be targeted by approved therapeutic agents. We found molecular OSCC subgroups which were correlated with etiology and prognosis while defining the landscape of major altered events in the coding regions of OSCC genomes. These findings provide information that will be helpful in the design of clinical trials on targeted therapies and in the stratification of patients with OSCC according to therapeutic efficacy.
Affiliation(s)
- Li-Han Lin
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
| | - Chung-Hsien Chou
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Hui-Wen Cheng
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
| | - Kuo-Wei Chang
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei, Taiwan; Department of Stomatology, Taipei Veterans General Hospital, Taipei, Taiwan
| | - Chung-Ji Liu
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan; Department of Oral and Maxillofacial Surgery, Taipei MacKay Memorial Hospital, Taipei, Taiwan
4
Abstract
µ-Crystallin is a NADPH-regulated thyroid hormone binding protein encoded by the CRYM gene in humans. It is primarily expressed in the brain, muscle, prostate, and kidney, where it binds thyroid hormones, which regulate metabolism and thermogenesis. It also acts as a ketimine reductase in the lysine degradation pathway when it is not bound to thyroid hormone. Mutations in CRYM can result in non-syndromic deafness, while its aberrant expression, predominantly in the brain but also in other tissues, has been associated with psychiatric, neuromuscular, and inflammatory diseases. CRYM expression is highly variable in human skeletal muscle, with 15% of individuals expressing ≥13 fold more CRYM mRNA than the median level. Ablation of the Crym gene in murine models results in the hypertrophy of fast twitch muscle fibers and an increase in fat mass of mice fed a high fat diet. Overexpression of Crym in mice causes a shift in energy utilization away from glycolysis towards an increase in the catabolism of fat via β-oxidation, with commensurate changes of metabolically involved transcripts and proteins. The history, attributes, functions, and diseases associated with CRYM, an important modulator of metabolism, are reviewed.
Affiliation(s)
- Christian J Kinney
- Department of Physiology School of Medicine, University of Maryland, Baltimore, Baltimore, MD 21201
| | - Robert J Bloch
- Department of Physiology School of Medicine, University of Maryland, Baltimore, Baltimore, MD 21201
5
Collobert M, Bocher O, Le Nabec A, Génin E, Férec C, Moisan S. CFTR Cooperative Cis-Regulatory Elements in Intestinal Cells. Int J Mol Sci 2021; 22:ijms22052599. [PMID: 33807548 PMCID: PMC7961337 DOI: 10.3390/ijms22052599] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/29/2021] [Revised: 02/25/2021] [Accepted: 02/27/2021] [Indexed: 11/16/2022] Open
Abstract
About 8% of the human genome is covered with candidate cis-regulatory elements (cCREs). Disruptions of CREs, described as "cis-ruptions", have been identified as being involved in various genetic diseases. Thanks to the development of chromatin conformation study techniques, several long-range cystic fibrosis transmembrane conductance regulator (CFTR) regulatory elements were identified, but the regulatory mechanisms of the CFTR gene have yet to be fully elucidated. The aim of this work is to improve our knowledge of CFTR gene regulation and to identify factors that could impact CFTR gene expression and potentially account for the variability of the clinical presentation of cystic fibrosis as well as CFTR-related disorders. Here, we apply the robust GWAS3D score to determine which of the CFTR introns could be involved in gene regulation. This approach highlights four particular CFTR introns of interest. Using reporter gene constructs in intestinal cells, we show that two new introns display strong cooperative effects in intestinal cells. Chromatin immunoprecipitation analyses further demonstrate the binding of a network of transcription factors. These results provide new insights into our understanding of CFTR gene regulation and allow us to suggest a 3D CFTR locus structure in intestinal cells. A better understanding of the regulation mechanisms of the CFTR gene could elucidate cases of patients where the phenotype is not yet explained by the genotype. This would thus help in better diagnosis and therefore better management. These cis-acting regions may be a therapeutic challenge that could lead to the development of specific molecules capable of modulating gene expression in the future.
Affiliation(s)
- Mégane Collobert
- Univ. Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France; (O.B.); (A.L.N.); (E.G.); (C.F.)
- Correspondence: (M.C.); (S.M.); Tel.: +33-298-0165-67 (M.C.)
| | - Ozvan Bocher
- Univ. Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France; (O.B.); (A.L.N.); (E.G.); (C.F.)
| | - Anaïs Le Nabec
- Univ. Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France; (O.B.); (A.L.N.); (E.G.); (C.F.)
| | - Emmanuelle Génin
- Univ. Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France; (O.B.); (A.L.N.); (E.G.); (C.F.)
| | - Claude Férec
- Univ. Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France; (O.B.); (A.L.N.); (E.G.); (C.F.)
- Department of Molecular Genetics and Reproduction Biology, CHRU Brest, F-29200 Brest, France
| | - Stéphanie Moisan
- Univ. Brest, Inserm, EFS, UMR 1078, GGB, F-29200 Brest, France; (O.B.); (A.L.N.); (E.G.); (C.F.)
- Department of Molecular Genetics and Reproduction Biology, CHRU Brest, F-29200 Brest, France
- Correspondence: (M.C.); (S.M.); Tel.: +33-298-0165-67 (M.C.)
6
Miga KH. Centromere studies in the era of 'telomere-to-telomere' genomics. Exp Cell Res 2020; 394:112127. [PMID: 32504677 DOI: 10.1016/j.yexcr.2020.112127] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 01/08/2020] [Revised: 05/23/2020] [Accepted: 05/30/2020] [Indexed: 12/17/2022]
Abstract
We are entering into an exciting era of genomics where truly complete, high-quality assemblies of human chromosomes are available end-to-end, or from 'telomere-to-telomere' (T2T). This technological advance offers a new opportunity to include endogenous human centromeric regions in high-resolution, sequence-based studies. These emerging reference maps are expected to reveal a new functional landscape in the human genome, where centromere proteins, transcriptional regulation, and spatial organization can be examined with base-level resolution across different stages of development and disease. Such studies will depend on innovative assembly methods of extremely long tandem repeats (ETRs), or satellite DNAs, paired with the development of new, orthogonal validation methods to ensure accuracy and completeness. This review reflects the progress in centromere genomics, credited by recent advancements in long-read sequencing and assembly methods. In doing so, I will discuss the challenges that remain and the promise for a new period of scientific discovery for satellite DNA biology and centromere function.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, 95064, USA.
7
Umbreit NT, Zhang CZ, Lynch LD, Blaine LJ, Cheng AM, Tourdot R, Sun L, Almubarak HF, Judge K, Mitchell TJ, Spektor A, Pellman D. Mechanisms generating cancer genome complexity from a single cell division error. Science 2020; 368:eaba0712. [PMID: 32299917 PMCID: PMC7347108 DOI: 10.1126/science.aba0712] [Citation(s) in RCA: 231] [Impact Index Per Article: 57.8] [Received: 11/01/2019] [Accepted: 03/04/2020] [Indexed: 12/12/2022]
Abstract
The chromosome breakage-fusion-bridge (BFB) cycle is a mutational process that produces gene amplification and genome instability. Signatures of BFB cycles can be observed in cancer genomes alongside chromothripsis, another catastrophic mutational phenomenon. We explain this association by elucidating a mutational cascade that is triggered by a single cell division error (chromosome bridge formation) and that rapidly increases genomic complexity. We show that actomyosin forces are required for initial bridge breakage. Chromothripsis accumulates, beginning with aberrant interphase replication of bridge DNA. A subsequent burst of DNA replication in the next mitosis generates extensive DNA damage. During this second cell division, broken bridge chromosomes frequently missegregate and form micronuclei, promoting additional chromothripsis. We propose that iterations of this mutational cascade generate the continuing evolution and subclonal heterogeneity characteristic of many human cancers.
Affiliation(s)
- Neil T Umbreit
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Cheng-Zhong Zhang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Luke D Lynch
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Logan J Blaine
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Anna M Cheng
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Richard Tourdot
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Lili Sun
- Single-Cell Sequencing Program, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Hannah F Almubarak
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Kim Judge
- Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
| | - Thomas J Mitchell
- Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Cambridge University Hospitals NHS Foundation Trust, Cambridge CB2 0QQ, UK
| | - Alexander Spektor
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Radiation Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - David Pellman
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
8
Systematic microsatellite repeat expansion cloning and validation. Hum Genet 2020; 139:1233-1246. [PMID: 32277284 DOI: 10.1007/s00439-020-02165-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/07/2020] [Accepted: 04/04/2020] [Indexed: 10/24/2022]
Abstract
Approximately 3% of the human genome is composed of short tandem repeat (STR) DNA sequence known as microsatellites, which can be found in both coding and non-coding regions. When associated with genic regions, expansion of microsatellite repeats beyond a critical threshold causes dozens of neurological repeat expansion disorders. To better understand the molecular pathology of repeat expansion disorders, precise cloning of microsatellite repeat sequence and expansion size is highly valuable. Unfortunately, cloning repeat expansions is often challenging and presents a significant bottleneck to practical investigation. Here, we describe a clear method for seamless and systematic cloning of practically any microsatellite repeat expansion. We use cloning and expansion of GGGGCC repeats, which are the leading genetic cause of amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), as an example. We employ a recursive directional ligation (RDL) technique to build multiple GGGGCC repeat-containing vectors. We describe methods to validate repeat expansion cloning, including diagnostic restriction digestion, PCR across the repeat, and next-generation long-read MinION nanopore sequencing. Validated cloning of microsatellite repeats beyond the critical expansion threshold can facilitate step-by-step characterization of disease mechanisms at the cellular and molecular level.
9
Chung CH, Allen AG, Sullivan NT, Atkins A, Nonnemacher MR, Wigdahl B, Dampier W. Computational Analysis Concerning the Impact of DNA Accessibility on CRISPR-Cas9 Cleavage Efficiency. Mol Ther 2020; 28:19-28. [PMID: 31672284 PMCID: PMC6953893 DOI: 10.1016/j.ymthe.2019.10.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/23/2019] [Revised: 09/26/2019] [Accepted: 10/10/2019] [Indexed: 12/15/2022] Open
Abstract
Defining the variables that impact the specificity of CRISPR/Cas9 has been a major research focus. Whereas sequence complementarity between guide RNA and target DNA substantially dictates cleavage efficiency, DNA accessibility of the targeted loci has also been hypothesized to be an important factor. In this study, functional data from two genome-wide assays, genome-wide, unbiased identification of DSBs enabled by sequencing (GUIDE-seq) and circularization for in vitro reporting of cleavage effects by sequencing (CIRCLE-seq), have been computationally analyzed in conjunction with DNA accessibility determined via DNase I-hypersensitive sequencing from the Encyclopedia of DNA Elements (ENCODE) Database and transcriptome from the Sequence Read Archive to determine whether cellular factors influence CRISPR-induced cleavage efficiency. CIRCLE-seq and GUIDE-seq datasets were selected to represent the absence and presence of cellular factors, respectively. Data analysis revealed that correlations between sequence similarity and CRISPR-induced cleavage frequency were altered by the presence of cellular factors that modulated the level of DNA accessibility. The above-mentioned correlation was abolished when cleavage sites were located in less accessible regions. Furthermore, CRISPR-mediated edits were permissive even at regions that were insufficient for most endogenous genes to be expressed. These results provide a strong basis to dissect the contribution of local chromatin modulation markers on CRISPR-induced cleavage efficiency.
Affiliation(s)
- Cheng-Han Chung
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA 19129, USA
| | - Alexander G Allen
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA 19129, USA
| | - Neil T Sullivan
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA 19129, USA
| | - Andrew Atkins
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA 19129, USA
| | - Michael R Nonnemacher
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA 19107, USA
| | - Brian Wigdahl
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Sidney Kimmel Cancer Center, Thomas Jefferson University, Philadelphia, PA 19107, USA.
| | - Will Dampier
- Department of Microbiology and Immunology, Drexel University College of Medicine, Philadelphia, PA 19129, USA; Center for Molecular Virology and Translational Neuroscience, Institute for Molecular Medicine and Infectious Disease, Drexel University College of Medicine, Philadelphia, PA 19129, USA; School of Biomedical Engineering, Science, and Health Systems, Drexel University, Philadelphia, PA 19104, USA.
10
Nagasaki M, Kuroki Y, Shibata TF, Katsuoka F, Mimori T, Kawai Y, Minegishi N, Hozawa A, Kuriyama S, Suzuki Y, Kawame H, Nagami F, Takai-Igarashi T, Ogishima S, Kojima K, Misawa K, Tanabe O, Fuse N, Tanaka H, Yaegashi N, Kinoshita K, Kure S, Yasuda J, Yamamoto M. Construction of JRG (Japanese reference genome) with single-molecule real-time sequencing. Hum Genome Var 2019; 6:27. [PMID: 31231536 PMCID: PMC6555796 DOI: 10.1038/s41439-019-0057-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/23/2018] [Revised: 01/28/2019] [Accepted: 03/15/2019] [Indexed: 12/14/2022] Open
Abstract
In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100 bp to ~10,000 bp in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. With this result, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating the 903 verified insertions, totaling 1,086,173 bases, shared by at least two Japanese individuals into GRCh38. We also constructed decoyJRGv1 by concatenating 3559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.
Researchers in Japan have assembled a Japanese reference genome, which includes sequences missing from the international reference genome, as well as others specific to East Asian populations. A team led by Masao Nagasaki and Masayuki Yamamoto sequenced a Japanese individual using a method that produces longer sequences than previous technologies. Using this approach, they identified thousands of sequences spanning 2.5 million bases, which were absent in the international reference genome. Many of these were sequences able to move within the genome. They showed that the majority of these sequences are also present in early humans and chimpanzees, demonstrating that their absence from the current reference is due to deletions or limitations of earlier sequencing methodologies. In addition to providing a population-specific reference, these findings demonstrate the importance of continually improving the international reference genome.
Affiliation(s)
- Masao Nagasaki
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Yoko Kuroki
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; Department of Genome Medicine, National Center for Child Health and Development, Tokyo, Japan
| | - Tomoko F Shibata
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Fumiki Katsuoka
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Takahiro Mimori
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Yosuke Kawai
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Naoko Minegishi
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Atsushi Hozawa
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Shinichi Kuriyama
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; International Research Institute of Disaster Science, Tohoku University, Sendai, Japan
| | - Yoichi Suzuki
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Hiroshi Kawame
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Fuji Nagami
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | | | - Soichi Ogishima
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Kaname Kojima
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Kazuharu Misawa
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Osamu Tanabe
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Nobuo Fuse
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Tohoku University Hospital, Tohoku University, Sendai, Japan
| | - Hiroshi Tanaka
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
| | - Nobuo Yaegashi
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; Tohoku University Hospital, Tohoku University, Sendai, Japan
| | - Kengo Kinoshita
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Japan
| | - Shiego Kure
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan; Tohoku University Hospital, Tohoku University, Sendai, Japan
| | - Jun Yasuda
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| | - Masayuki Yamamoto
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan; Graduate School of Medicine, Tohoku University, Sendai, Japan
| |
|
11
|
Duda Z, Trusiak S, O'Neill R. Centromere Transcription: Means and Motive. PROGRESS IN MOLECULAR AND SUBCELLULAR BIOLOGY 2019; 56:257-281. [PMID: 28840241 DOI: 10.1007/978-3-319-58592-5_11] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/05/2023]
Abstract
The chromosome biology field at large has benefited from studies of the cell cycle components, protein cascades and genomic landscape that are required for centromere identity, assembly and stable transgenerational inheritance. Research over the past 20 years has challenged the classical descriptions of a centromere as a stable, immutable, and transcriptionally silent chromosome component. Instead, based on studies from a broad range of eukaryotic species, including yeast, fungi, plants, and animals, the centromere has been redefined as one of the more dynamic areas of the eukaryotic genome, requiring coordination of protein complex assembly, chromatin assembly, and transcriptional activity in a cell cycle-specific manner. What has emerged from more recent studies is the realization that the transcription of specific types of nucleic acids is a key process in defining centromere integrity and function. To illustrate the transcriptional landscape of centromeres across eukaryotes, we focus this review on how transcripts interact with centromere proteins, when in the cell cycle centromeric transcription occurs, and what types of sequences are being transcribed. Utilizing data from broadly different organisms, a picture emerges that places centromeric transcription as an integral component of centromere function.
Affiliation(s)
- Zachary Duda
- Department of Molecular and Cell Biology, The Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA
| | - Sarah Trusiak
- Department of Molecular and Cell Biology, The Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA
| | - Rachel O'Neill
- Department of Molecular and Cell Biology, The Institute for Systems Genomics, University of Connecticut, Storrs, CT, 06269, USA.
| |
|
12
|
Miga KH. Centromeric Satellite DNAs: Hidden Sequence Variation in the Human Population. Genes (Basel) 2019; 10:E352. [PMID: 31072070 PMCID: PMC6562703 DOI: 10.3390/genes10050352] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 04/02/2019] [Revised: 05/03/2019] [Accepted: 05/03/2019] [Indexed: 12/30/2022] Open
Abstract
The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often so large that it is detected cytogenetically, yet due to the lack of a reference assembly and informatics tools to measure this variability, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease support the view that these regions represent a substantial and biologically important fraction of human sequence variation. Therefore, there is a pressing and unmet need to detect and incorporate this uncharacterized sequence variation into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, CA 95064, USA.
| |
|
13
|
Breitwieser FP, Pertea M, Zimin AV, Salzberg SL. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res 2019; 29:954-960. [PMID: 31064768 PMCID: PMC6581058 DOI: 10.1101/gr.245373.118] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 10/24/2018] [Accepted: 05/03/2019] [Indexed: 01/22/2023]
Abstract
Contaminant sequences that appear in published genomes can cause numerous problems for downstream analyses, particularly for evolutionary studies and metagenomics projects. Our large-scale scan of complete and draft bacterial and archaeal genomes in the NCBI RefSeq database reveals that 2250 genomes are contaminated by human sequence. The contaminant sequences derive primarily from high-copy human repeat regions, which themselves are not adequately represented in the current human reference genome, GRCh38. The absence of the sequences from the human assembly offers a likely explanation for their presence in bacterial assemblies. In some cases, the contaminating contigs have been erroneously annotated as containing protein-coding sequences, which over time have propagated to create spurious protein “families” across multiple prokaryotic and eukaryotic genomes. As a result, 3437 spurious protein entries are currently present in the widely used nr and TrEMBL protein databases. We report here an extensive list of contaminant sequences in bacterial genome assemblies and the proteins associated with them. We found that nearly all contaminants occurred in small contigs in draft genomes, which suggests that filtering out small contigs from draft genome assemblies may mitigate the issue of contamination while still keeping nearly all of the genuine genomic sequences.
Affiliation(s)
- Florian P Breitwieser
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA
| | - Mihaela Pertea
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA; Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Aleksey V Zimin
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Steven L Salzberg
- Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA; Department of Computer Science, Whiting School of Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA; Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland 21205, USA
| |
|
14
|
Zhu Q, Hoong N, Aslanian A, Hara T, Benner C, Heinz S, Miga KH, Ke E, Verma S, Soroczynski J, Yates JR, Hunter T, Verma IM. Heterochromatin-Encoded Satellite RNAs Induce Breast Cancer. Mol Cell 2018; 70:842-853.e7. [PMID: 29861157 DOI: 10.1016/j.molcel.2018.04.023] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 06/21/2017] [Revised: 02/22/2018] [Accepted: 04/26/2018] [Indexed: 12/19/2022]
Abstract
Heterochromatic repetitive satellite RNAs are extensively transcribed in a variety of human cancers, including BRCA1 mutant breast cancer. Aberrant expression of satellite RNAs in cultured cells induces the DNA damage response, activates cell cycle checkpoints, and causes defects in chromosome segregation. However, the mechanism by which satellite RNA expression leads to genomic instability is not well understood. Here we provide evidence that increased levels of satellite RNAs in mammary glands induce tumor formation in mice. Using mass spectrometry, we further show that genomic instability induced by satellite RNAs occurs through interactions with BRCA1-associated protein networks required for the stabilization of DNA replication forks. Additionally, de-stabilized replication forks likely promote the formation of RNA-DNA hybrids in cells expressing satellite RNAs. These studies lay the foundation for developing novel therapeutic strategies that block the effects of non-coding satellite RNAs in cancer cells.
Affiliation(s)
- Quan Zhu
- Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Nien Hoong
- Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Aaron Aslanian
- Molecular and Cell Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Toshiro Hara
- Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Christopher Benner
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Sven Heinz
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Karen H Miga
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Eugene Ke
- Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Sachin Verma
- Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Jan Soroczynski
- Molecular and Cell Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - John R Yates
- Department of Chemical Physiology, The Scripps Research Institute, La Jolla, CA 92037, USA
| | - Tony Hunter
- Molecular and Cell Biology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
| | - Inder M Verma
- Laboratory of Genetics, Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
| |
|
15
|
High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation. Hum Genet 2018; 137:343-355. [PMID: 29705978 DOI: 10.1007/s00439-018-1886-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/06/2018] [Accepted: 04/21/2018] [Indexed: 12/31/2022]
Abstract
While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. Here, we sequenced at full-depth (≥ 30×), across two platforms (Illumina X Ten and Complete Genomics, Inc.), a moderately large (n = 738) cohort of samples drawn from the Ashkenazi Jewish population. We developed a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. Quality control (QC) thresholds for the Illumina X Ten platform were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. QC procedures also identified numerous regions that are poorly mapped using current reference or alternate assemblies. After stringent QC, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels, especially in the range of rare variants that may be most critical to further progress in mapping of complex phenotypes. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.
|
16
|
Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin CS, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res 2017; 27:849-864. [PMID: 28396521 PMCID: PMC5411779 DOI: 10.1101/gr.213611.116] [Citation(s) in RCA: 509] [Impact Index Per Article: 72.7] [Received: 07/29/2016] [Accepted: 03/14/2017] [Indexed: 11/24/2022]
Abstract
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
Affiliation(s)
- Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Tina Graves-Lindsay
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Kerstin Howe
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Nathan Bouk
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Hsiu-Chuan Chen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Paul A Kitts
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| | - Derek Albracht
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Robert S Fulton
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Milinn Kremitzki
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Vincent Magrini
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Chris Markovic
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Sean McGrath
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | | | - Kate Auger
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - William Chow
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Joanna Collins
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Glenn Harden
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Timothy Hubbard
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Sarah Pelan
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Jared T Simpson
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Glen Threadgold
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - James Torrance
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Jonathan M Wood
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | | | - Paul Peluso
- Pacific Biosciences, Menlo Park, California 94025, USA
| | - Heng Li
- Broad Institute, Cambridge, Massachusetts 02142, USA
| | | | - Adam M Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - Richard Durbin
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
| | - Richard K Wilson
- McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Deanna M Church
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
| |
|
17
|
Miga KH. The Promises and Challenges of Genomic Studies of Human Centromeres. PROGRESS IN MOLECULAR AND SUBCELLULAR BIOLOGY 2017; 56:285-304. [PMID: 28840242 DOI: 10.1007/978-3-319-58592-5_12] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
Abstract
Human centromeres are genomic regions that act as sites of kinetochore assembly to ensure proper chromosome segregation during mitosis and meiosis. Although the biological importance of centromeres in genome stability, and ultimately cell viability, is well understood, the complete sequence content and organization of these multi-megabase-sized regions remain unknown. The lack of a high-resolution reference assembly inhibits standard bioinformatics protocols, and as a result, sequence-based studies involving human centromeres lag far behind the advances made for the non-repetitive sequences in the human genome. In this chapter, I introduce what is known about the genomic organization in the highly repetitive regions spanning human centromeres, and discuss the challenges these sequences pose for assembly, alignment, and data interpretation. Overcoming these obstacles is expected to usher in a new era for centromere genomics, which will offer new discoveries in basic cell biology and human biomedical research.
Affiliation(s)
- Karen H Miga
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, CA, USA.
| |
|
18
|
CENP-A and H3 Nucleosomes Display a Similar Stability to Force-Mediated Disassembly. PLoS One 2016; 11:e0165078. [PMID: 27820823 PMCID: PMC5098787 DOI: 10.1371/journal.pone.0165078] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/30/2016] [Accepted: 10/05/2016] [Indexed: 12/12/2022] Open
Abstract
Centromere-specific nucleosomes are a central feature of the kinetochore complex during mitosis, in which microtubules exert pulling and pushing forces upon the centromere. CENP-A nucleosomes have been assumed to be structurally unique, thereby providing resilience under tension relative to their H3 canonical counterparts. Here, we directly test this hypothesis by subjecting CENP-A and H3 octameric nucleosomes, assembled on random or on centromeric DNA sequences, to varying amounts of applied force by using single-molecule magnetic tweezers. We monitor individual disassembly events of CENP-A and H3 nucleosomes. Regardless of the DNA sequence, the force-mediated disassembly experiments for CENP-A and H3 nucleosomes demonstrate similar rupture forces, lifetime residency and disassembly steps. From these experiments, we conclude that CENP-A does not, by itself, contribute unique structural features to the nucleosome that lead to a significant resistance against force-mediated disruption. The data present insights into the mechanistic basis for how CENP-A nucleosomes might contribute to the structural foundation of the centromere in vivo.
|
19
|
Abstract
Genomic studies rely on accurate chromosome assemblies to explore sequence-based models of cell biology, evolution and biomedical disease. However, even the extensively studied human genome has not yet reached a complete, 'telomere-to-telomere', chromosome assembly. The largest assembly gaps remain in centromeric regions and acrocentric short arms, sites known to contain megabase-sized arrays of tandem repeats, or satellite DNAs. This review aims to briefly address the progress and challenges of generating correct assemblies of satellite DNA arrays. Although the focus is placed on the human genome, many concepts presented here are applicable to other genomes.
Affiliation(s)
- Karen H Miga
- Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA, 95064, USA.
| |
|
20
|
Popitsch N, Schuh A, Taylor JC. ReliableGenome: annotation of genomic regions with high/low variant calling concordance. Bioinformatics 2016; 33:155-160. [PMID: 27605105 PMCID: PMC5903559 DOI: 10.1093/bioinformatics/btw587] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/27/2016] [Revised: 08/12/2016] [Accepted: 09/04/2016] [Indexed: 12/30/2022] Open
Abstract
Motivation The increasing adoption of clinical whole-genome resequencing (WGS) demands highly accurate and reproducible variant calling (VC) methods. The observed discordance between state-of-the-art VC pipelines, however, indicates that the current practice still suffers from non-negligible numbers of false positive and negative SNV and INDEL calls that were shown to be enriched among discordant calls but also in genomic regions with low sequence complexity. Results Here, we describe our method ReliableGenome (RG) for partitioning genomes into high and low concordance regions with respect to a set of surveyed VC pipelines. Our method combines call sets derived by multiple pipelines from arbitrary numbers of datasets and interpolates expected concordance for genomic regions without data. By applying RG to 219 deep human WGS datasets, we demonstrate that VC concordance depends predominantly on genomic context rather than the actual sequencing data, which manifests in high recurrence of regions that can/cannot be reliably genotyped by a single method. This enables the application of pre-computed regions to other data created with comparable sequencing technology and software. RG outperforms comparable efforts in predicting VC concordance and false positive calls in low-concordance regions, which underlines its usefulness for variant filtering, annotation and prioritization. RG allows focusing resource-intensive algorithms (e.g. consensus calling methods) on the smaller, discordant share of the genome (20–30%), which might result in increased overall accuracy at reasonable costs. Our method and analysis of discordant calls may further be useful for development, benchmarking and optimization of VC algorithms and for the relative comparison of call sets between different studies/pipelines. Availability and Implementation RG was implemented in Java; source code and binaries are freely available for non-commercial use at https://github.com/popitsch/wtchg-rg/.
Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Niko Popitsch
- Wellcome Trust Centre of Human Genetics, University of Oxford, Oxford OX3 7BN, UK; National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK
| | | | - Anna Schuh
- National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK; Department of Oncology, University of Oxford, Oxford OX3 7DQ, UK
| | - Jenny C Taylor
- Wellcome Trust Centre of Human Genetics, University of Oxford, Oxford OX3 7BN, UK; National Institute for Health Research (NIHR) Oxford Biomedical Research Centre, The Churchill Hospital, Old Road OX3 7LE, UK
| |
|
21
|
Aldrup-MacDonald ME, Kuo ME, Sullivan LL, Chew K, Sullivan BA. Genomic variation within alpha satellite DNA influences centromere location on human chromosomes with metastable epialleles. Genome Res 2016; 26:1301-1311. [PMID: 27510565 PMCID: PMC5052062 DOI: 10.1101/gr.206706.116] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 03/08/2016] [Accepted: 08/08/2016] [Indexed: 01/27/2023]
Abstract
Alpha satellite is a tandemly organized type of repetitive DNA that comprises 5% of the genome and is found at all human centromeres. A defined number of 171-bp monomers are organized into chromosome-specific higher-order repeats (HORs) that are reiterated thousands of times. At least half of all human chromosomes have two or more distinct HOR alpha satellite arrays within their centromere regions. We previously showed that the two alpha satellite arrays of Homo sapiens Chromosome 17 (HSA17), D17Z1 and D17Z1-B, behave as centromeric epialleles, that is, the centromere, defined by chromatin containing the centromeric histone variant CENPA and recruitment of other centromere proteins, can form at either D17Z1 or D17Z1-B. Some individuals in the human population are functional heterozygotes in that D17Z1 is the active centromere on one homolog and D17Z1-B is active on the other. In this study, we aimed to understand the molecular basis for how centromere location is determined on HSA17. Specifically, we focused on D17Z1 genomic variation as a driver of epiallele formation. We found that D17Z1 arrays that are predominantly composed of HOR size and sequence variants were functionally less competent. They either recruited decreased amounts of the centromere-specific histone variant CENPA and the HSA17 was mitotically unstable, or alternatively, the centromere was assembled at D17Z1-B and the HSA17 was stable. Our study demonstrates that genomic variation within highly repetitive, noncoding DNA of human centromere regions has a pronounced impact on genome stability and basic chromosomal function.
Affiliation(s)
- Megan E Aldrup-MacDonald
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Molly E Kuo
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Lori L Sullivan
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Kimberline Chew
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Beth A Sullivan
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA; Division of Human Genetics, Duke University Medical Center, Durham, North Carolina 27710, USA
|
22
|
Faber-Hammond JJ, Brown KH. Anchored pseudo-de novo assembly of human genomes identifies extensive sequence variation from unmapped sequence reads. Hum Genet 2016; 135:727-40. [PMID: 27061184 PMCID: PMC4899208 DOI: 10.1007/s00439-016-1667-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/19/2016] [Accepted: 03/29/2016] [Indexed: 01/08/2023]
Abstract
Completion of the human genome reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads failing to map within the HGR. Sequences failing to map generally represent 2-5% of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual and then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. Of these, 30,879 contigs are represented in multiple individuals, with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly our data highlight that with this method low coverage (~10-20×) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease and functionality for personal genomic medicine.
Affiliation(s)
- Joshua J Faber-Hammond
- Department of Biology, Portland State University, 1719 SW 10th Ave., SRTC 246, Portland, 97207-0751, USA
- Kim H Brown
- Department of Biology, Portland State University, 1719 SW 10th Ave., SRTC 246, Portland, 97207-0751, USA
|
23
|
Genomic leftovers: identifying novel microsatellites, over-represented motifs and functional elements in the human genome. Sci Rep 2016; 6:27722. [PMID: 27278669 PMCID: PMC4899811 DOI: 10.1038/srep27722] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/15/2015] [Accepted: 05/23/2016] [Indexed: 01/29/2023] Open
Abstract
The human genome is 99% complete. This study contributes to filling the 1% gap by enriching previously unknown repeat regions called microsatellites (MST). We devised a Global MST Enrichment (GME) kit to enrich and next-generation sequence 2 colorectal cell lines and 16 normal human samples to illustrate its utility in identifying contigs from reads that do not map to the genome reference. The analysis of these samples yielded 790 novel extra-referential concordant contigs that are observed in more than one sample. We searched for evidence of functional elements in the concordant contigs in two ways: (1) BLAST-ing each contig against normal RNA-Seq samples, and (2) checking for predicted functional elements using GlimmerHMM. Of the 790 concordant contigs, 37 had an exact match to at least one RNA-Seq read; 15 aligned to more than 100 RNA-Seq reads. Of the 249 concordant contigs predicted by GlimmerHMM to have functional elements, 6 had at least one exact RNA-Seq match. BLAST-ing these novel contigs against all publicly available sequences confirmed that they were found in human and chimpanzee BAC and fosmid clones sequenced as part of the original human genome project. These extra-referential contigs predominantly contained pentameric repeats, especially two motifs: AATGG and GTGGA.
|
24
|
Abstract
Over the last decade, the long-accepted dogma that heterochromatin is silent has been challenged by increasing evidence of active transcription in these apocryphally annotated quiescent regions of the genome. The recent discovery of noncoding RNAs (ncRNAs) originating from, or localizing to, centromeres, pericentromeres, and telomeres (i.e., constitutive heterochromatin) suggests a potential role for ncRNAs in genome integrity. This new paradigm suggests that ncRNAs may recruit chromatin-binding factors, stabilize the higher-order folded state of the chromatin fiber, and participate in regulation of processes such as transcription-mediated nucleosome assembly. Thus, identifying, purifying, and elucidating the function of ncRNAs has the potential to provide key insights into genome organization and is currently a topic of intense experimental investigation.
Affiliation(s)
- D Quénet
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- D Sturgill
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
- Y Dalal
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, United States
|