1
Freestone J, Noble WS, Keich U. Analysis of Tandem Mass Spectrometry Data with CONGA: Combining Open and Narrow Searches with Group-Wise Analysis. J Proteome Res 2024. [PMID: 38652578 DOI: 10.1021/acs.jproteome.3c00399] [Indexed: 04/25/2024]
Abstract
Searching tandem mass spectrometry proteomics data against a database is a well-established method for assigning peptide sequences to observed spectra but typically cannot identify peptides harboring unexpected post-translational modifications (PTMs). Open modification searching aims to address this problem by allowing a spectrum to match a peptide even if the spectrum's precursor mass differs from the peptide mass. However, expanding the search space in this way can lead to a loss of statistical power to detect peptides. We therefore developed a method, called CONGA (combining open and narrow searches with group-wise analysis), that takes into account results from both types of searches (a traditional "narrow window" search and an open modification search) while carrying out rigorous false discovery rate control. The result is an algorithm that provides the best of both worlds: the ability to detect unexpected PTMs without a concomitant loss of power to detect unmodified peptides.
Affiliation(s)
- Jack Freestone
- School of Mathematics and Statistics F07, University of Sydney, NSW 2006, Australia
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
- School of Mathematics and Statistics F07, University of Sydney, NSW 2006, Australia
2
Wu CC, Tsantilas KA, Park J, Plubell D, Sanders JA, Naicker P, Govender I, Buthelezi S, Stoychev S, Jordaan J, Merrihew G, Huang E, Parker ED, Riffle M, Hoofnagle AN, Noble WS, Poston KL, Montine TJ, MacCoss MJ. Mag-Net: Rapid enrichment of membrane-bound particles enables high coverage quantitative analysis of the plasma proteome. bioRxiv 2024:2023.06.10.544439. [PMID: 38617345 PMCID: PMC11014469 DOI: 10.1101/2023.06.10.544439] [Indexed: 04/16/2024]
Abstract
Membrane-bound particles in plasma are composed of exosomes, microvesicles, and apoptotic bodies and represent ~1-2% of the total protein composition. Proteomic interrogation of this subset of plasma proteins augments the representation of tissue-specific proteins, representing a "liquid biopsy," while enabling the detection of proteins that would otherwise be beyond the dynamic range of liquid chromatography-tandem mass spectrometry of unfractionated plasma. We have developed an enrichment strategy (Mag-Net) using hyper-porous strong-anion exchange magnetic microparticles to sieve membrane-bound particles from plasma. The Mag-Net method is robust, reproducible, inexpensive, and requires <100 μL plasma input. Coupled to a quantitative data-independent mass spectrometry analytical strategy, we demonstrate that we can collect results for >37,000 peptides from >4,000 plasma proteins with high precision. Using this analytical pipeline on a small cohort of patients with neurodegenerative disease and healthy age-matched controls, we discovered 204 proteins that differentiate (q-value < 0.05) patients with Alzheimer's disease dementia (ADD) from those without ADD. Our method also discovered 310 proteins that were different between Parkinson's disease and those with either ADD or healthy cognitively normal individuals. Using machine learning we were able to distinguish between ADD and not ADD with a mean ROC AUC = 0.98 ± 0.06.
Affiliation(s)
- Christine C. Wu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jea Park
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Deanna Plubell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Justin A. Sanders
- Department of Computer Science, University of Washington, Seattle, WA, USA
- Gennifer Merrihew
- Department of Computer Science, University of Washington, Seattle, WA, USA
- Eric Huang
- Department of Computer Science, University of Washington, Seattle, WA, USA
- Edward D. Parker
- Vision Core Lab, Department of Ophthalmology, University of Washington, Seattle, WA, USA
- Michael Riffle
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Andrew N. Hoofnagle
- Department of Lab Medicine and Pathology, University of Washington, Seattle, WA, USA
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Computer Science, University of Washington, Seattle, WA, USA
- Kathleen L. Poston
- Department of Neurology & Neurological Sciences, Stanford University, Palo Alto CA, USA
- Michael J. MacCoss
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
3
Aguilar R, Camplisson CK, Lin Q, Miga KH, Noble WS, Beliveau BJ. Tigerfish designs oligonucleotide-based in situ hybridization probes targeting intervals of highly repetitive DNA at the scale of genomes. Nat Commun 2024; 15:1027. [PMID: 38310092 PMCID: PMC10838309 DOI: 10.1038/s41467-024-45385-x] [Received: 03/16/2023] [Accepted: 01/22/2024] [Indexed: 02/05/2024]
Abstract
Fluorescent in situ hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate > 100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges. Consequently, many open questions remain about the organization and function of highly repetitive sequences. Here, we introduce Tigerfish, a software tool for the genome-scale design of oligo probes against repetitive DNA intervals. We showcase Tigerfish by designing a panel of 24 interval-specific repeat probes specific to each of the 24 human chromosomes and imaging this panel on metaphase spreads and in interphase nuclei. Tigerfish extends the powerful toolkit of oligo-based FISH to highly repetitive DNA.
Affiliation(s)
- Robin Aguilar
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Conor K Camplisson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Qiaoyi Lin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
- Brian J Beliveau
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA.
4
Harris L, Fondrie WE, Oh S, Noble WS. Evaluating Proteomics Imputation Methods with Improved Criteria. J Proteome Res 2023; 22:3427-3438. [PMID: 37861703 PMCID: PMC10949645 DOI: 10.1021/acs.jproteome.3c00205] [Indexed: 10/21/2023]
Abstract
Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. Missing values hinder reproducibility, reduce statistical power, and make it difficult to compare across samples or experiments. Although many methods exist for imputing missing values, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measurements of error such as the mean-squared error between imputed and held-out values. Here we evaluate the performance of commonly used imputation methods using three practical, "downstream-centric" criteria. These criteria measure the ability to identify differentially expressed peptides, generate new quantitative peptides, and improve the peptide lower limit of quantification. Our evaluation comprises several experiment types and acquisition strategies, including data-dependent and data-independent acquisition. We find that imputation does not necessarily improve the ability to identify differentially expressed peptides but that it can identify new quantitative peptides and improve the peptide lower limit of quantification. We find that MissForest is generally the best performing method per our downstream-centric criteria. We also argue that existing imputation methods do not properly account for the variance of peptide quantifications and highlight the need for methods that do.
Affiliation(s)
- Lincoln Harris
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Sewoong Oh
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
5
Dekker J, Alber F, Aufmkolk S, Beliveau BJ, Bruneau BG, Belmont AS, Bintu L, Boettiger A, Calandrelli R, Disteche CM, Gilbert DM, Gregor T, Hansen AS, Huang B, Huangfu D, Kalhor R, Leslie CS, Li W, Li Y, Ma J, Noble WS, Park PJ, Phillips-Cremins JE, Pollard KS, Rafelski SM, Ren B, Ruan Y, Shav-Tal Y, Shen Y, Shendure J, Shu X, Strambio-De-Castillia C, Vertii A, Zhang H, Zhong S. Spatial and temporal organization of the genome: Current state and future aims of the 4D nucleome project. Mol Cell 2023; 83:2624-2640. [PMID: 37419111 PMCID: PMC10528254 DOI: 10.1016/j.molcel.2023.06.018] [Received: 04/07/2023] [Revised: 06/10/2023] [Accepted: 06/12/2023] [Indexed: 07/09/2023]
Abstract
The four-dimensional nucleome (4DN) consortium studies the architecture of the genome and the nucleus in space and time. We summarize progress by the consortium and highlight the development of technologies for (1) mapping genome folding and identifying roles of nuclear components and bodies, proteins, and RNA, (2) characterizing nuclear organization with time or single-cell resolution, and (3) imaging of nuclear organization. With these tools, the consortium has provided over 2,000 public datasets. Integrative computational models based on these data are starting to reveal connections between genome structure and function. We then present a forward-looking perspective and outline current aims to (1) delineate dynamics of nuclear architecture at different timescales, from minutes to weeks as cells differentiate, in populations and in single cells, (2) characterize cis-determinants and trans-modulators of genome organization, (3) test functional consequences of changes in cis- and trans-regulators, and (4) develop predictive models of genome structure and function.
Affiliation(s)
- Job Dekker
- University of Massachusetts Chan Medical School, Boston, MA, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA.
- Frank Alber
- University of California, Los Angeles, Los Angeles, CA, USA
- Benoit G Bruneau
- Gladstone Institutes, San Francisco, CA, USA; University of California, San Francisco, San Francisco, CA, USA
- Bo Huang
- University of California, San Francisco, San Francisco, CA, USA
- Danwei Huangfu
- Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Reza Kalhor
- Johns Hopkins University, Baltimore, MD, USA
- Wenbo Li
- University of Texas Health Science Center at Houston, Houston, TX, USA
- Yun Li
- University of North Carolina, Gillings School of Global Public Health, Chapel Hill, NC, USA
- Jian Ma
- Carnegie Mellon University, Pittsburgh, PA, USA
- Katherine S Pollard
- Gladstone Institutes, San Francisco, CA, USA; University of California, San Francisco, San Francisco, CA, USA; Chan Zuckerberg Biohub, San Francisco, San Francisco, CA, USA
- Bing Ren
- University of California, San Diego, La Jolla, CA, USA
- Yijun Ruan
- Zhejiang University, Hangzhou, Zhejiang, China
- Yin Shen
- University of California, San Francisco, San Francisco, CA, USA
- Xiaokun Shu
- University of California, San Francisco, San Francisco, CA, USA
- Sheng Zhong
- University of California, San Diego, La Jolla, CA, USA.
6
Yang R, Das A, Gao VR, Karbalayghareh A, Noble WS, Bilmes JA, Leslie CS. Epiphany: predicting Hi-C contact maps from 1D epigenomic signals. Genome Biol 2023; 24:134. [PMID: 37280678 PMCID: PMC10242996 DOI: 10.1186/s13059-023-02934-9] [Received: 12/20/2021] [Accepted: 04/06/2023] [Indexed: 06/08/2023]
Abstract
Recent deep learning models that predict the Hi-C contact map from DNA sequence achieve promising accuracy but cannot generalize to new cell types or even capture differences among training cell types. We propose Epiphany, a neural network to predict cell-type-specific Hi-C contact maps from widely available epigenomic tracks. Epiphany uses bidirectional long short-term memory layers to capture long-range dependencies and, optionally, a generative adversarial network architecture to encourage contact map realism. Epiphany shows excellent generalization to held-out chromosomes within and across cell types, yields accurate TAD and interaction calls, and predicts structural changes caused by perturbations of epigenomic signals.
Affiliation(s)
- Rui Yang
- Memorial Sloan Kettering Cancer Center, New York, USA
- Arnav Das
- University of Washington, Seattle, USA
- Vianne R. Gao
- Memorial Sloan Kettering Cancer Center, New York, USA
7
Ebadi A, Freestone J, Noble WS, Keich U. Bridging the False Discovery Gap. J Proteome Res 2023. [PMID: 37261867 DOI: 10.1021/acs.jproteome.3c00176] [Indexed: 06/02/2023]
Abstract
Controlling the false discovery rate (FDR) among discoveries from a tandem mass spectrometry proteomics experiment using target decoy competition (TDC) controls only the proportion of false discoveries in an average sense. Thus, for any particular analysis, even with a valid FDR control procedure, the proportion of false discoveries (the FDP) may be higher than the specified FDR threshold. We demonstrate this phenomenon using real data and describe two recently developed methods that help bridge the gap between controlling the expected or average rate of false discoveries and the empirical rate (FDP). The FDP Stepdown method controls the FDP at any desired confidence level, and the TDC Uniform Band provides a confidence, or upper prediction bound, on the FDP in TDC's list of discoveries.
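The average-sense guarantee described in this abstract can be illustrated with a minimal Python sketch of target decoy competition. It is an illustration under stated assumptions, not the authors' implementation: the function name `tdc_discoveries`, the +1/-1 label encoding, and the (decoys + 1) / targets estimate are choices made here for the example, and score ties are ignored.

```python
def tdc_discoveries(scores, labels, alpha):
    """Count target discoveries reported by a basic TDC procedure.

    scores: winning score of each target/decoy competition (higher is better).
    labels: +1 if the target won the competition, -1 if the decoy won.
    alpha:  desired FDR threshold.

    Assumption for this sketch: the FDR among the top-k wins is estimated
    as (decoy wins + 1) / (target wins), and the largest k whose estimate
    is at most alpha is reported.
    """
    # Walk through the competitions from best score to worst.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets = 0
    decoys = 0
    best = 0
    for i in order:
        if labels[i] == 1:
            targets += 1
        else:
            decoys += 1
        # Remember the deepest cutoff at which the estimate is acceptable.
        if targets > 0 and (decoys + 1) / targets <= alpha:
            best = targets
    return best


# Hypothetical toy input: 25 target wins, then one decoy win, then 4 target wins.
scores = list(range(30, 0, -1))
labels = [1] * 25 + [-1] + [1] * 4
n_discoveries = tdc_discoveries(scores, labels, alpha=0.05)
```

The estimate bounds the expected proportion of false discoveries over many repetitions of the experiment. The actual proportion among the reported discoveries in a single run (the FDP) is unknown and can exceed `alpha`, which is the gap the abstract refers to.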
Affiliation(s)
- Arya Ebadi
- School of Mathematics and Statistics F07, University of Sydney, Sydney, New South Wales 2006, Australia
- Jack Freestone
- School of Mathematics and Statistics F07, University of Sydney, Sydney, New South Wales 2006, Australia
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
- Uri Keich
- School of Mathematics and Statistics F07, University of Sydney, Sydney, New South Wales 2006, Australia
8
Fang H, Tronco AR, Bonora G, Nguyen T, Thakur J, Berletch JB, Filippova GN, Henikoff S, Shendure J, Noble WS, Disteche CM, Deng X. CTCF-mediated insulation and chromatin environment modulate Car5b escape from X inactivation. bioRxiv 2023:2023.05.04.539469. [PMID: 37205597 PMCID: PMC10187265 DOI: 10.1101/2023.05.04.539469] [Indexed: 05/21/2023]
Abstract
Background: The number and escape levels of genes that escape X chromosome inactivation (XCI) in female somatic cells vary among tissues and cell types, potentially contributing to specific sex differences. Here we investigate the role of CTCF, a master chromatin conformation regulator, in regulating escape from XCI. CTCF binding profiles and epigenetic features were systematically examined at constitutive and facultative escape genes using mouse allelic systems to distinguish the inactive X (Xi) and active X (Xa) chromosomes. Results: We found that escape genes are located inside domains flanked by convergent arrays of CTCF binding sites, consistent with the formation of loops. In addition, strong and divergent CTCF binding sites, often located at the boundaries between escape genes and adjacent neighbors subject to XCI, would help insulate domains. Facultative escapees show clear differences in CTCF binding dependent on their XCI status in specific cell types/tissues. Concordantly, deletion but not inversion of a CTCF binding site at the boundary between the facultative escape gene Car5b and its silent neighbor Siah1b resulted in loss of Car5b escape. Reduced CTCF binding and enrichment of a repressive mark over Car5b in cells with a boundary deletion indicated loss of looping and insulation. In mutant lines in which either the Xi-specific compact structure or its H3K27me3 enrichment was disrupted, escape genes showed an increase in gene expression and associated active marks, supporting the roles of the 3D Xi structure and heterochromatic marks in constraining levels of escape. Conclusion: Our findings indicate that escape from XCI is modulated both by looping and insulation of chromatin via convergent arrays of CTCF binding sites and by compaction and epigenetic features of the surrounding heterochromatin.
Affiliation(s)
- He Fang
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195
- Ana R Tronco
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195
- Giancarlo Bonora
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195
- Truong Nguyen
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195
- Jitendra Thakur
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109
- Joel B Berletch
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195
- Galina N Filippova
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195
- Steven Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, 98195
- Christine M Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195
- Department of Medicine, University of Washington, Seattle, WA, 98195
- Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195
9
Rozowsky J, Gao J, Borsari B, Yang YT, Galeev T, Gürsoy G, Epstein CB, Xiong K, Xu J, Li T, Liu J, Yu K, Berthel A, Chen Z, Navarro F, Sun MS, Wright J, Chang J, Cameron CJF, Shoresh N, Gaskell E, Drenkow J, Adrian J, Aganezov S, Aguet F, Balderrama-Gutierrez G, Banskota S, Corona GB, Chee S, Chhetri SB, Cortez Martins GC, Danyko C, Davis CA, Farid D, Farrell NP, Gabdank I, Gofin Y, Gorkin DU, Gu M, Hecht V, Hitz BC, Issner R, Jiang Y, Kirsche M, Kong X, Lam BR, Li S, Li B, Li X, Lin KZ, Luo R, Mackiewicz M, Meng R, Moore JE, Mudge J, Nelson N, Nusbaum C, Popov I, Pratt HE, Qiu Y, Ramakrishnan S, Raymond J, Salichos L, Scavelli A, Schreiber JM, Sedlazeck FJ, See LH, Sherman RM, Shi X, Shi M, Sloan CA, Strattan JS, Tan Z, Tanaka FY, Vlasova A, Wang J, Werner J, Williams B, Xu M, Yan C, Yu L, Zaleski C, Zhang J, Ardlie K, Cherry JM, Mendenhall EM, Noble WS, Weng Z, Levine ME, Dobin A, Wold B, Mortazavi A, Ren B, Gillis J, Myers RM, Snyder MP, Choudhary J, Milosavljevic A, Schatz MC, Bernstein BE, Guigó R, Gingeras TR, Gerstein M. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell 2023; 186:1493-1511.e40. [PMID: 37001506 PMCID: PMC10074325 DOI: 10.1016/j.cell.2023.02.018] [Received: 04/03/2022] [Revised: 10/16/2022] [Accepted: 02/10/2023] [Indexed: 04/03/2023]
Abstract
Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.
Affiliation(s)
- Joel Rozowsky
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Beatrice Borsari
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Yucheng T Yang
- Institute of Science and Technology for Brain-Inspired Intelligence; MOE Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence; MOE Frontiers Center for Brain Science, Fudan University, Shanghai 200433, China; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Timur Galeev
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Gamze Gürsoy
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Kun Xiong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Jinrui Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Tianxiao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Jason Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Keyang Yu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Ana Berthel
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Zhanlin Chen
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA
- Fabio Navarro
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Maxwell S Sun
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Justin Chang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Christopher J F Cameron
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Noam Shoresh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jorg Drenkow
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Jessika Adrian
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Sergey Aganezov
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
- Sora Chee
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
- Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Gabriel Conte Cortez Martins
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Cassidy Danyko
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Carrie A Davis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Daniel Farid
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Idan Gabdank
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Yoel Gofin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- David U Gorkin
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
- Mengting Gu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Vivian Hecht
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Benjamin C Hitz
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Robbyn Issner
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Yunzhe Jiang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Melanie Kirsche
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
- Xiangmeng Kong
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Bonita R Lam
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Bian Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Xiqi Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Khine Zin Lin
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
- Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, CHN
- Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Ran Meng
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
- Jonathan Mudge
- European Bioinformatics Institute, Cambridge, Cambridgeshire, GB
| | - Chad Nusbaum
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ioann Popov
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Henry E Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Yunjiang Qiu
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Srividya Ramakrishnan
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Joe Raymond
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Leonidas Salichos
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Biological and Chemical Sciences, New York Institute of Technology, Old Westbury, NY, USA
| | - Alexandra Scavelli
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jacob M Schreiber
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Fritz J Sedlazeck
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Lei Hoon See
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Rachel M Sherman
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Xu Shi
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Minyi Shi
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Cricket Alicia Sloan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - J Seth Strattan
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Zhen Tan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Forrest Y Tanaka
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Anna Vlasova
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Comparative Genomics Group, Life Science Programme, Barcelona Supercomputing Centre, Barcelona, Spain; Institute of Research in Biomedicine, Barcelona, Spain
| | - Jun Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Jonathan Werner
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Brian Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Min Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Chengfei Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - Lu Yu
- Institute of Cancer Research, London, UK
| | - Christopher Zaleski
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Jing Zhang
- Department of Computer Science, University of California, Irvine, Irvine, CA, USA
| | - J Michael Cherry
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Morgan E Levine
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
| | - Alexander Dobin
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Jesse Gillis
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Department of Physiology, University of Toronto, Toronto, ON, Canada
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Michael P Snyder
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Michael C Schatz
- Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA; Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Bradley E Bernstein
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA.
| | - Roderic Guigó
- Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain; Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
| | - Thomas R Gingeras
- Functional Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Mark Gerstein
- Section on Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Statistics and Data Science, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA.
|
10
|
Aguilar R, Camplisson CK, Lin Q, Miga KH, Noble WS, Beliveau BJ. Tigerfish designs oligonucleotide-based in situ hybridization probes targeting intervals of highly repetitive DNA at the scale of genomes. bioRxiv 2023:2023.03.06.530899. [PMID: 36945528 PMCID: PMC10028787 DOI: 10.1101/2023.03.06.530899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/30/2023]
Abstract
Fluorescent in situ hybridization (FISH) is a powerful method for the targeted visualization of nucleic acids in their native contexts. Recent technological advances have leveraged computationally designed oligonucleotide (oligo) probes to interrogate >100 distinct targets in the same sample, pushing the boundaries of FISH-based assays. However, even in the most highly multiplexed experiments, repetitive DNA regions are typically not included as targets, as the computational design of specific probes against such regions presents significant technical challenges. Consequently, many open questions remain about the organization and function of highly repetitive sequences. Here, we introduce Tigerfish, a software tool for the genome-scale design of oligo probes against repetitive DNA intervals. We showcase Tigerfish by designing a panel of 24 interval-specific repeat probes specific to each of the 24 human chromosomes and imaging this panel on metaphase spreads and in interphase nuclei. Tigerfish extends the powerful toolkit of oligo-based FISH to highly repetitive DNA.
Affiliation(s)
- Robin Aguilar
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Qiaoyi Lin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Karen H. Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, CA, USA
| | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Brian J. Beliveau
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
|
11
|
Alavattam KG, Mitzelfelt KA, Bonora G, Fields PA, Yang X, Chiu HS, Pabon L, Bertero A, Palpant NJ, Noble WS, Murry CE. Dynamic chromatin organization and regulatory interactions in human endothelial cell differentiation. Stem Cell Reports 2023; 18:159-174. [PMID: 36493778 PMCID: PMC9860068 DOI: 10.1016/j.stemcr.2022.11.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/27/2022] [Revised: 11/07/2022] [Accepted: 11/07/2022] [Indexed: 12/10/2022] Open
Abstract
Vascular endothelial cells are a mesoderm-derived lineage with many essential functions, including angiogenesis and coagulation. The gene-regulatory mechanisms underpinning endothelial specialization are largely unknown, as are the roles of chromatin organization in regulating endothelial cell transcription. To investigate the relationships between chromatin organization and gene expression, we induced endothelial cell differentiation from human pluripotent stem cells and performed Hi-C and RNA-sequencing assays at specific time points. Long-range intrachromosomal contacts increase over the course of differentiation, accompanied by widespread heteroeuchromatic compartment transitions that are tightly associated with transcription. Dynamic topologically associating domain boundaries strengthen and converge on an endothelial cell state, and function to regulate gene expression. Chromatin pairwise point interactions (DNA loops) increase in frequency during differentiation and are linked to the expression of genes essential to vascular biology. Chromatin dynamics guide transcription in endothelial cell development and promote the divergence of endothelial cells from cardiomyocytes.
Affiliation(s)
- Kris G Alavattam
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA 98109, USA; Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Avenue NE, Seattle, WA 98195, USA
| | - Katie A Mitzelfelt
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA 98109, USA
| | - Giancarlo Bonora
- Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Avenue NE, Seattle, WA 98195, USA
| | - Paul A Fields
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA 98109, USA
| | - Xiulan Yang
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA 98109, USA
| | - Han Sheng Chiu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia; Centre for Cardiac and Vascular Biology, The University of Queensland, Brisbane, QLD 4072, Australia
| | - Lil Pabon
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA 98109, USA; Sana Biotechnology, Seattle, WA 98102, USA
| | - Alessandro Bertero
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA 98109, USA
| | - Nathan J Palpant
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD 4072, Australia; Centre for Cardiac and Vascular Biology, The University of Queensland, Brisbane, QLD 4072, Australia; School of Biomedical Sciences, The University of Queensland, Brisbane, QLD 4072, Australia
| | - William S Noble
- Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Avenue NE, Seattle, WA 98195, USA; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA.
| | - Charles E Murry
- Department of Laboratory Medicine and Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, WA 98109, USA; Sana Biotechnology, Seattle, WA 98102, USA; Department of Medicine/Cardiology, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA; Department of Bioengineering, University of Washington, 1959 NE Pacific Street, Seattle, WA 98195, USA.
|
12
|
Abstract
MOTIVATION: We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. RESULTS: We first confirm the presence of overdispersion in several real Hi-C datasets, and we show that the overdispersion arises even in simulated datasets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms, both MDS-based and statistical methods. We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions. AVAILABILITY AND IMPLEMENTATION: A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
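The mean-variance argument in this abstract can be illustrated with a minimal sketch. It assumes the common mean/dispersion parametrization of the negative binomial, in which the variance equals mean + mean²/dispersion, and a hypothetical power-law link between 3D distance and expected contact count. The function names, the exponent `alpha`, and the scale `beta` are illustrative assumptions and are not taken from the Pastis-NB code.

```python
import math


def expected_count(distance: float, alpha: float = -3.0, beta: float = 1.0) -> float:
    """Hypothetical decreasing link: expected contact count as a power law of distance."""
    return beta * distance ** alpha


def poisson_variance(mean: float) -> float:
    """Under a Poisson model the variance is forced to equal the mean."""
    return mean


def negbin_variance(mean: float, dispersion: float) -> float:
    """Negative binomial variance in the mean/dispersion parametrization (assumed here)."""
    return mean + mean * mean / dispersion


def poisson_logpmf(count: int, mean: float) -> float:
    """Log-probability of a contact count under the Poisson model."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def negbin_logpmf(count: int, mean: float, dispersion: float) -> float:
    """Log-probability of a contact count under the negative binomial model."""
    r = dispersion
    return (
        math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
        + r * math.log(r / (r + mean))
        + count * math.log(mean / (r + mean))
    )


if __name__ == "__main__":
    mu = 10.0
    print("Poisson variance:", poisson_variance(mu))
    # A smaller dispersion value means a larger variance at the same mean.
    for r in (1.0, 5.0, 100.0):
        print("NB variance, dispersion", r, ":", negbin_variance(mu, r))
```

In this parametrization a very large dispersion value makes the negative binomial approach the Poisson model, while small values let the variance exceed the mean, which is the overdispersion the abstract describes.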
Affiliation(s)
- Nelle Varoquaux
- TIMC, Université Grenoble Alpes, CNRS, Grenoble INP, Grenoble 38000, France
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
| | - Jean-Philippe Vert
- Brain Team, Google Research, Paris 75009, France
- Centre for Computational Biology, MINES ParisTech, PSL University, Paris 75006, France
|
13
|
Heil LR, Fondrie WE, McGann CD, Federation AJ, Noble WS, MacCoss MJ, Keich U. Building Spectral Libraries from Narrow-Window Data-Independent Acquisition Mass Spectrometry Data. J Proteome Res 2022; 21:1382-1391. [PMID: 35549345 DOI: 10.1021/acs.jproteome.1c00895] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Advances in library-based methods for peptide detection from data-independent acquisition (DIA) mass spectrometry have made it possible to detect and quantify tens of thousands of peptides in a single mass spectrometry run. However, many of these methods rely on a comprehensive, high-quality spectral library containing information about the expected retention time and fragmentation patterns of peptides in the sample. Empirical spectral libraries are often generated through data-dependent acquisition and may suffer from biases as a result. Spectral libraries can be generated in silico, but these models are not trained to handle all possible post-translational modifications. Here, we propose a false discovery rate-controlled spectrum-centric search workflow to generate spectral libraries directly from gas-phase fractionated DIA tandem mass spectrometry data. We demonstrate that this strategy is able to detect phosphorylated peptides and can be used to generate a spectral library for accurate peptide detection and quantitation in wide-window DIA data. We compare the results of this search workflow to other library-free approaches and demonstrate that our search is competitive in terms of accuracy and sensitivity. These results demonstrate that the proposed workflow has the capacity to generate spectral libraries while avoiding the limitations of other methods.
Affiliation(s)
- Lilian R Heil
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
| | - William E Fondrie
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
| | - Christopher D McGann
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
| | - Alexander J Federation
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States; Paul G. Allen School for Computer Science and Engineering, University of Washington, Seattle, Washington 98105, United States
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, Washington 98105, United States
| | - Uri Keich
- School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia
|
14
|
Phipps WS, Smith KD, Yang HY, Henderson CM, Pflaum H, Lerch ML, Fondrie WE, Emrick MA, Wu CC, MacCoss MJ, Noble WS, Hoofnagle AN. Tandem Mass Spectrometry-Based Amyloid Typing Using Manual Microdissection and Open-Source Data Processing. Am J Clin Pathol 2022; 157:748-757. [PMID: 35512256 PMCID: PMC9071319 DOI: 10.1093/ajcp/aqab185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 09/20/2021] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVES: Standard implementations of amyloid typing by liquid chromatography-tandem mass spectrometry use capabilities unavailable to most clinical laboratories. To improve accessibility of this testing, we explored easier approaches to tissue sampling and data processing. METHODS: We validated a typing method using manual sampling in place of laser microdissection, pairing the technique with a semiquantitative measure of sampling adequacy. In addition, we created an open-source data processing workflow (Crux Pipeline) for clinical users. RESULTS: Cases of amyloidosis spanning the major types were distinguishable with 100% specificity using measurements of individual amyloidogenic proteins or in combination with the ratio of λ and κ constant regions. Crux Pipeline allowed for rapid, batched data processing, integrating the steps of peptide identification, statistical confidence estimation, and label-free protein quantification. CONCLUSIONS: Accurate mass spectrometry-based amyloid typing is possible without laser microdissection. To facilitate entry into solid tissue proteomics, newcomers can leverage manual sampling approaches in combination with Crux Pipeline and related tools.
Affiliation(s)
- William S Phipps
- Department of Laboratory Medicine and Pathology, Seattle, WA, USA
| | - Kelly D Smith
- Department of Laboratory Medicine and Pathology, Seattle, WA, USA
- Department of Medicine, Seattle, WA, USA
| | - Han-Yin Yang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Clark M Henderson
- Department of Laboratory Medicine and Pathology, Seattle, WA, USA
- Seagen, Bothell, WA, USA
| | - Hannah Pflaum
- Department of Laboratory Medicine and Pathology, Seattle, WA, USA
- Seattle Children’s Hospital, Seattle, WA, USA
| | - Melissa L Lerch
- Department of Laboratory Medicine and Pathology, Seattle, WA, USA
| | - William E Fondrie
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Christine C Wu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Andrew N Hoofnagle
- Department of Laboratory Medicine and Pathology, Seattle, WA, USA
- Department of Medicine, Seattle, WA, USA
|
15
|
Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, Kaul R, Halow J, Van Nostrand EL, Freese P, Gorkin DU, Shen Y, He Y, Mackiewicz M, Pauli-Behn F, Williams BA, Mortazavi A, Keller CA, Zhang XO, Elhajjajy SI, Huey J, Dickel DE, Snetkova V, Wei X, Wang X, Rivera-Mulia JC, Rozowsky J, Zhang J, Chhetri SB, Zhang J, Victorsen A, White KP, Visel A, Yeo GW, Burge CB, Lécuyer E, Gilbert DM, Dekker J, Rinn J, Mendenhall EM, Ecker JR, Kellis M, Klein RJ, Noble WS, Kundaje A, Guigó R, Farnham PJ, Cherry JM, Myers RM, Ren B, Graveley BR, Gerstein MB, Pennacchio LA, Snyder MP, Bernstein BE, Wold B, Hardison RC, Gingeras TR, Stamatoyannopoulos JA, Weng Z. Author Correction: Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 2022; 605:E3. [PMID: 35474001 PMCID: PMC9095460 DOI: 10.1038/s41586-021-04226-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Affiliation(s)
| | - Jill E Moore
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Michael J Purcaro
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Henry E Pratt
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Noam Shoresh
- The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Jessika Adrian
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Trupti Kawli
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Carrie A Davis
- Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA
| | - Alexander Dobin
- Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA
| | - Rajinder Kaul
- Altius Institute for Biomedical Sciences, Seattle, WA, USA; Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
| | - Jessica Halow
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | - Eric L Van Nostrand
- Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, Stem Cell Program, Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Peter Freese
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - David U Gorkin
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Yin Shen
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA; Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Yupeng He
- Genomics Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Brian A Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, USA
| | - Cheryl A Keller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
| | - Xiao-Ou Zhang
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Shaimae I Elhajjajy
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Jack Huey
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Valentina Snetkova
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Xintao Wei
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA
| | - Xiaofeng Wang
- Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada; Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada; Institut de Recherches Cliniques de Montréal (IRCM), Montréal, Quebec, Canada
| | - Juan Carlos Rivera-Mulia
- Department of Biological Science, Florida State University, Tallahassee, FL, USA; Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Medical School, Minneapolis, MN, USA
| | - Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA; Biological Sciences, University of Alabama in Huntsville, Huntsville, AL, USA
| | - Jialing Zhang
- Department of Genetics, School of Medicine, Yale University, New Haven, CT, USA
| | - Alec Victorsen
- Department of Human Genetics, Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL, USA
| | - Axel Visel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; School of Natural Sciences, University of California, Merced, Merced, CA, USA
| | - Gene W Yeo
- Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, Stem Cell Program, Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Christopher B Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Eric Lécuyer
- Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada; Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada; Institut de Recherches Cliniques de Montréal (IRCM), Montréal, Quebec, Canada
| | - David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - Job Dekker
- HHMI and Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - John Rinn
- University of Colorado Boulder, Boulder, CO, USA
| | - Eric M Mendenhall
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA; Biological Sciences, University of Alabama in Huntsville, Huntsville, AL, USA
| | - Joseph R Ecker
- Genomics Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA; Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Manolis Kellis
- The Broad Institute of Harvard and MIT, Cambridge, MA, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Robert J Klein
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Anshul Kundaje
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Roderic Guigó
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology and Universitat Pompeu Fabra, Barcelona, Spain
| | - Peggy J Farnham
- Department of Biochemistry and Molecular Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - J Michael Cherry
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA.
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA.
| | - Bing Ren
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA.
| | - Brenton R Graveley
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA.
| | | | - Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. .,US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. .,Comparative Biochemistry Program, University of California, Berkeley, CA, USA.
| | - Michael P Snyder
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA.
- Cardiovascular Institute, Stanford School of Medicine, Stanford, CA, USA.
| | - Bradley E Bernstein
- Broad Institute and Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
| | - Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA.
| | - Thomas R Gingeras
- Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA.
| | - John A Stamatoyannopoulos
- Altius Institute for Biomedical Sciences, Seattle, WA, USA.
- Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA.
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
| | - Zhiping Weng
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA.
- Department of Thoracic Surgery, Clinical Translational Research Center, Shanghai Pulmonary Hospital, The School of Life Sciences and Technology, Tongji University, Shanghai, China.
- Bioinformatics Program, Boston University, Boston, MA, USA.
| |
|
16
|
Swygert SG, Lin D, Portillo-Ledesma S, Lin PY, Hunt DR, Kao CF, Schlick T, Noble WS, Tsukiyama T. Local chromatin fiber folding represses transcription and loop extrusion in quiescent cells. eLife 2021; 10:e72062. [PMID: 34734806 PMCID: PMC8598167 DOI: 10.7554/elife.72062] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/08/2021] [Accepted: 11/03/2021] [Indexed: 12/16/2022] Open
Abstract
A longstanding hypothesis is that chromatin fiber folding mediated by interactions between nearby nucleosomes represses transcription. However, it has been difficult to determine the relationship between local chromatin fiber compaction and transcription in cells. Further, global changes in fiber diameters have not been observed, even between interphase and mitotic chromosomes. We show that an increase in the range of local inter-nucleosomal contacts in quiescent yeast drives the compaction of chromatin fibers genome-wide. Unlike actively dividing cells, inter-nucleosomal interactions in quiescent cells require a basic patch in the histone H4 tail. This quiescence-specific fiber folding globally represses transcription and inhibits chromatin loop extrusion by condensin. These results reveal that global changes in chromatin fiber compaction can occur during cell state transitions, and establish physiological roles for local chromatin fiber folding in regulating transcription and chromatin domain formation.
Affiliation(s)
- Sarah G Swygert
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
| | - Dejun Lin
- Department of Genome Sciences, University of Washington, Seattle, United States
| | | | - Po-Yen Lin
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei, Taiwan
| | - Dakota R Hunt
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
| | - Cheng-Fu Kao
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei, Taiwan
| | - Tamar Schlick
- Department of Chemistry, New York University, New York, United States
- Courant Institute of Mathematical Sciences, New York University, New York, United States
- New York University-East China Normal University Center for Computational Chemistry at New York University Shanghai, Shanghai, China
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, United States
| | - Toshio Tsukiyama
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, United States
| |
|
17
|
Bonora G, Ramani V, Singh R, Fang H, Jackson DL, Srivatsan S, Qiu R, Lee C, Trapnell C, Shendure J, Duan Z, Deng X, Noble WS, Disteche CM. Single-cell landscape of nuclear configuration and gene expression during stem cell differentiation and X inactivation. Genome Biol 2021; 22:279. [PMID: 34579774 PMCID: PMC8474932 DOI: 10.1186/s13059-021-02432-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/03/2020] [Accepted: 07/07/2021] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. RESULTS Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a "bookmark" mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. CONCLUSIONS Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.
Affiliation(s)
- Giancarlo Bonora
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Vijay Ramani
- Department of Biochemistry & Biophysics, University of California San Francisco, San Francisco, CA, USA
| | - Ritambhara Singh
- Department of Computer Science, Brown University, Providence, RI, USA
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
| | - He Fang
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Dana L Jackson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Sanjay Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Ruolan Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Zhijun Duan
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, USA
- Division of Hematology, Department of Medicine, University of Washington, Seattle, USA
| | - Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA.
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
| | - Christine M Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Department of Medicine, University of Washington, Seattle, WA, USA.
| |
|
18
|
Abstract
The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can be used as either a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published data set with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows, and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at https://github.com/wfondrie/ppx.
Affiliation(s)
- William E Fondrie
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Wout Bittremieux
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science, University of Antwerp, Antwerp, Belgium
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| |
|
19
|
Abstract
The standard proteomics database search strategy involves searching spectra against a peptide database and estimating the false discovery rate (FDR) of the resulting set of peptide-spectrum matches. One assumption of this protocol is that all the peptides in the database are relevant to the hypothesis being investigated. However, in settings where researchers are interested in a subset of peptides, alternative search and FDR control strategies are needed. Recently, two methods were proposed to address this problem: subset-search and all-sub. We show that both methods fail to control the FDR. For subset-search, this failure is due to the presence of "neighbor" peptides, which are defined as irrelevant peptides with a similar precursor mass and fragmentation spectrum as a relevant peptide. Not considering neighbors compromises the FDR estimate because a spectrum generated by an irrelevant peptide can incorrectly match well to a relevant peptide. Therefore, we have developed a new method, "subset-neighbor search" (SNS), that accounts for neighbor peptides. We show evidence that SNS controls the FDR when neighbors are present and that SNS outperforms group-FDR, the only other method that appears to control the FDR relative to a subset of relevant peptides.
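The FDR estimation step mentioned in this abstract can be illustrated with a generic target-decoy sketch. This is not the SNS method, nor subset-search or all-sub; it is only the standard idea of ranking peptide-spectrum matches by score and estimating the FDR from the number of decoy matches above each threshold. The function name `tdc_qvalues`, the `(score, is_decoy)` input layout, and the `+1` decoy correction are assumptions chosen for illustration.

```python
from typing import List, Tuple


def tdc_qvalues(psms: List[Tuple[float, bool]]) -> List[Tuple[float, bool, float]]:
    """Assign q-values to PSMs given as (score, is_decoy); higher scores are better.

    Hypothetical helper for illustration only: the FDR at each score threshold
    is estimated as (decoys + 1) / targets, and the q-value is the smallest
    estimated FDR at which a PSM would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Walk down the ranked list, tracking cumulative target and decoy counts.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # Convert FDR estimates to q-values by taking a running minimum from the bottom.
    qvals = []
    running = 1.0
    for fdr in reversed(fdrs):
        running = min(running, fdr)
        qvals.append(running)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


if __name__ == "__main__":
    example = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False)]
    for score, is_decoy, q in tdc_qvalues(example):
        print(f"score={score:5.1f} decoy={is_decoy!s:5} q={q:.3f}")
```

A sketch like this treats every peptide in the database as relevant, which is the assumption the abstract says breaks down when only a subset of peptides is of interest.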
Affiliation(s)
- Andy Lin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Deanna L. Plubell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Uri Keich
- School of Mathematics and Statistics, University of Sydney, NSW, Australia
| | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School for Computer Science and Engineering, University of Washington, Seattle, WA, USA
| |
|
20
|
Mudge MC, Nunn BL, Firth E, Ewert M, Hales K, Fondrie WE, Noble WS, Toner J, Light B, Junge KA. Subzero, saline incubations of Colwellia psychrerythraea reveal strategies and biomarkers for sustained life in extreme icy environments. Environ Microbiol 2021; 23:3840-3866. [DOI: 10.1111/1462-2920.15485] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/13/2020] [Accepted: 03/22/2021] [Indexed: 11/26/2022]
Affiliation(s)
- Miranda C. Mudge
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Molecular and Cellular Biology, University of Washington, Seattle, WA, USA
| | - Brook L. Nunn
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Astrobiology Program, University of Washington, Seattle, WA, USA
| | - Erin Firth
- Applied Physics Lab, Polar Science Center, University of Washington, Seattle, WA, USA
| | - Marcela Ewert
- Applied Physics Lab, Polar Science Center, University of Washington, Seattle, WA, USA
| | - Kianna Hales
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Jonathan Toner
- Department of Earth and Space Sciences, University of Washington, Seattle, WA, USA
| | - Bonnie Light
- Applied Physics Lab, Polar Science Center, University of Washington, Seattle, WA, USA
| | - Karen A. Junge
- Applied Physics Lab, Polar Science Center, University of Washington, Seattle, WA, USA
| |
|
21
|
Abstract
Proteomics studies rely on the accurate assignment of peptides to the acquired tandem mass spectra, a task where machine learning algorithms have proven invaluable. We describe mokapot, which provides a flexible semisupervised learning algorithm that allows for highly customized analyses. We demonstrate some of the unique features of mokapot by improving the detection of RNA-cross-linked peptides from an analysis of RNA-binding proteins and increasing the consistency of peptide detection in a single-cell proteomics study.
Affiliation(s)
- William E. Fondrie
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| |
|
22
|
Erijman A, Kozlowski L, Sohrabi-Jahromi S, Fishburn J, Warfield L, Schreiber J, Noble WS, Söding J, Hahn S. A High-Throughput Screen for Transcription Activation Domains Reveals Their Sequence Features and Permits Prediction by Deep Learning. Mol Cell 2020; 79:1066. [PMID: 32946759 DOI: 10.1016/j.molcel.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/23/2022]
|
23
|
Moore JE, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T, Davis CA, Dobin A, Kaul R, Halow J, Van Nostrand EL, Freese P, Gorkin DU, Shen Y, He Y, Mackiewicz M, Pauli-Behn F, Williams BA, Mortazavi A, Keller CA, Zhang XO, Elhajjajy SI, Huey J, Dickel DE, Snetkova V, Wei X, Wang X, Rivera-Mulia JC, Rozowsky J, Zhang J, Chhetri SB, Zhang J, Victorsen A, White KP, Visel A, Yeo GW, Burge CB, Lécuyer E, Gilbert DM, Dekker J, Rinn J, Mendenhall EM, Ecker JR, Kellis M, Klein RJ, Noble WS, Kundaje A, Guigó R, Farnham PJ, Cherry JM, Myers RM, Ren B, Graveley BR, Gerstein MB, Pennacchio LA, Snyder MP, Bernstein BE, Wold B, Hardison RC, Gingeras TR, Stamatoyannopoulos JA, Weng Z. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 2020; 583:699-710. [PMID: 32728249 PMCID: PMC7410828 DOI: 10.1038/s41586-020-2493-4] [Citation(s) in RCA: 879] [Impact Index Per Article: 219.8] [Received: 08/26/2017] [Accepted: 05/27/2020] [Indexed: 12/13/2022]
Abstract
The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE [1] and Roadmap Epigenomics [2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.
Affiliation(s)
- Jill E Moore
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Michael J Purcaro
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Henry E Pratt
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | | | - Noam Shoresh
- The Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Jessika Adrian
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Trupti Kawli
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Carrie A Davis
- Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA
| | - Alexander Dobin
- Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA
| | - Rajinder Kaul
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
| | - Jessica Halow
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | - Eric L Van Nostrand
- Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, Stem Cell Program, Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Peter Freese
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - David U Gorkin
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
| | - Yin Shen
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA
- Institute for Human Genetics, Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Yupeng He
- Genomics Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Mark Mackiewicz
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | | | - Brian A Williams
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California Irvine, Irvine, CA, USA
| | - Cheryl A Keller
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
| | - Xiao-Ou Zhang
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Shaimae I Elhajjajy
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Jack Huey
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA
| | - Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Valentina Snetkova
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Xintao Wei
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA
| | - Xiaofeng Wang
- Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
- Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada
- Institut de Recherches Cliniques de Montréal (IRCM), Montréal, Quebec, Canada
| | - Juan Carlos Rivera-Mulia
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota Medical School, Minneapolis, MN, USA
| | | | | | - Surya B Chhetri
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Biological Sciences, University of Alabama in Huntsville, Huntsville, AL, USA
| | - Jialing Zhang
- Department of Genetics, School of Medicine, Yale University, New Haven, CT, USA
| | - Alec Victorsen
- Department of Human Genetics, Institute for Genomics and Systems Biology, The University of Chicago, Chicago, IL, USA
| | | | - Axel Visel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- School of Natural Sciences, University of California, Merced, Merced, CA, USA
| | - Gene W Yeo
- Department of Cellular and Molecular Medicine, Institute for Genomic Medicine, Stem Cell Program, Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, CA, USA
| | - Christopher B Burge
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Eric Lécuyer
- Département de Biochimie et Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
- Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada
- Institut de Recherches Cliniques de Montréal (IRCM), Montréal, Quebec, Canada
| | - David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - Job Dekker
- HHMI and Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - John Rinn
- University of Colorado Boulder, Boulder, CO, USA
| | - Eric M Mendenhall
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
- Biological Sciences, University of Alabama in Huntsville, Huntsville, AL, USA
| | - Joseph R Ecker
- Genomics Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA, USA
| | - Manolis Kellis
- The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Robert J Klein
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Anshul Kundaje
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA
| | - Roderic Guigó
- Bioinformatics and Genomics Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology and Universitat Pompeu Fabra, Barcelona, Spain
| | - Peggy J Farnham
- Department of Biochemistry and Molecular Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - J Michael Cherry
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA.
| | - Richard M Myers
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA.
| | - Bing Ren
- Center for Epigenomics, Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA.
- Ludwig Institute for Cancer Research, University of California, San Diego, La Jolla, CA, USA.
| | - Brenton R Graveley
- Department of Genetics and Genome Sciences, Institute for Systems Genomics, UConn Health, Farmington, CT, USA.
| | | | - Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
- Comparative Biochemistry Program, University of California, Berkeley, CA, USA.
| | - Michael P Snyder
- Department of Genetics, School of Medicine, Stanford University, Palo Alto, CA, USA.
- Cardiovascular Institute, Stanford School of Medicine, Stanford, CA, USA.
| | - Bradley E Bernstein
- Broad Institute and Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
| | - Barbara Wold
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA.
| | - Ross C Hardison
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA.
| | - Thomas R Gingeras
- Cold Spring Harbor Laboratory, Functional Genomics, Cold Spring Harbor, NY, USA.
| | - John A Stamatoyannopoulos
- Altius Institute for Biomedical Sciences, Seattle, WA, USA.
- Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA.
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
| | - Zhiping Weng
- University of Massachusetts Medical School, Program in Bioinformatics and Integrative Biology, Worcester, MA, USA.
- Department of Thoracic Surgery, Clinical Translational Research Center, Shanghai Pulmonary Hospital, The School of Life Sciences and Technology, Tongji University, Shanghai, China.
- Bioinformatics Program, Boston University, Boston, MA, USA.
| |
|
24
|
Erijman A, Kozlowski L, Sohrabi-Jahromi S, Fishburn J, Warfield L, Schreiber J, Noble WS, Söding J, Hahn S. A High-Throughput Screen for Transcription Activation Domains Reveals Their Sequence Features and Permits Prediction by Deep Learning. Mol Cell 2020; 78:890-902.e6. [PMID: 32416068 PMCID: PMC7275923 DOI: 10.1016/j.molcel.2020.04.020] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 12/03/2019] [Revised: 03/11/2020] [Accepted: 04/15/2020] [Indexed: 01/03/2023]
Abstract
Acidic transcription activation domains (ADs) are encoded by a wide range of seemingly unrelated amino acid sequences, making it difficult to recognize features that promote their dynamic behavior, "fuzzy" interactions, and target specificity. We screened a large set of random 30-mer peptides for AD function in yeast and trained a deep neural network (ADpred) on the AD-positive and -negative sequences. ADpred identifies known acidic ADs within transcription factors and accurately predicts the consequences of mutations. Our work reveals that strong acidic ADs contain multiple clusters of hydrophobic residues near acidic side chains, explaining why ADs often have a biased amino acid composition. ADs likely use a binding mechanism similar to avidity where a minimum number of weak dynamic interactions are required between activator and target to generate biologically relevant affinity and in vivo function. This mechanism explains the basis for fuzzy binding observed between acidic ADs and targets.
Affiliation(s)
- Ariel Erijman
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Lukasz Kozlowski
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - Salma Sohrabi-Jahromi
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
| | - James Fishburn
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Linda Warfield
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Jacob Schreiber
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Johannes Söding
- Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany.
| | - Steven Hahn
- Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
| |
|
25
|
Pan S, Hullar MAJ, Lai LA, Peng H, May DH, Noble WS, Raftery D, Navarro SL, Neuhouser ML, Lampe PD, Lampe JW, Chen R. Gut Microbial Protein Expression in Response to Dietary Patterns in a Controlled Feeding Study: A Metaproteomic Approach. Microorganisms 2020; 8:E379. [PMID: 32156071 PMCID: PMC7143255 DOI: 10.3390/microorganisms8030379] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/07/2020] [Revised: 03/02/2020] [Accepted: 03/04/2020] [Indexed: 12/11/2022] Open
Abstract
Although the gut microbiome has been associated with dietary patterns linked to health, microbial metabolism is not well characterized. This ancillary study was a proof of principle analysis for a novel application of metaproteomics to study microbial protein expression in a controlled dietary intervention. We measured the response of the microbiome to diet in a randomized crossover dietary intervention of a whole-grain, low glycemic load diet (WG) and a refined-grain, high glycemic load diet (RG). Total proteins in stools from 9 participants at the end of each diet period (n = 18) were analyzed by LC MS/MS and proteins were identified using the Human Microbiome Project (HMP) human gut microbiome database and UniProt human protein databases. T-tests, controlling for false discovery rate (FDR) <10%, were used to compare the Gene Ontology (GO) biological processes and bacterial enzymes between the two interventions. Using shotgun proteomics, more than 53,000 unique peptides were identified including microbial (89%) and human peptides (11%). Forty-eight bacterial enzymes were statistically different between the diets, including those implicated in SCFA production and degradation of fatty acids. Enzymes associated with degradation of human mucin were significantly enriched in the RG diet. These results illustrate that the metaproteomic approach is a valuable tool to study the microbial metabolism of diets that may influence host health.
Affiliation(s)
- Sheng Pan
- Institute of Molecular Medicine, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA; (S.P.); (H.P.)
| | - Meredith A. J. Hullar
- Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109, USA; (D.R.); (S.L.N.); (M.L.N.); (P.D.L.); (J.W.L.)
| | - Lisa A. Lai
- Department of Medicine, University of Washington, Seattle, WA 98105, USA;
| | - Hong Peng
- Institute of Molecular Medicine, the University of Texas Health Science Center at Houston, Houston, TX 77030, USA; (S.P.); (H.P.)
| | - Damon H. May
- Department of Genome Sciences, University of Washington, Seattle, WA 98105, USA; (D.H.M.)
| | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98105, USA; (D.H.M.)
| | - Daniel Raftery
- Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109, USA; (D.R.); (S.L.N.); (M.L.N.); (P.D.L.); (J.W.L.)
- Department of Anesthesiology and Pain Medicine, Northwest Metabolomics Research Center, University of Washington, Seattle, WA 98109, USA
| | - Sandi L. Navarro
- Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109, USA; (D.R.); (S.L.N.); (M.L.N.); (P.D.L.); (J.W.L.)
| | - Marian L. Neuhouser
- Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109, USA; (D.R.); (S.L.N.); (M.L.N.); (P.D.L.); (J.W.L.)
| | - Paul D. Lampe
- Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109, USA; (D.R.); (S.L.N.); (M.L.N.); (P.D.L.); (J.W.L.)
| | - Johanna W. Lampe
- Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA 98109, USA; (D.R.); (S.L.N.); (M.L.N.); (P.D.L.); (J.W.L.)
| | - Ru Chen
- Division of Gastroenterology and Hepatology, Department of Medicine, Baylor College of Medicine, Houston, TX 77030, USA
| |
|
26
|
Pino LK, Searle BC, Yang HY, Hoofnagle AN, Noble WS, MacCoss MJ. Matrix-Matched Calibration Curves for Assessing Analytical Figures of Merit in Quantitative Proteomics. J Proteome Res 2020; 19:1147-1153. [PMID: 32037841 DOI: 10.1021/acs.jproteome.9b00666] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Indexed: 11/28/2022]
Abstract
Mass spectrometry is a powerful tool for quantifying protein abundance in complex samples. Advances in sample preparation and the development of data-independent acquisition (DIA) mass spectrometry approaches have increased the number of peptides and proteins measured per sample. Here, we present a series of experiments demonstrating how to assess whether a peptide measurement is quantitative by mass spectrometry. Our results demonstrate that increasing the number of detected peptides in a proteomics experiment does not necessarily result in increased numbers of peptides that can be measured quantitatively.
Affiliation(s)
- Lindsay K Pino
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Brian C Searle
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Han-Yin Yang
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Andrew N Hoofnagle
- Department of Laboratory Medicine, University of Washington, Seattle, Washington 98195, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| |
|
27
|
Abstract
Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semi-supervised algorithms to learn models directly from the datasets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results were reduced as the size of the experiment was decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large dataset and use the learned model to evaluate the small-scale experiment. We call this a “static modeling” approach, in contrast to Percolator’s usual “dynamic model” that is trained anew for each dataset. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semi-supervised algorithms to small-scale experiments.
Affiliation(s)
- William E Fondrie
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195-5065, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
| |
|
28
|
Ramani V, Deng X, Qiu R, Lee C, Disteche CM, Noble WS, Shendure J, Duan Z. Sci-Hi-C: A single-cell Hi-C method for mapping 3D genome organization in large number of single cells. Methods 2020; 170:61-68. [PMID: 31536770 PMCID: PMC6949367 DOI: 10.1016/j.ymeth.2019.09.012] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 03/15/2019] [Accepted: 09/13/2019] [Indexed: 12/31/2022]
Abstract
The highly dynamic nature of chromosome conformation and three-dimensional (3D) genome organization leads to cell-to-cell variability in chromatin interactions within a cell population, even if the cells of the population appear to be functionally homogeneous. Hence, although Hi-C is a powerful tool for mapping 3D genome organization, this heterogeneity of chromosome higher-order structure among individual cells limits the interpretive power of population-based bulk Hi-C assays. Moreover, single-cell studies have the potential to enable the identification and characterization of rare cell populations or cell subtypes in a heterogeneous population. However, achieving statistically meaningful observations in single-cell studies may require surveying relatively large numbers of single cells. By applying combinatorial cellular indexing to chromosome conformation capture, we developed single-cell combinatorial indexed Hi-C (sci-Hi-C), a high-throughput method that enables mapping chromatin interactomes in large numbers of single cells. We demonstrated the use of sci-Hi-C data to separate cells by karyotypic and cell-cycle state differences and to identify cellular variability in mammalian chromosomal conformation. Here, we provide a detailed description of method design and step-by-step working protocols for sci-Hi-C.
Affiliation(s)
- Vijay Ramani
- Department of Genome Sciences, University of Washington, Seattle, WA, United States.
| | - Xinxian Deng
- Department of Pathology, University of Washington, Seattle, WA, United States
| | - Ruolan Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
| | - Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
| | - Christine M Disteche
- Department of Pathology, University of Washington, Seattle, WA, United States; Department of Medicine, University of Washington, Seattle, WA, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, United States
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, United States; Howard Hughes Medical Institute, Seattle, WA, United States.
| | - Zhijun Duan
- Division of Hematology, University of Washington School of Medicine, Seattle, WA, United States; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, United States.
| |
|
29
|
Bertero A, Fields PA, Smith AST, Leonard A, Beussman K, Sniadecki NJ, Kim DH, Tse HF, Pabon L, Shendure J, Noble WS, Murry CE. Chromatin compartment dynamics in a haploinsufficient model of cardiac laminopathy. J Cell Biol 2019; 218:2919-2944. [PMID: 31395619 PMCID: PMC6719452 DOI: 10.1083/jcb.201902117] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 02/19/2019] [Revised: 06/20/2019] [Accepted: 07/10/2019] [Indexed: 01/16/2023]
Abstract
Mutations in A-type nuclear lamins cause dilated cardiomyopathy, which is postulated to result from dysregulated gene expression due to changes in chromatin organization into active and inactive compartments. To test this, we performed genome-wide chromosome conformation analyses in human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) with a haploinsufficient mutation for lamin A/C. Compared with gene-corrected cells, mutant hiPSC-CMs have marked electrophysiological and contractile alterations, with modest gene expression changes. While large-scale changes in chromosomal topology are evident, differences in chromatin compartmentalization are limited to a few hotspots that escape segregation to the nuclear lamina and inactivation during cardiogenesis. These regions exhibit up-regulation of multiple noncardiac genes including CACNA1A, encoding for neuronal P/Q-type calcium channels. Pharmacological inhibition of the resulting current partially mitigates the electrical alterations. However, chromatin compartment changes do not explain most gene expression alterations in mutant hiPSC-CMs. Thus, global errors in chromosomal compartmentation are not the primary pathogenic mechanism in heart failure due to lamin A/C haploinsufficiency.
Affiliation(s)
- Alessandro Bertero
- Department of Pathology, University of Washington, Seattle, WA
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
| | - Paul A Fields
- Department of Pathology, University of Washington, Seattle, WA
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
| | - Alec S T Smith
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
- Department of Bioengineering, University of Washington, Seattle, WA
| | - Andrea Leonard
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
- Department of Mechanical Engineering, University of Washington, Seattle, WA
| | - Kevin Beussman
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
- Department of Mechanical Engineering, University of Washington, Seattle, WA
| | - Nathan J Sniadecki
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
- Department of Bioengineering, University of Washington, Seattle, WA
- Department of Mechanical Engineering, University of Washington, Seattle, WA
| | - Deok-Ho Kim
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
- Department of Bioengineering, University of Washington, Seattle, WA
| | - Hung-Fat Tse
- Cardiology Division, Department of Medicine, University of Hong Kong, Pok Fu Lam, Hong Kong
| | - Lil Pabon
- Department of Pathology, University of Washington, Seattle, WA
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA
- Howard Hughes Medical Institute, Seattle, WA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Charles E Murry
- Department of Pathology, University of Washington, Seattle, WA
- Center for Cardiovascular Biology, University of Washington, Seattle, WA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA
- Department of Bioengineering, University of Washington, Seattle, WA
- Department of Medicine/Cardiology, University of Washington, Seattle, WA
| |
|
30
|
Cheng A, Grant CE, Noble WS, Bailey TL. MoMo: discovery of statistically significant post-translational modification motifs. Bioinformatics 2019; 35:2774-2782. [PMID: 30596994 PMCID: PMC6691336 DOI: 10.1093/bioinformatics/bty1058] [Citation(s) in RCA: 112] [Impact Index Per Article: 22.4] [Received: 09/05/2018] [Revised: 12/14/2018] [Accepted: 12/26/2018] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Post-translational modifications (PTMs) of proteins are associated with many significant biological functions and can be identified in high throughput using tandem mass spectrometry. Many PTMs are associated with short sequence patterns called 'motifs' that help localize the modifying enzyme. Accordingly, many algorithms have been designed to identify these motifs from mass spectrometry data. Accurate statistical confidence estimates for discovered motifs are critically important for proper interpretation and in the design of downstream experimental validation. RESULTS: We describe a method for assigning statistical confidence estimates to PTM motifs, and we demonstrate that this method provides accurate P-values on both simulated and real data. Our methods are implemented in MoMo, a software tool for discovering motifs among sets of PTMs that we make available as a web server and as downloadable source code. MoMo re-implements the two most widely used PTM motif discovery algorithms, motif-x and MoDL, while offering many enhancements. Relative to motif-x, MoMo offers improved statistical confidence estimates and more accurate calculation of motif scores. The MoMo web server offers more proteome databases, more input formats, larger inputs and longer running times than the motif-x web server. Finally, our study demonstrates that the confidence estimates produced by motif-x are inaccurate. This inaccuracy stems in part from the common practice of drawing 'background' peptides from an unshuffled proteome database. Our results thus suggest that many of the papers that use motif-x to find motifs may be reporting results that lack statistical support. AVAILABILITY AND IMPLEMENTATION: The MoMo web server and source code are provided at http://meme-suite.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alice Cheng
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Charles E Grant
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | | |
|
31
|
Bai W, Bilmes J, Noble WS. Submodular Generalized Matching for Peptide Identification in Tandem Mass Spectrometry. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1168-1181. [PMID: 29993658 PMCID: PMC8641787 DOI: 10.1109/tcbb.2018.2822280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
MOTIVATION: Identification of spectra produced by a shotgun proteomics mass spectrometry experiment is commonly performed by searching the observed spectra against a peptide database. The heart of this search procedure is a score function that evaluates the quality of a hypothesized match between an observed spectrum and a theoretical spectrum corresponding to a particular peptide sequence. Accordingly, the success of a spectrum analysis pipeline depends critically upon this peptide-spectrum score function. We develop peptide-spectrum score functions that compute the maximum value of a submodular function under $m$ matroid constraints. We call this procedure a submodular generalized matching (SGM) since it generalizes bipartite matching. We use a greedy algorithm to compute maximization, which can achieve a solution whose objective is guaranteed to be at least $\frac{1}{1+m}$ of the true optimum. The advantage of the SGM framework is that known long-range properties of experimental spectra can be modeled by designing suitable submodular functions and matroid constraints. Experiments on four data sets from various organisms and mass spectrometry platforms show that the SGM approach leads to significantly improved performance compared to several state-of-the-art methods. Supplementary information, C++ source code, and data sets can be found at https://melodi-lab.github.io/SGM.
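The greedy maximization under matroid constraints described in the abstract above can be illustrated with a generic sketch. This is not the authors' SGM implementation (their C++ code is at the URL given in the abstract). The function `greedy_maximize`, the toy edge weights, and the two independence oracles are all hypothetical. The objective here is a simple sum of weights, which is modular and therefore a special case of a submodular function. The two oracles encode bipartite matching as the intersection of m = 2 partition matroids.

```python
def greedy_maximize(ground_set, objective, independence_oracles):
    """Greedily build a set that stays independent in every matroid.

    At each step, add the feasible element with the largest positive
    marginal gain; stop when no feasible element improves the objective.
    """
    selected = set()
    remaining = set(ground_set)
    while remaining:
        base_value = objective(selected)
        best_item, best_gain = None, 0.0
        # Sort for a deterministic scan order (ties keep the first seen).
        for item in sorted(remaining):
            candidate = selected | {item}
            if not all(is_independent(candidate)
                       for is_independent in independence_oracles):
                continue
            gain = objective(candidate) - base_value
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            break
        selected.add(best_item)
        remaining.discard(best_item)
    return selected


# Hypothetical toy instance: edges between left nodes {a, b} and right
# nodes {x, y}, with made-up weights.
weights = {("a", "x"): 3.0, ("a", "y"): 2.0, ("b", "x"): 2.5, ("b", "y"): 1.0}


def total_weight(edges):
    return sum(weights[e] for e in edges)


def left_used_once(edges):
    lefts = [u for u, _ in edges]
    return len(lefts) == len(set(lefts))


def right_used_once(edges):
    rights = [v for _, v in edges]
    return len(rights) == len(set(rights))


matching = greedy_maximize(weights, total_weight,
                           [left_used_once, right_used_once])
print(sorted(matching), total_weight(matching))
```

On this toy instance the greedy choice takes edge (a, x) first and is then forced to take (b, y), for a total of 4.0, whereas the best matching, (a, y) with (b, x), scores 4.5. The greedy value is suboptimal but well within the $\frac{1}{1+m} = \frac{1}{3}$ factor quoted in the abstract for m = 2.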
|
32
|
Kim JS, He X, Liu J, Duan Z, Kim T, Gerard J, Kim B, Pillai MM, Lane WS, Noble WS, Budnik B, Waldman T. Systematic proteomics of endogenous human cohesin reveals an interaction with diverse splicing factors and RNA-binding proteins required for mitotic progression. J Biol Chem 2019; 294:8760-8772. [PMID: 31010829 PMCID: PMC6552432 DOI: 10.1074/jbc.ra119.007832] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/01/2019] [Revised: 04/18/2019] [Indexed: 12/23/2022]
Abstract
The cohesin complex regulates sister chromatid cohesion, chromosome organization, gene expression, and DNA repair. Cohesin is a ring complex composed of four core subunits and seven regulatory subunits. In an effort to comprehensively identify additional cohesin-interacting proteins, we used gene editing to introduce a dual epitope tag into the endogenous allele of each of 11 known components of cohesin in cultured human cells, and we performed MS analyses on dual-affinity purifications. In addition to reciprocally identifying all known components of cohesin, we found that cohesin interacts with a panoply of splicing factors and RNA-binding proteins (RBPs). These included diverse components of the U4/U6.U5 tri-small nuclear ribonucleoprotein complex and several splicing factors that are commonly mutated in cancer. The interaction between cohesin and splicing factors/RBPs was RNA- and DNA-independent, occurred in chromatin, was enhanced during mitosis, and required RAD21. Furthermore, cohesin-interacting splicing factors and RBPs followed the cohesin cycle and prophase pathway of cell cycle-regulated interactions with chromatin. Depletion of cohesin-interacting splicing factors and RBPs resulted in aberrant mitotic progression. These results provide a comprehensive view of the endogenous human cohesin interactome and identify splicing factors and RBPs as functionally significant cohesin-interacting proteins.
Affiliation(s)
- Jung-Sik Kim
- From the Departments of Oncology and Biochemistry & Molecular Biology, Georgetown University School of Medicine, Washington, D. C. 20057
| | - Xiaoyuan He
- From the Departments of Oncology and Biochemistry & Molecular Biology, Georgetown University School of Medicine, Washington, D. C. 20057
| | - Jie Liu
- the Department of Genome Sciences
| | - Zhijun Duan
- Institute for Stem Cell and Regenerative Medicine, and
- Division of Hematology, University of Washington, Seattle, Washington 98195
| | - Taeyeon Kim
- From the Departments of Oncology and Biochemistry & Molecular Biology, Georgetown University School of Medicine, Washington, D. C. 20057
| | - Julia Gerard
- From the Departments of Oncology and Biochemistry & Molecular Biology, Georgetown University School of Medicine, Washington, D. C. 20057
| | - Brian Kim
- From the Departments of Oncology and Biochemistry & Molecular Biology, Georgetown University School of Medicine, Washington, D. C. 20057
| | - Manoj M Pillai
- the Section of Hematology, Yale Cancer Center, Yale University School of Medicine, New Haven, Connecticut 06510, and
| | - William S Lane
- the Mass Spectrometry and Proteomics Resource Laboratory, Harvard University, Cambridge, Massachusetts 02138
| | | | - Bogdan Budnik
- the Mass Spectrometry and Proteomics Resource Laboratory, Harvard University, Cambridge, Massachusetts 02138
| | - Todd Waldman
- From the Departments of Oncology and Biochemistry & Molecular Biology, Georgetown University School of Medicine, Washington, D. C. 20057,
| |
|
33
|
Abstract
Searching tandem mass spectra against a peptide database requires accurate knowledge of various experimental parameters, including machine settings and details of the sample preparation protocol. In some cases, such as in reanalysis of public data sets, this experimental metadata may be missing or inaccurate. We describe a method for automatically inferring the presence of various types of modifications, including stable-isotope and isobaric labeling and tandem mass tags as well as the enrichment of phosphorylated peptides, directly from a given set of mass spectra. We demonstrate the sensitivity and specificity of the proposed approach, and we provide open-source Python and C++ implementations in a new version of the software tool Param-Medic.
Affiliation(s)
- Damon H May
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Kaipo Tamura
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| |
|
34
|
Bertero A, Fields PA, Ramani V, Bonora G, Yardimci GG, Reinecke H, Pabon L, Noble WS, Shendure J, Murry CE. Dynamics of genome reorganization during human cardiogenesis reveal an RBM20-dependent splicing factory. Nat Commun 2019; 10:1538. [PMID: 30948719 PMCID: PMC6449405 DOI: 10.1038/s41467-019-09483-5] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Received: 03/07/2018] [Accepted: 03/08/2019] [Indexed: 01/25/2023]
Abstract
Functional changes in spatial genome organization during human development are poorly understood. Here we report a comprehensive profile of nuclear dynamics during human cardiogenesis from pluripotent stem cells by integrating Hi-C, RNA-seq and ATAC-seq. While chromatin accessibility and gene expression show complex on/off dynamics, large-scale genome architecture changes are mostly unidirectional. Many large cardiac genes transition from a repressive to an active compartment during differentiation, coincident with upregulation. We identify a network of such gene loci that increase their association inter-chromosomally, and are targets of the muscle-specific splicing factor RBM20. Genome editing studies show that TTN pre-mRNA, the main RBM20-regulated transcript in the heart, nucleates RBM20 foci that drive spatial proximity between the TTN locus and other inter-chromosomal RBM20 targets such as CACNA1C and CAMK2D. This mechanism promotes RBM20-dependent alternative splicing of the resulting transcripts, indicating the existence of a cardiac-specific trans-interacting chromatin domain (TID) functioning as a splicing factory.
Affiliation(s)
- Alessandro Bertero
- Department of Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA, 98195, USA
- Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA, 98109, USA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, 98109, WA, USA
| | - Paul A Fields
- Department of Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA, 98195, USA
- Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA, 98109, USA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, 98109, WA, USA
| | - Vijay Ramani
- Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Ave NE, Seattle, 98195, WA, USA
| | - Giancarlo Bonora
- Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Ave NE, Seattle, 98195, WA, USA
| | - Galip G Yardimci
- Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Ave NE, Seattle, 98195, WA, USA
| | - Hans Reinecke
- Department of Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA, 98195, USA
- Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA, 98109, USA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, 98109, WA, USA
| | - Lil Pabon
- Department of Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA, 98195, USA
- Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA, 98109, USA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, 98109, WA, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Ave NE, Seattle, 98195, WA, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, William H. Foege Hall, 3720 15th Ave NE, Seattle, 98195, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
| | - Charles E Murry
- Department of Pathology, University of Washington, 1959 NE Pacific Street, Seattle, WA, 98195, USA
- Center for Cardiovascular Biology, University of Washington, 850 Republican Street, Brotman Building, Seattle, WA, 98109, USA
- Institute for Stem Cell and Regenerative Medicine, University of Washington, 850 Republican Street, Seattle, 98109, WA, USA
- Department of Medicine/Cardiology, 1959 NE Pacific Street, University of Washington, Seattle, 98195, WA, USA
- Department of Bioengineering, University of Washington, 3720 15th Ave NE, Seattle, WA, 98195, USA
| |
|
35
|
Lin D, Bonora G, Yardımcı GG, Noble WS. Computational methods for analyzing and modeling genome structure and organization. Wiley Interdiscip Rev Syst Biol Med 2019; 11:e1435. [PMID: 30022617 PMCID: PMC6294685 DOI: 10.1002/wsbm.1435] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/12/2017] [Revised: 06/07/2018] [Accepted: 06/16/2018] [Indexed: 12/31/2022]
Abstract
Recent advances in chromosome conformation capture technologies have led to the discovery of previously unappreciated structural features of chromatin. Computational analysis has been critical in detecting these features and thereby helping to uncover the building blocks of genome architecture. Algorithms are being developed to integrate these architectural features to construct better three-dimensional (3D) models of the genome. These computational methods have revealed the importance of 3D genome organization to essential biological processes. In this article, we review the state of the art in analytic and modeling techniques with a focus on their application to answering various biological questions related to chromatin structure. We summarize the limitations of these computational techniques and suggest future directions, including the importance of incorporating multiple sources of experimental data in building a more comprehensive model of the genome. This article is categorized under: Analytical and Computational Methods > Computational Methods; Laboratory Methods and Technologies > Genetic/Genomic Methods; Models of Systems Properties and Processes > Mechanistic Models.
Affiliation(s)
- Dejun Lin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Giancarlo Bonora
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| |
|
36
|
Bai W, Noble WS, Bilmes JA. Submodular Maximization via Gradient Ascent: The Case of Deep Submodular Functions. Adv Neural Inf Process Syst 2018; 2018:7989-7999. [PMID: 30705579 PMCID: PMC6351064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
We study the problem of maximizing deep submodular functions (DSFs) [13, 3] subject to a matroid constraint. DSFs are an expressive class of submodular functions that include, as strict subfamilies, the facility location, weighted coverage, and sums of concave composed with modular functions. We use a strategy similar to the continuous greedy approach [6], but we show that the multilinear extension of any DSF has a natural and computationally attainable concave relaxation that we can optimize using gradient ascent. Our results show a guarantee of $\max_{0<\delta<1}\left(1-\epsilon-\delta-e^{-\delta^{2}\Omega(k)}\right)$ with a running time of $O(n^{2}/\epsilon^{2})$ plus time for pipage rounding [6] to recover a discrete solution, where $k$ is the rank of the matroid constraint. This bound is often better than the standard $1 - 1/e$ guarantee of the continuous greedy algorithm, but runs much faster. Our bound also holds even for fully curved ($c = 1$) functions where the guarantee of $1 - c/e$ degenerates to $1 - 1/e$ where $c$ is the curvature of $f$ [37]. We perform computational experiments that support our theoretical results.
Affiliation(s)
- Wenruo Bai
- Depts. of Electrical & Computer Engineering, Seattle, WA 98195
| | - William S Noble
- Genome Sciences Seattle, WA 98195
- Computer Science and Engineering, Seattle, WA 98195
| | - Jeff A Bilmes
- Depts. of Electrical & Computer Engineering, Seattle, WA 98195
- Computer Science and Engineering, Seattle, WA 98195
| |
|
37
|
Ma W, Ay F, Lee C, Gulsoy G, Deng X, Cook S, Hesson J, Cavanaugh C, Ware CB, Krumm A, Shendure J, Blau CA, Disteche CM, Noble WS, Duan Z. Using DNase Hi-C techniques to map global and local three-dimensional genome architecture at high resolution. Methods 2018; 142:59-73. [PMID: 29382556 PMCID: PMC5993575 DOI: 10.1016/j.ymeth.2018.01.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 09/01/2017] [Revised: 12/14/2017] [Accepted: 01/25/2018] [Indexed: 01/09/2023] Open
Abstract
The folding and three-dimensional (3D) organization of chromatin in the nucleus critically impacts genome function. The past decade has witnessed rapid advances in genomic tools for delineating 3D genome architecture. Among them, chromosome conformation capture (3C)-based methods such as Hi-C are the most widely used techniques for mapping chromatin interactions. However, traditional Hi-C protocols rely on restriction enzymes (REs) to fragment chromatin and are therefore limited in resolution. We recently developed DNase Hi-C for mapping 3D genome organization, which uses DNase I for chromatin fragmentation. DNase Hi-C overcomes RE-related limitations associated with traditional Hi-C methods, leading to improved methodological resolution. Furthermore, combining this method with DNA capture technology provides a high-throughput approach (targeted DNase Hi-C) that allows for mapping fine-scale chromatin architecture at exceptionally high resolution. Hence, targeted DNase Hi-C will be valuable for delineating the physical landscapes of cis-regulatory networks that control gene expression and for characterizing phenotype-associated chromatin 3D signatures. Here, we provide a detailed description of method design and step-by-step working protocols for these two methods.
Affiliation(s)
- Wenxiu Ma
- Department of Genome Sciences, University of Washington, USA
| | - Ferhat Ay
- Department of Genome Sciences, University of Washington, USA
| | - Choli Lee
- Department of Genome Sciences, University of Washington, USA
| | - Gunhan Gulsoy
- Department of Genome Sciences, University of Washington, USA
| | - Xinxian Deng
- Department of Pathology, University of Washington, USA
| | - Savannah Cook
- Institute for Stem Cell and Regenerative Medicine, University of Washington, USA; Department of Comparative Medicine, University of Washington, USA
| | - Jennifer Hesson
- Institute for Stem Cell and Regenerative Medicine, University of Washington, USA; Department of Comparative Medicine, University of Washington, USA
| | - Christopher Cavanaugh
- Institute for Stem Cell and Regenerative Medicine, University of Washington, USA; Department of Comparative Medicine, University of Washington, USA
| | - Carol B Ware
- Institute for Stem Cell and Regenerative Medicine, University of Washington, USA; Department of Comparative Medicine, University of Washington, USA
| | - Anton Krumm
- Department of Radiation Oncology, University of Washington, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, USA; Howard Hughes Medical Institute, Seattle, WA 98195-8056, USA
| | - C Anthony Blau
- Institute for Stem Cell and Regenerative Medicine, University of Washington, USA; Division of Hematology, Department of Medicine, University of Washington, USA
| | | | - William S Noble
- Department of Genome Sciences, University of Washington, USA.
| | - ZhiJun Duan
- Institute for Stem Cell and Regenerative Medicine, University of Washington, USA; Division of Hematology, Department of Medicine, University of Washington, USA.
| |
|
38
|
Yan KK, Yardimci GG, Yan C, Noble WS, Gerstein M. HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps. Bioinformatics 2018; 33:2199-2201. [PMID: 28369339 PMCID: PMC5870694 DOI: 10.1093/bioinformatics/btx152] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 11/16/2016] [Accepted: 03/21/2017] [Indexed: 11/28/2022] Open
Abstract
Summary: Genome-wide proximity ligation based assays like Hi-C have opened a window to the 3D organization of the genome. In so doing, they present data structures that are different from conventional 1D signal tracks. To exploit the 2D nature of Hi-C contact maps, matrix techniques like spectral analysis are particularly useful. Here, we present HiC-spector, a collection of matrix-related functions for analyzing Hi-C contact maps. In particular, we introduce a novel reproducibility metric for quantifying the similarity between contact maps based on spectral decomposition. The metric successfully separates contact maps mapped from Hi-C data coming from biological replicates, pseudo-replicates and different cell types. Availability and Implementation: Source code in Julia and Python, and detailed documentation are available at https://github.com/gersteinlab/HiC-spector. Supplementary information: Supplementary data are available at Bioinformatics online.
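To make the idea of comparing contact maps through spectral decomposition more concrete, here is a minimal Python sketch. It is not the HiC-spector code. It assumes symmetric, non-negative contact matrices, uses the normalized graph Laplacian, and sums the distances between the leading eigenvectors of two maps. The function names `normalized_laplacian` and `spectral_distance`, and the default of three eigenvectors, are hypothetical choices made for illustration only.

```python
import numpy as np


def normalized_laplacian(contact_map):
    """Return I - D^{-1/2} W D^{-1/2} for a symmetric contact matrix W."""
    w = np.asarray(contact_map, dtype=float)
    degree = w.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]


def spectral_distance(map_a, map_b, num_vectors=3):
    """Sum of distances between the leading Laplacian eigenvectors (assumed metric)."""
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, vec_a = np.linalg.eigh(normalized_laplacian(map_a))
    _, vec_b = np.linalg.eigh(normalized_laplacian(map_b))
    total = 0.0
    for i in range(num_vectors):
        a, b = vec_a[:, i], vec_b[:, i]
        # Eigenvectors are defined only up to sign, so use the closer orientation.
        total += min(np.linalg.norm(a - b), np.linalg.norm(a + b))
    return total
```

Under these assumptions, a smaller distance means more similar maps, and the value is bounded above by `num_vectors * sqrt(2)` because each eigenvector has unit length.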
Affiliation(s)
- Koon-Kiu Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | | | - Chengfei Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA; Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA
| |
|
39
|
Sakano H, Zorio DAR, Wang X, Ting YS, Noble WS, MacCoss MJ, Rubel EW, Wang Y. Proteomic analyses of nucleus laminaris identified candidate targets of the fragile X mental retardation protein. J Comp Neurol 2017; 525:3341-3359. [PMID: 28685837 DOI: 10.1002/cne.24281] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/15/2017] [Revised: 06/23/2017] [Accepted: 07/04/2017] [Indexed: 12/17/2022]
Abstract
The avian nucleus laminaris (NL) is a brainstem nucleus necessary for binaural processing, analogous in structure and function to the mammalian medial superior olive. In chickens (Gallus gallus), NL is a well-studied model system for activity-dependent neural plasticity. Its neurons have bipolar extension of dendrites, which receive segregated inputs from two ears and display rapid and compartment-specific reorganization in response to unilateral changes in auditory input. More recently, fragile X mental retardation protein (FMRP), an RNA-binding protein that regulates local protein translation, has been shown to be enriched in NL dendrites, suggesting its potential role in the structural dynamics of these dendrites. To explore the molecular role of FMRP in this nucleus, we performed proteomic analysis of NL, using micro laser capture and liquid chromatography tandem mass spectrometry. We identified 657 proteins, greatly represented in pathways involved in mitochondria, translation and metabolism, consistent with high levels of activity of NL neurons. Of these, 94 are potential FMRP targets, by comparative analysis with previously proposed FMRP targets in mammals. These proteins are enriched in pathways involved in cellular growth, cellular trafficking and transmembrane transport. Immunocytochemistry verified the dendritic localization of several proteins in NL. Furthermore, we confirmed the direct interaction of FMRP with one candidate, RhoC, by in vitro RNA binding assays. In summary, we provide a database of highly expressed proteins in NL and in particular a list of potential FMRP targets, with the goal of facilitating molecular characterization of FMRP signaling in future studies.
Affiliation(s)
- Hitomi Sakano
- Virginia Merrill Bloedel Hearing Research Center, Department of Otolaryngology-Head and Neck Surgery, University of Washington, School of Medicine, Seattle, Washington
| | - Diego A R Zorio
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida
| | - Xiaoyu Wang
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida
| | - Ying S Ting
- Department of Genome Sciences, University of Washington, Seattle, Washington
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington
| | - Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, Washington
| | - Edwin W Rubel
- Virginia Merrill Bloedel Hearing Research Center, Department of Otolaryngology-Head and Neck Surgery, University of Washington, School of Medicine, Seattle, Washington
| | - Yuan Wang
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida; Program in Neuroscience, Florida State University, Tallahassee, Florida
| |
|
40
|
Kim S, Liachko I, Brickner DG, Cook K, Noble WS, Brickner JH, Shendure J, Dunham MJ. The dynamic three-dimensional organization of the diploid yeast genome. eLife 2017; 6. [PMID: 28537556 PMCID: PMC5476426 DOI: 10.7554/elife.23623] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 11/24/2016] [Accepted: 05/22/2017] [Indexed: 12/14/2022] Open
Abstract
The budding yeast Saccharomyces cerevisiae is a long-standing model for the three-dimensional organization of eukaryotic genomes. However, even in this well-studied model, it is unclear how homolog pairing in diploids or environmental conditions influence overall genome organization. Here, we performed high-throughput chromosome conformation capture on diverged Saccharomyces hybrid diploids to obtain the first global view of chromosome conformation in diploid yeasts. After controlling for the Rabl-like orientation using a polymer model, we observe significant homolog proximity that increases in saturated culture conditions. Surprisingly, we observe a localized increase in homologous interactions between the HAS1-TDA1 alleles specifically under galactose induction and saturated growth. This pairing is accompanied by relocalization to the nuclear periphery and requires Nup2, suggesting a role for nuclear pore complexes. Together, these results reveal that the diploid yeast genome has a dynamic and complex 3D organization. DOI: http://dx.doi.org/10.7554/eLife.23623.001
Most of the DNA in human, yeast and other eukaryotic cells is packaged into long thread-like structures called chromosomes within a compartment of the cell called the nucleus. The chromosomes are folded to fit inside the nucleus and this organization influences how the DNA is read, copied, and repaired. The folding of chromosomes must be robust in order to protect the organism’s genetic material and yet be flexible enough to allow different parts of the DNA to be accessed in response to different signals. A biochemical technique called Hi-C can be used to detect the points of contact between different regions of a chromosome and between different chromosomes, thereby providing information on how the chromosomes are folded and arranged inside the nucleus. However, most animal cells contain two copies of each chromosome, and the Hi-C method is not able to distinguish between identical copies of chromosomes. As such, it remains unclear how much the chromosomes that can form pairs actually stick together in a cell’s nucleus.
Unlike humans and most organisms, two distantly related budding yeast species can mate to produce a “hybrid” in which the chromosome copies can easily be distinguished from each other. Kim et al. now use Hi-C to analyze how chromosomes are organized in hybrid budding yeast cells. The experiments reveal that the copies of a chromosome contact each other more frequently than would be expected by chance. This is especially true for certain chromosomal regions and in hybrid yeast cells that are running out of their preferred nutrient, glucose. In these cells, the regions of both copies of chromosome 13 near a gene called TDA1 are pulled to the edge of the nucleus, which helps the copies to pair up and the gene to become active. The protein encoded by TDA1 then helps turn on other genes that allow the yeast to use nutrients other than glucose.
Many questions remain about how and why DNA is organized the way it is, both in yeast and in other organisms. These findings will help guide future experiments testing how the two copies of each chromosome pair, as well as what purpose, if any, this pairing might serve for the cell. A better understanding of the fundamental process of DNA organization and its implications may ultimately lead to improved treatments for genetic diseases including developmental disorders and cancers. DOI: http://dx.doi.org/10.7554/eLife.23623.002
Affiliation(s)
- Seungsoo Kim
- Department of Genome Sciences, University of Washington, Seattle, United States
| | - Ivan Liachko
- Department of Genome Sciences, University of Washington, Seattle, United States
| | - Donna G Brickner
- Department of Molecular Biosciences, Northwestern University, Evanston, United States
| | - Kate Cook
- Department of Genome Sciences, University of Washington, Seattle, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, United States
| | - Jason H Brickner
- Department of Molecular Biosciences, Northwestern University, Evanston, United States
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, United States; Howard Hughes Medical Institute, University of Washington, Seattle, United States
| | - Maitreya J Dunham
- Department of Genome Sciences, University of Washington, Seattle, United States
| |
|
41
|
Abstract
In shotgun proteomics analysis, user-specified parameters are critical to database search performance and therefore to the yield of confident peptide-spectrum matches (PSMs). Two of the most important parameters are related to the accuracy of the mass spectrometer. Precursor mass tolerance defines the peptide candidates considered for each spectrum. Fragment mass tolerance or bin size determines how close observed and theoretical fragments must be to be considered a match. For either of these two parameters, too wide a setting yields randomly high-scoring false PSMs, whereas too narrow a setting erroneously excludes true PSMs, in both cases, lowering the yield of peptides detected at a given false discovery rate. We describe a strategy for inferring optimal search parameters by assembling and analyzing pairs of spectra that are likely to have been generated by the same peptide ion to infer precursor and fragment mass error. This strategy does not rely on a database search, making it usable in a wide variety of settings. In our experiments on data from a variety of instruments including Orbitrap and Q-TOF acquisitions, this strategy yields more high-confidence PSMs than using settings based on instrument defaults or determined by experts. Param-Medic is open-source and cross-platform. It is available as a standalone tool (http://noble.gs.washington.edu/proj/param-medic/) and has been integrated into the Crux proteomics toolkit (http://crux.ms), providing automatic parameter selection for the Comet and Tide search engines.
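The idea of inferring a precursor tolerance from paired spectra can be pictured with a very simplified Python sketch. It is not Param-Medic's algorithm. It assumes the pairs of precursor m/z values have already been assembled, and it uses a plain standard deviation where the tool itself may fit a more careful model. The function names and the multiplier of 4 are hypothetical.

```python
import statistics


def ppm_difference(mz_a, mz_b):
    """Signed difference between two m/z values in parts per million."""
    return (mz_a - mz_b) / ((mz_a + mz_b) / 2) * 1e6


def suggest_precursor_tolerance_ppm(paired_mz, multiplier=4.0):
    """Suggest a tolerance as a multiple of the spread of paired m/z differences.

    paired_mz: list of (mz_a, mz_b) precursor values assumed to come from
    the same peptide ion. The multiplier is an arbitrary illustrative choice.
    """
    diffs = [ppm_difference(a, b) for a, b in paired_mz]
    return multiplier * statistics.pstdev(diffs)
```

A wider spread of differences between paired precursors leads to a wider suggested tolerance, which mirrors the trade-off described in the abstract.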
Affiliation(s)
- Damon H May
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - Kaipo Tamura
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, United States
| |
|
42
|
Sychev ZE, Hu A, DiMaio TA, Gitter A, Camp ND, Noble WS, Wolf-Yadlin A, Lagunoff M. Integrated systems biology analysis of KSHV latent infection reveals viral induction and reliance on peroxisome mediated lipid metabolism. PLoS Pathog 2017; 13:e1006256. [PMID: 28257516 PMCID: PMC5352148 DOI: 10.1371/journal.ppat.1006256] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 08/17/2016] [Revised: 03/15/2017] [Accepted: 02/22/2017] [Indexed: 12/22/2022] Open
Abstract
Kaposi’s Sarcoma associated Herpesvirus (KSHV), an oncogenic, human gamma-herpesvirus, is the etiological agent of Kaposi’s Sarcoma, the most common tumor of AIDS patients world-wide. KSHV is predominantly latent in the main KS tumor cell, the spindle cell, a cell of endothelial origin. KSHV modulates numerous host cell-signaling pathways to activate endothelial cells including major metabolic pathways involved in lipid metabolism. To identify the underlying cellular mechanisms of KSHV alteration of host signaling and endothelial cell activation, we identified changes in the host proteome, phosphoproteome and transcriptome landscape following KSHV infection of endothelial cells. A Steiner forest algorithm was used to integrate the global data sets and, together with transcriptome based predicted transcription factor activity, cellular networks altered by latent KSHV were predicted. Several interesting pathways were identified, including peroxisome biogenesis. To validate the predictions, we showed that KSHV latent infection increases the number of peroxisomes per cell. Additionally, proteins involved in peroxisomal lipid metabolism of very long chain fatty acids, including ABCD3 and ACOX1, are required for the survival of latently infected cells. In summary, novel cellular pathways altered during herpesvirus latency that could not be predicted by a single systems biology platform, were identified by integrated proteomics and transcriptomics data analysis and when correlated with our metabolomics data revealed that peroxisome lipid metabolism is essential for KSHV latent infection of endothelial cells.
Kaposi’s Sarcoma herpesvirus (KSHV) is the etiologic agent of Kaposi’s Sarcoma, the most common tumor of AIDS patients. KSHV modulates host cell signaling and metabolism to maintain a life-long latent infection. To unravel the underlying cellular mechanisms modulated by KSHV, we used multiple global systems biology platforms to identify and integrate changes in both cellular protein expression and transcription following KSHV infection of endothelial cells, the relevant cell type for KS tumors. The analysis identified several interesting pathways including peroxisome biogenesis. Peroxisomes are small cytoplasmic organelles involved in redox reactions and lipid metabolism. KSHV latent infection increases the number of peroxisomes per cell and proteins involved in peroxisomal lipid metabolism are required for the survival of latently infected cells. In summary, through integration of multiple global systems biology analyses we were able to identify novel pathways that could not be predicted by one platform alone and found that lipid metabolism in a small cytoplasmic organelle is necessary for the survival of latent infection with a herpesvirus.
Affiliation(s)
- Zoi E. Sychev
- Molecular and Cellular Biology Program, University of Washington, Seattle, Washington, United States of America
- Department of Microbiology, University of Washington, Seattle, Washington, United States of America
| | - Alex Hu
- Department of Genome Science, University of Washington, Seattle, Washington, United States of America
| | - Terri A. DiMaio
- Department of Microbiology, University of Washington, Seattle, Washington, United States of America
| | - Anthony Gitter
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison and Morgridge Institute for Research, Madison, Wisconsin, USA
| | - Nathan D. Camp
- Department of Genome Science, University of Washington, Seattle, Washington, United States of America
| | - William S. Noble
- Department of Genome Science, University of Washington, Seattle, Washington, United States of America
| | - Alejandro Wolf-Yadlin
- Department of Genome Science, University of Washington, Seattle, Washington, United States of America
- * E-mail: (ML); (AWY)
| | - Michael Lagunoff
- Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- * E-mail: (ML); (AWY)
| |
|
43
|
Ramani V, Deng X, Qiu R, Gunderson KL, Steemers FJ, Disteche CM, Noble WS, Duan Z, Shendure J. Massively multiplex single-cell Hi-C. Nat Methods 2017; 14:263-266. [PMID: 28135255 PMCID: PMC5330809 DOI: 10.1038/nmeth.4155] [Citation(s) in RCA: 340] [Impact Index Per Article: 48.6] [Received: 07/25/2016] [Accepted: 12/14/2016] [Indexed: 02/06/2023]
Abstract
We present single-cell combinatorial indexed Hi-C (sciHi-C), a method that applies combinatorial cellular indexing to chromosome conformation capture. In this proof of concept, we generate and sequence six sciHi-C libraries comprising a total of 10,696 single cells. We use sciHi-C data to separate cells by karyotypic and cell-cycle state differences and identify cell-to-cell heterogeneity in mammalian chromosomal conformation. Our results demonstrate that combinatorial indexing is a generalizable strategy for single-cell genomics.
Affiliation(s)
- Vijay Ramani
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Xinxian Deng
- Department of Pathology, University of Washington, Seattle, Washington, USA
| | - Ruolan Qiu
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | | | - Frank J Steemers
- Illumina Inc., Advanced Research Group, San Diego, California, USA
| | - Christine M Disteche
- Department of Pathology, University of Washington, Seattle, Washington, USA; Department of Medicine, University of Washington, Seattle, Washington, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Zhijun Duan
- Division of Hematology, University of Washington School of Medicine, Seattle, Washington, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA; Howard Hughes Medical Institute, Seattle, Washington, USA
| |
|
44
|
The M, MacCoss MJ, Noble WS, Käll L. Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0. J Am Soc Mass Spectrom 2016; 27:1719-1727. [PMID: 27572102 PMCID: PMC5059416 DOI: 10.1007/s13361-016-1460-7] [Citation(s) in RCA: 225] [Impact Index Per Article: 28.1] [Received: 04/01/2016] [Revised: 06/15/2016] [Accepted: 07/20/2016] [Indexed: 05/21/2023]
Abstract
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method (grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein) in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PMID: 24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
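The q values mentioned in this abstract can be illustrated with a generic target-decoy calculation. The Python sketch below is a textbook-style procedure and is not Percolator's implementation. The function name `estimate_q_values`, the input layout, and the `(decoys + 1) / targets` estimator are assumptions made for this example.

```python
def estimate_q_values(psms):
    """Estimate q values for target PSMs from target and decoy scores.

    psms: list of (score, is_decoy) tuples, where a higher score is better.
    Returns a list of (score, q_value) for target PSMs, best score first.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []  # (score, estimated FDR) at each target's score threshold
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
            # Assumed estimator: (decoys + 1) / targets, capped at 1.
            fdrs.append((score, min(1.0, (decoys + 1) / targets)))
    # The q value is the minimum FDR at this threshold or any more permissive one.
    q_values = []
    running_min = 1.0
    for score, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append((score, running_min))
    q_values.reverse()
    return q_values
```

Taking the running minimum from the worst score upward makes the q values non-decreasing as the score gets worse, which is what lets a single q-value cutoff define an accepted set of PSMs.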
Affiliation(s)
- Matthew The
- Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, Box 1031, 17121, Solna, Sweden
| | - Michael J MacCoss
- Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA, 98195, USA
| | - William S Noble
- Department of Genome Sciences, School of Medicine, University of Washington, Seattle, WA, 98195, USA
- Department of Computer Science and Engineering, University of Washington, Seattle, WA, 98195, USA
| | - Lukas Käll
- Science for Life Laboratory, School of Biotechnology, KTH - Royal Institute of Technology, Box 1031, 17121, Solna, Sweden.
| |
|
45
|
Ramani V, Cusanovich DA, Hause RJ, Ma W, Qiu R, Deng X, Blau CA, Disteche CM, Noble WS, Shendure J, Duan Z. Mapping 3D genome architecture through in situ DNase Hi-C. Nat Protoc 2016; 11:2104-21. [PMID: 27685100 PMCID: PMC5547819 DOI: 10.1038/nprot.2016.126] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Indexed: 01/27/2023]
Abstract
With the advent of massively parallel sequencing, considerable work has gone into adapting chromosome conformation capture (3C) techniques to study chromosomal architecture at a genome-wide scale. We recently demonstrated that the inactive murine X chromosome adopts a bipartite structure using a novel 3C protocol, termed in situ DNase Hi-C. Like traditional Hi-C protocols, in situ DNase Hi-C requires that chromatin be chemically cross-linked, digested, end-repaired, and proximity-ligated with a biotinylated bridge adaptor. The resulting ligation products are optionally sheared, affinity-purified via streptavidin bead immobilization, and subjected to traditional next-generation library preparation for Illumina paired-end sequencing. Importantly, in situ DNase Hi-C obviates the dependence on a restriction enzyme to digest chromatin, instead relying on the endonuclease DNase I. Libraries generated by in situ DNase Hi-C have a higher effective resolution than traditional Hi-C libraries, which makes them valuable in cases in which high sequencing depth is allowed for, or when hybrid capture technologies are expected to be used. The protocol described here, which involves ∼4 d of bench work, is optimized for the study of mammalian cells, but it can be broadly applicable to any cell or tissue of interest, given experimental parameter optimization.
Affiliation(s)
- Vijay Ramani
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Darren A Cusanovich
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Ronald J Hause
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Wenxiu Ma
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Ruolan Qiu
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Xinxian Deng
- Department of Pathology, University of Washington, Seattle, Washington, USA
| | - C. Anthony Blau
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA
| | - Christine M. Disteche
- Department of Pathology, University of Washington, Seattle, Washington, USA
- Department of Medicine, University of Washington, Seattle, Washington, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Howard Hughes Medical Institute, Seattle, Washington, USA
| | - Zhijun Duan
- Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, Washington, USA
- Division of Hematology, University of Washington, Seattle, Washington, USA
| |
|
46
|
May DH, Timmins-Schiffman E, Mikan MP, Harvey HR, Borenstein E, Nunn BL, Noble WS. An Alignment-Free "Metapeptide" Strategy for Metaproteomic Characterization of Microbiome Samples Using Shotgun Metagenomic Sequencing. J Proteome Res 2016; 15:2697-705. [PMID: 27396978 PMCID: PMC5116374 DOI: 10.1021/acs.jproteome.6b00239] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 01/29/2023]
Abstract
In principle, tandem mass spectrometry can be used to detect and quantify the peptides present in a microbiome sample, enabling functional and taxonomic insight into microbiome metabolic activity. However, the phylogenetic diversity constituting a particular microbiome is often unknown, and many of the organisms present may not have assembled genomes. In ocean microbiome samples, with particularly diverse and uncultured bacterial communities, it is difficult to construct protein databases that contain the bulk of the peptides in the sample without losing detection sensitivity due to the overwhelming number of candidate peptides for each tandem mass spectrum. We describe a method for deriving "metapeptides" (short amino acid sequences that may be represented in multiple organisms) from shotgun metagenomic sequencing of microbiome samples. In two ocean microbiome samples, we constructed site-specific metapeptide databases to detect more than one and a half times as many peptides as by searching against predicted genes from an assembled metagenome and roughly three times as many peptides as by searching against the NCBI environmental proteome database. The increased peptide yield has the potential to enrich the taxonomic and functional characterization of sample metaproteomes.
Affiliation(s)
- Damon H May
- Department of Genome Sciences and Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
| | - Emma Timmins-Schiffman
- Department of Genome Sciences and Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
| | - Molly P Mikan
- Department of Ocean, Earth & Atmospheric Sciences, Old Dominion University, Norfolk, Virginia 23529, United States
| | - H Rodger Harvey
- Department of Ocean, Earth & Atmospheric Sciences, Old Dominion University, Norfolk, Virginia 23529, United States
| | - Elhanan Borenstein
- Department of Genome Sciences and Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
- Santa Fe Institute, Santa Fe, New Mexico 87501, United States
| | - Brook L Nunn
- Department of Genome Sciences and Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
| | - William S Noble
- Department of Genome Sciences and Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-5065, United States
| |
|
47
|
Abstract
A central problem in mass spectrometry analysis involves identifying, for each observed tandem mass spectrum, the corresponding generating peptide. We present a dynamic Bayesian network (DBN) toolkit that addresses this problem by using a machine learning approach. At the heart of this toolkit is a DBN for Rapid Identification (DRIP), which can be trained from collections of high-confidence peptide-spectrum matches (PSMs). DRIP's score function considers fragment ion matches using Gaussians rather than fixed fragment-ion tolerances and also finds the optimal alignment between the theoretical and observed spectrum by considering all possible alignments, up to a threshold that is controlled using a beam-pruning algorithm. This function not only yields state-of-the-art database search accuracy but also can be used to generate features that significantly boost the performance of the Percolator postprocessor. The DRIP software is built upon a general purpose DBN toolkit (GMTK), thereby allowing a wide variety of options for user-specific inference tasks as well as facilitating easy modifications to the DRIP model in future work. DRIP is implemented in Python and C++ and is available under an Apache license at http://melodi-lab.github.io/dripToolkit.
Affiliation(s)
- John T Halloran
- Department of Electrical Engineering, University of Washington, Seattle 98195, Washington, United States
| | - Jeff A Bilmes
- Department of Electrical Engineering, University of Washington, Seattle 98195, Washington, United States
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle 98195, Washington, United States
| |
|
48
|
Wang S, Halloran JT, Bilmes JA, Noble WS. Faster and more accurate graphical model identification of tandem mass spectra using trellises. Bioinformatics 2016; 32:i322-i331. [PMID: 27307634 PMCID: PMC4908353 DOI: 10.1093/bioinformatics/btw269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Tandem mass spectrometry (MS/MS) is the dominant high throughput technology for identifying and quantifying proteins in complex biological samples. Analysis of the tens of thousands of fragmentation spectra produced by an MS/MS experiment begins by assigning to each observed spectrum the peptide that is hypothesized to be responsible for generating the spectrum. This assignment is typically done by searching each spectrum against a database of peptides. To our knowledge, all existing MS/MS search engines compute scores individually between a given observed spectrum and each possible candidate peptide from the database. In this work, we use a trellis, a data structure capable of jointly representing a large set of candidate peptides, to avoid redundantly recomputing common sub-computations among different candidates. We show how trellises may be used to significantly speed up existing scoring algorithms, and we theoretically quantify the expected speedup afforded by trellises. Furthermore, we demonstrate that compact trellis representations of whole sets of peptides enable efficient discriminative learning of a dynamic Bayesian network for spectrum identification, leading to greatly improved spectrum identification accuracy. Contact: bilmes@uw.edu or william-noble@uw.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Jeff A Bilmes
- Department of Computer Science and Engineering; Department of Electrical Engineering
| | - William S Noble
- Department of Computer Science and Engineering; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| |
|
49
|
Smith OK, Kim R, Fu H, Martin MM, Lin CM, Utani K, Zhang Y, Marks AB, Lalande M, Chamberlain S, Libbrecht MW, Bouhassira EE, Ryan MC, Noble WS, Aladjem MI. Distinct epigenetic features of differentiation-regulated replication origins. Epigenetics Chromatin 2016; 9:18. [PMID: 27168766 PMCID: PMC4862150 DOI: 10.1186/s13072-016-0067-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 09/17/2015] [Accepted: 04/25/2016] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Eukaryotic genome duplication starts at discrete sequences (replication origins) that coordinate cell cycle progression, ensure genomic stability and modulate gene expression. Origins share some sequence features, but their activity also responds to changes in transcription and cellular differentiation status. RESULTS: To identify chromatin states and histone modifications that locally mark replication origins, we profiled origin distributions in eight human cell lines representing embryonic and differentiated cell types. Consistent with a role of chromatin structure in determining origin activity, we found that cancer and non-cancer cells of similar lineages exhibited highly similar replication origin distributions. Surprisingly, our study revealed that DNase hypersensitivity, which often correlates with early replication at large-scale chromatin domains, did not emerge as a strong local determinant of origin activity. Instead, we found that two distinct sets of chromatin modifications exhibited strong local associations with two discrete groups of replication origins. The first origin group consisted of about 40,000 regions that actively initiated replication in all cell types and preferentially colocalized with unmethylated CpGs and with the euchromatin markers, H3K4me3 and H3K9Ac. The second group included origins that were consistently active in cells of a single type or lineage and preferentially colocalized with the heterochromatin marker, H3K9me3. Shared origins replicated throughout the S-phase of the cell cycle, whereas cell-type-specific origins preferentially replicated during late S-phase. CONCLUSIONS: These observations are in line with the hypothesis that differentiation-associated changes in chromatin and gene expression affect the activation of specific replication origins.
Affiliation(s)
- Owen K. Smith
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| | - RyanGuk Kim
- In Silico Solutions, Falls Church, VA 22033 USA
| | - Haiqing Fu
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| | - Melvenia M. Martin
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| | - Chii Mei Lin
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| | - Koichi Utani
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| | - Ya Zhang
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| | - Anna B. Marks
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| | - Marc Lalande
- Department of Genetics and Developmental Biology, University of Connecticut Health Center, Farmington, CT 06032 USA
| | - Stormy Chamberlain
- Department of Genetics and Developmental Biology, University of Connecticut Health Center, Farmington, CT 06032 USA
| | - Maxwell W. Libbrecht
- Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195 USA
| | - Eric E. Bouhassira
- Department of Cell Biology, Albert Einstein College of Medicine, Bronx, NY 10461 USA
| | | | - William S. Noble
- Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195 USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195 USA
| | - Mirit I. Aladjem
- DNA Replication Group, Developmental Therapeutics Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892 USA
| |
|
50
|
Abstract
The ultimate aim of proteomics is to fully identify and quantify the entire complement of proteins and post-translational modifications in biological samples of interest. For the last 15 years, liquid chromatography-tandem mass spectrometry (LC-MS/MS) in data-dependent acquisition (DDA) mode has been the standard for proteomics when sampling breadth and discovery were the main objectives; multiple reaction monitoring (MRM) LC-MS/MS has been the standard for targeted proteomics when precise quantification, reproducibility, and validation were the main objectives. Recently, improvements in mass spectrometer design and bioinformatics algorithms have resulted in the rediscovery and development of another sampling method: data-independent acquisition (DIA). DIA comprehensively and repeatedly samples every peptide in a protein digest, producing a complex set of mass spectra that is difficult to interpret without external spectral libraries. Currently, DIA approaches the identification breadth of DDA while achieving the reproducible quantification characteristic of MRM or its newest version, parallel reaction monitoring (PRM). In comparative de novo identification and quantification studies in human cell lysates, DIA identified up to 89% of the proteins detected in a comparable DDA experiment while providing reproducible quantification of over 85% of them. DIA analysis aided by spectral libraries derived from prior DIA experiments or auxiliary DDA data produces identification and quantification as reproducible and precise as that achieved by MRM/PRM, except on low-abundance peptides that are obscured by stronger signals. DIA is still a work in progress toward the goal of sensitive, reproducible, and precise quantification without external spectral libraries. New software tools applied to DIA analysis have to deal with deconvolution of complex spectra as well as proper filtering of false positives and false negatives. However, the future outlook is positive, and various researchers are working on novel bioinformatics techniques to address these issues and increase the reproducibility, fidelity, and identification breadth of DIA.
Affiliation(s)
- Alex Hu
- Department of Genome Sciences, University of Washington, Seattle, WA, 98109, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, 98109, USA
| | | |
|