Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Searched Name: David Landsman

Total Articles: 169 (from Reference Citation Analysis)

Article PDFs: 50

Cited by > 0: 135

Number	Citation Analysis
1	Detection of new pioneer transcription factors as cell-type-specific nucleosome binders. eLife 2024;12:RP88936. [PMID: 38293962 PMCID: PMC10945518 DOI: 10.7554/elife.88936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract: Wrapping of DNA into nucleosomes restricts accessibility to DNA and may affect the recognition of binding motifs by transcription factors. A certain class of transcription factors, the pioneer transcription factors, can specifically recognize their DNA binding sites on nucleosomes, initiate local chromatin opening, and facilitate the binding of co-factors in a cell-type-specific manner. For the majority of human pioneer transcription factors, the locations of their binding sites, mechanisms of binding, and regulation remain unknown. We have developed a computational method to predict the cell-type-specific ability of transcription factors to bind nucleosomes by integrating ChIP-seq, MNase-seq, and DNase-seq data with details of nucleosome structure. We have demonstrated the ability of our approach in discriminating pioneer from canonical transcription factors and predicted new potential pioneer transcription factors in H1, K562, HepG2, and HeLa-S3 cell lines. Last, we systematically analyzed the interaction modes between various pioneer transcription factors and detected several clusters of distinctive binding sites on nucleosomal DNA.
Key Words: chromatin; computational; computational biology; human; nucleosome; nucleosome binding; pioneer transcription factor; systems biology; transcription factor
MeSH Headings: Humans; Nucleosomes/genetics; Transcription Factors/genetics; Transcription Factors/metabolism; Chromatin; DNA/metabolism; Binding Sites
Grants: National Library of Medicine; NIH HHS; Canada Research Chairs; Ontario Institute for Cancer Research; Natural Sciences and Engineering Research Council of Canada; National Natural Science Foundation of China; Cancer Research UK Cambridge Institute, University of Cambridge; National Institutes of Health
2	GTax: improving de novo transcriptome assembly by removing foreign RNA contamination. Genome Biol 2024;25:12. [PMID: 38191464 PMCID: PMC10773103 DOI: 10.1186/s13059-023-03141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 12/08/2023] [Indexed: 01/10/2024]
Abstract: The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
MeSH Headings: Transcriptome; Databases, Factual; Genomics; RNA; Sequence Analysis, RNA; Solanum lycopersicum/genetics
Grants: U.S. National Library of Medicine; National Institutes of Health (NIH)
3	Detection of new pioneer transcription factors as cell-type specific nucleosome binders. bioRxiv 2023:2023.05.10.540098. [PMID: 37425841 PMCID: PMC10327179 DOI: 10.1101/2023.05.10.540098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/11/2023]
Abstract: Wrapping of DNA into nucleosomes restricts accessibility to the DNA and may affect the recognition of binding motifs by transcription factors. A certain class of transcription factors, the pioneer transcription factors, can specifically recognize their DNA binding sites on nucleosomes, may initiate local chromatin opening and facilitate the binding of co-factors in a cell-type-specific manner. For the majority of human pioneer transcription factors, the locations of their binding sites, mechanisms of binding and regulation remain unknown. We have developed a computational method to predict the cell-type-specific ability of transcription factors to bind nucleosomes by integrating ChIP-seq, MNase-seq and DNase-seq data with details of nucleosome structure. We have demonstrated the ability of our approach in discriminating pioneer from canonical transcription factors and predicted new potential pioneer transcription factors in H1, K562, HepG2 and HeLa cell lines. Lastly, we systematically analyzed the interaction modes between various pioneer transcription factors and detected several clusters of distinctive binding sites on nucleosomal DNA.
4	Clustered and diverse transcription factor binding underlies cell type specificity of enhancers for housekeeping genes. Genome Res 2023;33:1662-1672. [PMID: 37884340 PMCID: PMC10691539 DOI: 10.1101/gr.278130.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 09/12/2023] [Indexed: 10/28/2023]
Abstract: Housekeeping genes are considered to be regulated by common enhancers across different tissues. Here we report that most of the commonly expressed mouse or human genes across different cell types, including more than half of the previously identified housekeeping genes, are associated with cell type-specific enhancers. Furthermore, the binding of most transcription factors (TFs) is cell type-specific. We reason that these cell type specificities are causally related to the collective TF recruitment at regulatory sites, as TFs tend to bind to regions associated with many other TFs and each cell type has a unique repertoire of expressed TFs. Based on binding profiles of hundreds of TFs from HepG2, K562, and GM12878 cells, we show that 80% of all TF peaks overlapping H3K27ac signals are in the top 20,000-23,000 most TF-enriched H3K27ac peak regions, and approximately 12,000-15,000 of these peaks are enhancers (nonpromoters). Those enhancers are mainly cell type-specific and include those linked to the majority of commonly expressed genes. Moreover, we show that the top 15,000 most TF-enriched regulatory sites in HepG2 cells, associated with about 200 TFs, can be predicted largely from the binding profile of as few as 30 TFs. Through motif analysis, we show that major enhancers harbor diverse and clustered motifs from a combination of available TFs uniquely present in each cell type. We propose a mechanism that explains how the highly focused TF binding at regulatory sites results in cell type specificity of enhancers for housekeeping and commonly expressed genes.
MeSH Headings: Humans; Mice; Animals; Transcription Factors/genetics; Transcription Factors/metabolism; Genes, Essential; Gene Expression Regulation; Regulatory Sequences, Nucleic Acid; Protein Binding; Binding Sites
Grants: National Library of Medicine; National Institutes of Health
5	COVID-19 Cases Among Congregate Care Facility Staff by Neighborhood of Residence and Social and Structural Determinants: Observational Study. JMIR Public Health Surveill 2022;8:e34927. [PMID: 35867901 PMCID: PMC9534317 DOI: 10.2196/34927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2021] [Revised: 07/20/2022] [Accepted: 07/21/2022] [Indexed: 11/13/2022]
Abstract: Background: Disproportionate risks of COVID-19 in congregate care facilities including long-term care homes, retirement homes, and shelters both affect and are affected by SARS-CoV-2 infections among facility staff. In cities across Canada, there has been a consistent trend of geographic clustering of COVID-19 cases. However, there is limited information on how COVID-19 among facility staff reflects urban neighborhood disparities, particularly when stratified by the social and structural determinants of community-level transmission. Objective: This study aimed to compare the concentration of cumulative cases by geography and social and structural determinants across 3 mutually exclusive subgroups in the Greater Toronto Area (population: 7.1 million): community, facility staff, and health care workers (HCWs) in other settings. Methods: We conducted a retrospective, observational study using surveillance data on laboratory-confirmed COVID-19 cases (January 23 to December 13, 2020; prior to vaccination rollout). We derived neighborhood-level social and structural determinants from census data and generated Lorenz curves, Gini coefficients, and the Hoover index to visualize and quantify inequalities in cases. Results: The hardest-hit neighborhoods (comprising 20% of the population) accounted for 53.87% (44,937/83,419) of community cases, 48.59% (2356/4849) of facility staff cases, and 42.34% (1669/3942) of other HCW cases. Compared with other HCWs, cases among facility staff reflected the distribution of community cases more closely. Cases among facility staff reflected greater social and structural inequalities (larger Gini coefficients) than those of other HCWs across all determinants. Facility staff cases were also more likely than community cases to be concentrated in lower-income neighborhoods (Gini 0.24, 95% CI 0.15-0.38 vs 0.14, 95% CI 0.08-0.21) with a higher household density (Gini 0.23, 95% CI 0.17-0.29 vs 0.17, 95% CI 0.12-0.22) and with a greater proportion working in other essential services (Gini 0.29, 95% CI 0.21-0.40 vs 0.22, 95% CI 0.17-0.28). Conclusions: COVID-19 cases among facility staff largely reflect neighborhood-level heterogeneity and disparities, even more so than cases among other HCWs. The findings signal the importance of interventions prioritized and tailored to the home geographies of facility staff in addition to workplace measures, including prioritization and reach of vaccination at home (neighborhood level) and at work.
Key Words: COVID-19; Canada; Toronto; congregate; congregate living; elderly; essential worker; geography; health care worker; long-term care; nurse; nursing home; observational; older adults; retirement; retirement home; risk; shelter; staff; transmission; trend
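The Lorenz curves and Gini coefficients named in the abstract above can be illustrated with a minimal Python sketch. It is not the study's code: the function names and the sample figures are hypothetical, and neighbourhoods are ranked here by cases per capita, whereas the study ranks them by each social or structural determinant.

```python
def lorenz_curve(cases, population):
    """Return cumulative population shares (xs) and case shares (ys).

    Assumption: neighbourhoods are ordered by cases per capita, ascending.
    """
    order = sorted(range(len(cases)), key=lambda i: cases[i] / population[i])
    total_cases = sum(cases)
    total_pop = sum(population)
    xs, ys = [0.0], [0.0]
    cum_pop = 0
    cum_cases = 0
    for i in order:
        cum_pop += population[i]
        cum_cases += cases[i]
        xs.append(cum_pop / total_pop)
        ys.append(cum_cases / total_cases)
    return xs, ys


def gini(xs, ys):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve
    (area computed with the trapezoidal rule)."""
    area = sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2
        for k in range(1, len(xs))
    )
    return 1 - 2 * area


# Hypothetical data: two equally sized neighbourhoods, all cases in one.
xs, ys = lorenz_curve(cases=[0, 10], population=[100, 100])
print(gini(xs, ys))
```

A value of 0 means cases are spread in proportion to population; larger values mean cases are concentrated in fewer neighbourhoods.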
6	A standardized nomenclature for mammalian histone genes. Epigenetics Chromatin 2022;15:34. [PMID: 36180920 PMCID: PMC9526256 DOI: 10.1186/s13072-022-00467-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/22/2022] [Accepted: 09/21/2022] [Indexed: 11/10/2022]
Abstract: Histones have a long history of research in a wide range of species, leaving a legacy of complex nomenclature in the literature. Community-led discussions at the EMBO Workshop on Histone Variants in 2011 resulted in agreement amongst experts on a revised systematic protein nomenclature for histones, which is based on a combination of phylogenetic classification and historical symbol usage. Human and mouse histone gene symbols previously followed a genome-centric system that was not applicable across all vertebrate species and did not reflect the systematic histone protein nomenclature. This prompted a collaboration between histone experts, the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC) and Mouse Genomic Nomenclature Committee (MGNC) to revise human and mouse histone gene nomenclature aiming, where possible, to follow the new protein nomenclature whilst conforming to the guidelines for vertebrate gene naming. The updated nomenclature has also been applied to orthologous histone genes in chimpanzee, rhesus macaque, dog, cat, pig, horse and cattle, and can serve as a framework for naming other vertebrate histone genes in the future.
MeSH Headings: Animals; Cattle; Dogs; Genome; Genomics/methods; Histones/genetics; Horses; Humans; Macaca mulatta; Mammals/genetics; Mice; Phylogeny; Swine
Grants: 208349/Z/17/Z Wellcome Trust; R01 GM029832 NIGMS NIH HHS; U24 HG003345 NHGRI NIH HHS; Wellcome Trust; Howard Hughes Medical Institute; P41 HG000330 NHGRI NIH HHS; National Human Genome Research Institute (NHGRI); Russian Science Foundation; National Institutes of Health; Senior Investigator Award from the Ontario Institute of Cancer Research; European Molecular Biology Laboratory (EMBL) (4843)
7	Multiple epigenetic factors co-localize with HMGN proteins in A-compartment chromatin. Epigenetics Chromatin 2022;15:23. [PMID: 35761366 PMCID: PMC9235084 DOI: 10.1186/s13072-022-00457-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 06/05/2022] [Indexed: 08/30/2023]
Abstract: Background: The nucleosomal binding proteins, HMGNs, are a family of chromatin architectural proteins that are expressed in all vertebrate nuclei. Although previous studies have discovered that HMGN proteins have important roles in gene regulation and chromatin accessibility, whether and how HMGN proteins affect higher order chromatin status remains unknown. Results: We examined the roles that HMGN1 and HMGN2 proteins play in higher order chromatin structures in three different cell types. We interrogated data generated in situ, using several techniques, including Hi–C, Promoter Capture Hi–C, ChIP-seq, and ChIP–MS. Our results show that HMGN proteins occupy the A compartment in the 3D nucleus space. In particular, HMGN proteins occupy genomic regions involved in cell-type-specific long-range promoter–enhancer interactions. Interestingly, depletion of HMGN proteins in the three different cell types does not cause structural changes in higher order chromatin, i.e., in topologically associated domains (TADs) and in A/B compartment scores. Using ChIP-seq combined with mass spectrometry, we discovered protein partners that are directly associated with or neighbors of HMGNs on nucleosomes. Conclusions: We determined how HMGN chromatin architectural proteins are positioned within a 3D nucleus space, including the identification of their binding partners in mononucleosomes. Our research indicates that HMGN proteins localize to active chromatin compartments but do not have major effects on 3D higher order chromatin structure and that their binding to chromatin is not dependent on specific protein partners. Supplementary Information: The online version contains supplementary material available at 10.1186/s13072-022-00457-4.
Key Words: Chromatin structure; HMGN; Hi–C; Mass spectrometry
8	DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res 2022;50:1864-1874. [PMID: 35166834 DOI: 10.1093/nar/gkac097] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/16/2021] [Revised: 01/29/2022] [Accepted: 02/01/2022] [Indexed: 01/04/2023]
Abstract: Cytosine methylation at the 5-carbon position is an essential DNA epigenetic mark in many eukaryotic organisms. Although countless structural and functional studies of cytosine methylation have been reported, our understanding of how it influences nucleosome assembly, structure, and dynamics remains obscure. Here, we investigate the effects of cytosine methylation at CpG sites on nucleosome dynamics and stability. By applying long molecular dynamics simulations on a time scale of several microseconds, we generate extensive atomistic conformational ensembles of full nucleosomes. Our results reveal that methylation induces pronounced changes in geometry for both linker and nucleosomal DNA, leading to a more curved, under-twisted DNA, narrowing the adjacent minor grooves, and shifting the population equilibrium of sugar-phosphate backbone geometry. These DNA conformational changes are associated with a considerable enhancement of interactions between methylated DNA and the histone octamer, doubling the number of contacts at some key arginines. H2A and H3 tails play important roles in these interactions, especially for DNA methylated nucleosomes. This, in turn, prevents a spontaneous DNA unwrapping of 3-4 helical turns for the methylated nucleosome with truncated histone tails, otherwise observed in the unmethylated system on a time scale of several microseconds.
MeSH Headings: Cues; Cytosine; DNA/chemistry; DNA Methylation; Histones/metabolism; Nucleosomes/genetics
9	DNA methylation cues in nucleosome geometry, stability, and unwrapping. Biophys J 2022. [DOI: 10.1016/j.bpj.2021.11.2761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
10	The impact of histone mutations in cancer. Biophys J 2022. [DOI: 10.1016/j.bpj.2021.11.947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
11	Deciphering the effects of post-translational modifications and mutations on histone tail dynamics and interactions. Biophys J 2022. [DOI: 10.1016/j.bpj.2021.11.931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
12	Increasing concentration of COVID-19 by socioeconomic determinants and geography in Toronto, Canada: an observational study. Ann Epidemiol 2022;65:84-92. [PMID: 34320380 DOI: 10.1101/2021.04.01.21254585] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/08/2021] [Revised: 07/15/2021] [Accepted: 07/18/2021] [Indexed: 05/20/2023]
Abstract: BACKGROUND: Inequities in the burden of COVID-19 were observed early in Canada and around the world, suggesting economically marginalized communities faced disproportionate risks. However, there has been limited systematic assessment of how heterogeneity in risks has evolved in large urban centers over time. PURPOSE: To address this gap, we quantified the magnitude of risk heterogeneity in Toronto, Ontario from January to November 2020 using a retrospective, population-based observational study using surveillance data. METHODS: We generated epidemic curves by social determinants of health (SDOH) and crude Lorenz curves by neighbourhoods to visualize inequities in the distribution of COVID-19 and estimated Gini coefficients. We examined the correlation between SDOH using Pearson-correlation coefficients. RESULTS: Gini coefficient of cumulative cases by population size was 0.41 (95% confidence interval [CI]: 0.36-0.47) and estimated for: household income (0.20, 95% CI: 0.14-0.28); visible minority (0.21, 95% CI: 0.16-0.28); recent immigration (0.12, 95% CI: 0.09-0.16); suitable housing (0.21, 95% CI: 0.14-0.30); multigenerational households (0.19, 95% CI: 0.15-0.23); and essential workers (0.28, 95% CI: 0.23-0.34). CONCLUSIONS: There was rapid epidemiologic transition from higher- to lower-income neighborhoods with Lorenz curve transitioning from below to above the line of equality across SDOH. Moving forward necessitates integrating programs and policies addressing socioeconomic inequities and structural racism into COVID-19 prevention and vaccination programs.
Key Words: COVID-19; Disease transmission; Gini coefficients; Health inequity; Lorenz curves; SARS-CoV-2; Social determinants of health
MeSH Headings: COVID-19; Geography; Humans; Ontario/epidemiology; Retrospective Studies; SARS-CoV-2; Socioeconomic Factors; Systemic Racism
13	Impact of Covid-19 on Tuberculosis Prevention and Treatment in Canada: a multicentre analysis of 10,833 patients. J Infect Dis 2021;225:1317-1320. [PMID: 34919700 PMCID: PMC8755327 DOI: 10.1093/infdis/jiab608] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/12/2021] [Accepted: 12/14/2021] [Indexed: 11/14/2022]
Abstract: We assessed the Covid-19 pandemic's impact on treatment of latent tuberculosis, and of active tuberculosis, at three centres in Montreal and Toronto, using data from 10,833 patients (8685 with latent tuberculosis infection, 2148 with active tuberculosis). Observation periods prior to declarations of Covid-19 public health emergencies ranged from 219 to 744 weeks, and post-declarations, from 28 to 33 weeks. In the latter period, reductions in latent tuberculosis infection treatment initiation rates ranged from 30% to 66%. At two centres, active tuberculosis treatment rates fell by 16% and 29%. In Canada, cornerstone measures for tuberculosis elimination weakened during the Covid-19 pandemic.
Key Words: Covid-19; cascade of care; diagnosis; latent tuberculosis; treatment; tuberculosis
14	Binding of regulatory proteins to nucleosomes is modulated by dynamic histone tails. Nat Commun 2021;12:5280. [PMID: 34489435 PMCID: PMC8421395 DOI: 10.1038/s41467-021-25568-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 12/03/2020] [Accepted: 08/17/2021] [Indexed: 12/19/2022]
Abstract: Little is known about the roles of histone tails in modulating nucleosomal DNA accessibility and its recognition by other macromolecules. Here we generate extensive atomic level conformational ensembles of histone tails in the context of the full nucleosome, totaling 65 microseconds of molecular dynamics simulations. We observe rapid conformational transitions between tail bound and unbound states, and characterize kinetic and thermodynamic properties of histone tail-DNA interactions. Different histone types exhibit distinct binding modes to specific DNA regions. Using a comprehensive set of experimental nucleosome complexes, we find that the majority of them target mutually exclusive regions with histone tails on nucleosomal/linker DNA around the super-helical locations ±1, ±2, and ±7, and histone tails H3 and H4 contribute most to this process. These findings are explained within competitive binding and tail displacement models. Finally, we demonstrate the crosstalk between different histone tail post-translational modifications and mutations; those which change charge, suppress tail-DNA interactions and enhance histone tail dynamics and DNA accessibility.
Editor's summary: The intrinsic disorder of histone tails poses challenges in their characterization. Here the authors apply extensive molecular dynamics simulations of the full nucleosome to show reversible binding to DNA with specific binding modes of different types of histone tails, where charge-altering modifications suppress tail-DNA interactions and may boost interactions between nucleosomes and nucleosome-binding proteins.
15	Increasing concentration of COVID-19 by socioeconomic determinants and geography in Toronto, Canada: an observational study. Ann Epidemiol 2021;65:84-92. [PMID: 34320380 PMCID: PMC8730782 DOI: 10.1016/j.annepidem.2021.07.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 04/08/2021] [Revised: 07/15/2021] [Accepted: 07/18/2021] [Indexed: 12/31/2022]
Abstract: BACKGROUND: Inequities in the burden of COVID-19 were observed early in Canada and around the world suggesting economically marginalized communities faced disproportionate risks. However, there has been limited systematic assessment of how heterogeneity in risks has evolved in large urban centers over time. PURPOSE: To address this gap, we quantified the magnitude of risk heterogeneity in Toronto, Ontario from January-November, 2020 using a retrospective, population-based observational study using surveillance data. METHODS: We generated epidemic curves by social determinants of health (SDOH) and crude Lorenz curves by neighbourhoods to visualize inequities in the distribution of COVID-19 and estimated Gini coefficients. We examined the correlation between SDOH using Pearson-correlation coefficients. RESULTS: Gini coefficient of cumulative cases by population size was 0.41 (95% confidence interval [CI]: 0.36-0.47) and estimated for: household income (0.20, 95% CI: 0.14-0.28); visible minority (0.21, 95% CI: 0.16-0.28); recent immigration (0.12, 95% CI: 0.09-0.16); suitable housing (0.21, 95% CI: 0.14-0.30); multi-generational households (0.19, 95% CI: 0.15-0.23); and essential workers (0.28, 95% CI: 0.23-0.34). CONCLUSIONS: There was rapid epidemiologic transition from higher to lower income neighbourhoods with Lorenz curve transitioning from below to above the line of equality across SDOH. Moving forward necessitates integrating programs and policies addressing socioeconomic inequities and structural racism into COVID-19 prevention and vaccination programs.
Key Words: COVID-19; Gini coefficients; Lorenz curves; SARS-CoV-2; disease transmission; health inequity; social determinants of health
16	A model of active transcription hubs that unifies the roles of active promoters and enhancers. Nucleic Acids Res 2021;49:4493-4505. [PMID: 33872375 DOI: 10.1093/nar/gkab235] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/22/2020] [Revised: 01/27/2021] [Accepted: 03/22/2021] [Indexed: 12/31/2022]
Abstract: An essential question of gene regulation is how large numbers of enhancers and promoters organize into gene regulatory loops. Using transcription-factor binding enrichment as an indicator of enhancer strength, we identified a portion of H3K27ac peaks as potentially strong enhancers and found a universal pattern of promoter and enhancer distribution: At actively transcribed regions of ∼200-300 kb in length, the numbers of active promoters and enhancers are inversely related. Enhancer clusters are associated with isolated active promoters, regardless of the gene's cell-type specificity. As the number of nearby active promoters increases, the number of enhancers decreases. At regions where multiple active genes are closely located, there are few distant enhancers. With Hi-C analysis, we demonstrate that the interactions among the regulatory elements (active promoters and enhancers) occur predominantly in clusters and multiway among linearly close elements and the distance between adjacent elements shows a preference of ∼30 kb. We propose a simple rule of spatial organization of active promoters and enhancers: Gene transcriptions and regulations mainly occur at local active transcription hubs contributed dynamically by multiple elements from linearly close enhancers and/or active promoters. The hub model can be represented with a flower-shaped structure and implies an enhancer-like role of active promoters.
17	Histone tails as signaling antennas of chromatin. Curr Opin Struct Biol 2021;67:153-160. [PMID: 33279866 PMCID: PMC8096652 DOI: 10.1016/j.sbi.2020.10.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/02/2020] [Revised: 10/07/2020] [Accepted: 10/18/2020] [Indexed: 12/19/2022]
Abstract: Histone tails, representing the N-terminal or C-terminal regions flanking the histone core, play essential roles in chromatin signaling networks. Intrinsic disorder of histone tails and their propensity for post-translational modifications allow them to serve as hubs in coordination of epigenetic processes within the nucleosomal context. Deposition of histone variants with distinct histone tail properties further enriches histone tails' repertoire in epigenetic signaling. Given the advances in experimental techniques and in silico modelling, we review the most recent data on histone tails' effects on nucleosome stability and dynamics, their function in regulating chromatin accessibility and folding. Finally, we discuss different molecular mechanisms to understand how histone tails are involved in nucleosome recognition by binding partners and formation of higher-order chromatin structures.
Key Words: histones; nucleosome; histone tail; chromatin
MeSH Headings: Chromatin/genetics; Epigenesis, Genetic; Histones/metabolism; Nucleosomes; Protein Processing, Post-Translational
Grants: Z01 LM000071 Intramural NIH HHS; Z99 LM999999 Intramural NIH HHS; ZIA LM090313 Intramural NIH HHS
18	Cohort profile: St. Michael's Hospital Tuberculosis Database (SMH-TB), a retrospective cohort of electronic health record data and variables extracted using natural language processing. PLoS One 2021;16:e0247872. [PMID: 33657184 PMCID: PMC7928444 DOI: 10.1371/journal.pone.0247872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2020] [Accepted: 02/16/2021] [Indexed: 12/01/2022] Open Abstract Background Tuberculosis (TB) is a major cause of death worldwide. TB research draws heavily on clinical cohorts which can be generated using electronic health records (EHR), but granular information extracted from unstructured EHR data is limited. The St. Michael’s Hospital TB database (SMH-TB) was established to address gaps in EHR-derived TB clinical cohorts and provide researchers and clinicians with detailed, granular data related to TB management and treatment. Methods We collected and validated multiple layers of EHR data from the TB outpatient clinic at St. Michael’s Hospital, Toronto, Ontario, Canada to generate the SMH-TB database. SMH-TB contains structured data directly from the EHR, and variables generated using natural language processing (NLP) by extracting relevant information from free-text within clinic, radiology, and other notes. NLP performance was assessed using recall, precision and F₁ score averaged across variable labels. We present characteristics of the cohort population using binomial proportions and 95% confidence intervals (CI), with and without adjusting for NLP misclassification errors. Results SMH-TB currently contains retrospective patient data spanning 2011 to 2018, for a total of 3298 patients (N = 3237 with at least 1 associated dictation). 
Performance of TB diagnosis and medication NLP rulesets surpasses 93% in recall, precision and F₁ metrics, indicating good generalizability. We estimated that 20% (95% CI: 18.4–21.2%) were diagnosed with active TB and 46% (95% CI: 43.8–47.2%) were diagnosed with latent TB. After adjusting for potential misclassification, the proportions of patients diagnosed with active and latent TB were 18% (95% CI: 16.8–19.7%) and 40% (95% CI: 37.8–41.6%), respectively. Conclusion SMH-TB is a unique database that includes a breadth of structured data derived from structured and unstructured EHR data by using NLP rulesets. The data are available for a variety of research applications, such as clinical epidemiology, quality improvement and mathematical modeling studies. MeSH Headings: Databases, Factual; Electronic Health Records; Female; Hospitals; Humans; Information Storage and Retrieval; Male; Natural Language Processing; Ontario/epidemiology; Retrospective Studies; Tuberculosis/diagnosis; Tuberculosis/epidemiology. Grants: Government of Ontario.
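The evaluation metrics and interval estimates named in the SMH-TB abstract (entry 18) can be illustrated with a minimal sketch. It assumes the textbook definitions of precision, recall and F₁ and a normal-approximation (Wald) interval for a binomial proportion; the abstract does not state which interval method or misclassification adjustment the authors used, and all counts below are hypothetical.

```python
import math

def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a binomial proportion (an assumed method, not the paper's)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical confusion counts for one NLP variable label
precision, recall, f1 = precision_recall_f1(tp=95, fp=5, fn=5)

# Hypothetical count of patients with a given diagnosis out of the 3298 in the cohort
prop, lower, upper = wald_ci(successes=660, n=3298)
print(f"F1 = {f1:.3f}; proportion = {prop:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Averaging F₁ across variable labels, as the abstract describes, would amount to calling `precision_recall_f1` once per label and taking the mean of the results.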
19	Transcriptome annotation in the cloud: complexity, best practices, and cost. Gigascience 2021;10:6123656. [PMID: 33511996 DOI: 10.1093/gigascience/giaa163] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2020] [Revised: 11/13/2020] [Accepted: 12/23/2020] [Indexed: 01/22/2023] Open Abstract BACKGROUND The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative provides NIH-funded researchers cost-effective access to commercial cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). These cloud providers represent an alternative for the execution of large computational biology experiments like transcriptome annotation, which is a complex analytical process that requires the interrogation of multiple biological databases with several advanced computational tools. The core components of annotation pipelines published since 2012 are BLAST sequence alignments using annotated databases of both nucleotide or protein sequences almost exclusively with networked on-premises compute systems. FINDINGS We compare multiple BLAST sequence alignments using AWS and GCP. We prepared several Jupyter Notebooks with all the code required to submit computing jobs to the batch system on each cloud provider. We consider the consequence of the number of query transcripts in input files and the effect on cost and processing time. We tested compute instances with 16, 32, and 64 vCPUs on each cloud provider. Four classes of timing results were collected: the total run time, the time for transferring the BLAST databases to the instance local solid-state disk drive, the time to execute the CWL script, and the time for the creation, set-up, and release of an instance. 
This study aims to establish an estimate of the cost and compute time needed for the execution of multiple BLAST runs in a cloud environment. CONCLUSIONS We demonstrate that public cloud providers are a practical alternative for the execution of advanced computational biology experiments at low cost. Using our cloud recipes, the BLAST alignments required to annotate a transcriptome with ∼500,000 transcripts can be processed in <2 hours with a compute cost of ∼$200-$250. In our opinion, for BLAST-based workflows, the choice of cloud platform is not dependent on the workflow but, rather, on the specific details and requirements of the cloud provider. These choices include the accessibility for institutional use, the technical knowledge required for effective use of the platform services, and the availability of open source frameworks such as APIs to deploy the workflow. MeSH Headings: Cloud Computing; Computational Biology; Databases, Factual; Software; Transcriptome; Workflow.
20	COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 2021;49:D274-D281. [PMID: 33167031 DOI: 10.1093/nar/gkaa1018] [Citation(s) in RCA: 325] [Impact Index Per Article: 108.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2020] [Revised: 10/13/2020] [Accepted: 10/14/2020] [Indexed: 12/20/2022] Open Abstract The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently, in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically, with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) the recently deprecated NCBI's gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank\ENA\DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families with corresponding references and PDB links, where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs containing multiple paralogs, and continued refinement of COG annotations. Collapse Key Words Collapse MESH Headings Collapse Grants Collapse
21	PM4NGS, a project management framework for next-generation sequencing data analysis. Gigascience 2021;10:giaa141. [PMID: 33410471 PMCID: PMC7788391 DOI: 10.1093/gigascience/giaa141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2020] [Revised: 10/14/2020] [Accepted: 11/16/2020] [Indexed: 11/14/2022] Open Abstract BACKGROUND FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of data analysis pipeline execution across multiple platforms in scalable ways. FINDINGS Here, we present a project management framework for NGS data analysis called PM4NGS. This framework is composed of an automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimum Python code are included in PM4NGS to produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes including the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. CONCLUSIONS PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy to install, configure, and use by non-bioinformaticians on personal computers and laptops. It permits execution of the NGS data analysis on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to reduce the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. 
PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/. Key Words: ChIP-Seq; ChIP-exo; FAIR; RNA-Seq; NGS pipelines; NGS sequence analysis; open source frameworks. MeSH Headings: Computational Biology; Data Analysis; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Software. Grants: National Institutes of Health; U.S. National Library of Medicine.
22	Data sets on human histone interaction networks. Data Brief 2020;33:106555. [PMID: 33299912 PMCID: PMC7701981 DOI: 10.1016/j.dib.2020.106555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2020] [Revised: 11/13/2020] [Accepted: 11/16/2020] [Indexed: 11/28/2022] Open Abstract Here, we present the data of human histone interactomes generated and analysed in the research article by Peng et al., 2020 [1]. The histone interactome data provide a comprehensive mapping of human histone/nucleosome interaction networks by using different data sources from the structural, chemical cross-linking, and high-throughput studies. The histone interactions are presented at different levels of granularity in networks, including protein, domain, and residue-levels. All human histone interactome Cytoscape session files are available at https://github.com/Panchenko-Lab/Human-histone-interactome. Key Words: Histone interaction; Histone variant; Interaction network; Interactome; Nucleosome interaction.
23	Human Histone Interaction Networks: An Old Concept, New Trends. J Mol Biol 2020;433:166684. [PMID: 33098859 DOI: 10.1016/j.jmb.2020.10.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2020] [Revised: 10/12/2020] [Accepted: 10/13/2020] [Indexed: 12/19/2022] Abstract To elucidate the properties of human histone interactions on the large scale, we perform a comprehensive mapping of human histone interaction networks by using data from structural, chemical cross-linking and various high-throughput studies. Histone interactomes derived from different data sources show limited overlap and complement each other. It inspires us to integrate these data into the combined histone global interaction network which includes 5308 proteins and 10,330 interactions. The analysis of topological properties of the human histone interactome reveals its scale free behavior and high modularity. Our study of histone binding interfaces uncovers a remarkably high number of residues involved in interactions between histones and non-histone proteins, 80-90% of residues in histones H3 and H4 have at least one binding partner. Two types of histone binding modes are detected: interfaces conserved in most histone variants and variant specific interfaces. Finally, different types of chromatin factors recognize histones in nucleosomes via distinct binding modes, and many of these interfaces utilize acidic patches among other sites. Interaction networks are available at https://github.com/Panchenko-Lab/Human-histone-interactome. 
Key Words: histone interaction; interactome; network; nucleosome. MeSH Headings: Binding Sites; Chromosomal Proteins, Non-Histone/chemistry; Chromosomal Proteins, Non-Histone/genetics; Chromosomal Proteins, Non-Histone/metabolism; DNA/chemistry; DNA/genetics; DNA/metabolism; Databases, Protein; Histones/chemistry; Histones/genetics; Histones/metabolism; Humans; Internet; Nucleic Acid Conformation; Nucleosomes/chemistry; Nucleosomes/metabolism; Nucleosomes/ultrastructure; Protein Binding; Protein Conformation, alpha-Helical; Protein Conformation, beta-Strand; Protein Interaction Domains and Motifs; Protein Interaction Maps; Software. Grants: Z01 LM000071 Intramural NIH HHS.
24	Heterogeneity in testing, diagnosis and outcome in SARS-CoV-2 infection across outbreak settings in the Greater Toronto Area, Canada: an observational study. CMAJ Open 2020;8:E627-E636. [PMID: 33037070 PMCID: PMC7567509 DOI: 10.9778/cmajo.20200213] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open Abstract BACKGROUND Congregate settings have been disproportionately affected by coronavirus disease 2019 (COVID-19). Our objective was to compare testing for, diagnosis of and death after severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection across 3 settings (residents of long-term care homes, people living in shelters and the rest of the population). METHODS We conducted a population-based prospective cohort study involving individuals tested for SARS-CoV-2 in the Greater Toronto Area between Jan. 23, 2020, and May 20, 2020. We sourced person-level data from COVID-19 surveillance and reporting systems in Ontario. We calculated cumulatively diagnosed cases per capita, proportion tested, proportion tested positive and case-fatality proportion for each setting. We estimated the age- and sex-adjusted rate ratios associated with setting for test positivity and case fatality using quasi-Poisson regression. RESULTS Over the study period, a total of 173 092 individuals were tested for and 16 490 individuals were diagnosed with SARS-CoV-2 infection. We observed a shift in the proportion of cumulative cases from all cases being related to travel to cases in residents of long-term care homes (20.4% [3368/16 490]), shelters (2.3% [372/16 490]), other congregate settings (20.9% [3446/16 490]) and community settings (35.4% [5834/16 490]), with cumulative travel-related cases at 4.1% (674/16490). 
Cumulatively, compared with the rest of the population, diagnosed cases per capita were 64-fold and 19-fold higher among long-term care home and shelter residents, respectively. By May 20, 2020, 76.3% (21 617/28 316) of long-term care home residents and 2.2% (150 077/6 808 890) of the rest of the population had been tested. After adjusting for age and sex, residents of long-term care homes were 2.4 (95% confidence interval [CI] 2.2-2.7) times more likely to test positive, and those who received a diagnosis of COVID-19 were 1.4-fold (95% CI 1.1-1.8) more likely to die than the rest of the population. INTERPRETATION Long-term care homes and shelters had disproportionate diagnosed cases per capita, and residents of long-term care homes diagnosed with COVID-19 had higher case fatality than the rest of the population. Heterogeneity across micro-epidemics among specific populations and settings may reflect underlying heterogeneity in transmission risks, necessitating setting-specific COVID-19 prevention and mitigation strategies. MeSH Headings: Adult; Aged; Aged, 80 and over; COVID-19/diagnosis; COVID-19/epidemiology; COVID-19/transmission; COVID-19/virology; COVID-19 Testing/methods; COVID-19 Testing/statistics & numerical data; Canada/epidemiology; Disease Outbreaks/prevention & control; Female; Ill-Housed Persons/statistics & numerical data; Humans; Long-Term Care/statistics & numerical data; Male; Middle Aged; Outcome Assessment, Health Care; Prospective Studies; SARS-CoV-2/genetics; Travel/statistics & numerical data; Travel-Related Illness. Grants: MC_PC_19012 Medical Research Council; MR/R015600/1 Medical Research Council.
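The percentages quoted in the abstract of entry 24 can be re-derived from the counts it reports. The short sketch below is only an arithmetic check; the dictionary keys and variable names are illustrative labels, not terms defined by the study.

```python
# Counts as quoted in the abstract of entry 24
total_cases = 16_490
cases_by_setting = {
    "long-term care homes": 3_368,
    "shelters": 372,
    "other congregate settings": 3_446,
    "community settings": 5_834,
    "travel-related": 674,
}

# Share of cumulative cases per setting, in percent, rounded to one decimal
shares = {name: round(100 * n / total_cases, 1) for name, n in cases_by_setting.items()}

# Proportion tested by May 20, 2020, in percent
tested_ltc = round(100 * 21_617 / 28_316, 1)          # long-term care home residents
tested_rest = round(100 * 150_077 / 6_808_890, 1)     # rest of the population

# Cases covered by the five categories listed in the abstract
listed_total = sum(cases_by_setting.values())

print(shares, tested_ltc, tested_rest, listed_total)
```

The five categories listed in the abstract add up to 13 694 of the 16 490 diagnosed cases; the abstract does not describe the remainder.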
25	Kin28 depletion increases association of TFIID subunits Taf1 and Taf4 with promoters in Saccharomyces cerevisiae. Nucleic Acids Res 2020;48:4244-4255. [PMID: 32182349 DOI: 10.1093/nar/gkaa165] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2019] [Revised: 02/07/2020] [Accepted: 03/04/2020] [Indexed: 01/31/2023] Open Abstract Transcription of eukaryotic mRNA-encoding genes by RNA polymerase II (Pol II) begins with assembly of the pre-initiation complex (PIC), comprising Pol II and the general transcription factors. Although the pathway of PIC assembly is well established, the mechanism of assembly and the dynamics of PIC components are not fully understood. For example, only recently has it been shown that in yeast, the Mediator complex normally occupies promoters only transiently, but shows increased association when Pol II promoter escape is inhibited. Here we show that two subunits of TFIID, Taf1 and Taf4, similarly show increased occupancy as measured by ChIP upon depletion or inactivation of Kin28. In contrast, TBP occupancy is unaffected by depletion of Kin28, thus revealing an uncoupling of Taf and TBP occupancy during the transcription cycle. Increased Taf1 occupancy upon Kin28 depletion is suppressed by depletion of TBP, while depletion of TBP in the presence of Kin28 has little effect on Taf1 occupancy. The increase in Taf occupancy upon depletion of Kin28 is more pronounced at TFIID-dominated promoters compared to SAGA-dominated promoters. Our results support the suggestion, based on recent structural studies, that TFIID may not remain bound to gene promoters through the transcription initiation cycle. Collapse Key Words Collapse MESH Headings Collapse Grants Collapse
26	Estimated surge in hospital and intensive care admission because of the coronavirus disease 2019 pandemic in the Greater Toronto Area, Canada: a mathematical modelling study. CMAJ Open 2020;8:E593-E604. [PMID: 32963024 PMCID: PMC7641231 DOI: 10.9778/cmajo.20200093] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open Abstract BACKGROUND In pandemics, local hospitals need to anticipate a surge in health care needs. We examined the modelled surge because of the coronavirus disease 2019 (COVID-19) pandemic that was used to inform the early hospital-level response against cases as they transpired. METHODS To estimate hospital-level surge in March and April 2020, we simulated a range of scenarios of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spread in the Greater Toronto Area (GTA), Canada, using the best available data at the time. We applied outputs to hospital-specific data to estimate surge over 6 weeks at 2 hospitals (St. Michael's Hospital and St. Joseph's Health Centre). We examined multiple scenarios, wherein the default (R0 = 2.4) resembled the early trajectory (to Mar. 25, 2020), and compared the default model projections with observed COVID-19 admissions in each hospital from Mar. 25 to May 6, 2020. RESULTS For the hospitals to remain below non-ICU bed capacity, the default pessimistic scenario required a reduction in non-COVID-19 inpatient care by 38% and 28%, respectively, with St. Michael's Hospital requiring 40 new ICU beds and St. Joseph's Health Centre reducing its ICU beds for non-COVID-19 care by 6%. The absolute difference between default-projected and observed census of inpatients with COVID-19 at each hospital was less than 20 from Mar. 25 to Apr. 11; projected and observed cases diverged widely thereafter. 
Uncertainty in local epidemiological features was more influential than uncertainty in clinical severity. INTERPRETATION Scenario-based analyses were reliable in estimating short-term cases, but would require frequent re-analyses. Distribution of the city's surge was expected to vary across hospitals, and community-level strategies were key to mitigating each hospital's surge. MeSH Headings: COVID-19/diagnosis; COVID-19/epidemiology; COVID-19/transmission; COVID-19/virology; Canada/epidemiology; Forecasting/methods; Health Services Needs and Demand/trends; Hospitalization/statistics & numerical data; Hospitals/statistics & numerical data; Hospitals/supply & distribution; Humans; Inpatients/statistics & numerical data; Intensive Care Units/statistics & numerical data; Models, Theoretical; SARS-CoV-2/genetics; Surge Capacity/statistics & numerical data.
27	TPMCalculator: one-step software to quantify mRNA abundance of genomic features. Bioinformatics 2020;35:1960-1962. [PMID: 30379987 DOI: 10.1093/bioinformatics/bty896] [Citation(s) in RCA: 112] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2018] [Revised: 10/02/2018] [Accepted: 10/30/2018] [Indexed: 11/12/2022] Open Abstract SUMMARY The quantification of RNA sequencing (RNA-seq) abundance using a normalization method that calculates transcripts per million (TPM) is a key step to compare multiple samples from different experiments. TPMCalculator is a one-step software to process RNA-seq alignments in BAM format and reports TPM values, raw read counts and feature lengths for genes, transcripts, exons and introns. The program describes the genomic features through a model generated from the gene transfer format file used during alignments reporting of the TPM values and the raw read counts for each feature. In this paper, we show the correlation for 1256 samples from the TCGA-BRCA project between TPM and FPKM reported by TPMCalculator and RSeQC. We also show the correlation for raw read counts reported by TPMCalculator, HTSeq and featureCounts. AVAILABILITY AND IMPLEMENTATION TPMCalculator is freely available at https://github.com/ncbi/TPMCalculator. It is implemented in C++14 and supported on Mac OS X, Linux and MS Windows. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online. Collapse Key Words Collapse MESH Headings Collapse Grants Collapse
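The transcripts per million (TPM) normalization that entry 27 refers to can be sketched in a few lines. This uses the commonly cited definition of TPM (length-normalized read rates rescaled to sum to one million) and is not taken from the TPMCalculator source; the function name and the example counts and lengths are hypothetical.

```python
def tpm(read_counts, feature_lengths):
    """Commonly used TPM definition: per-feature read rate, rescaled so all values sum to 1e6."""
    if len(read_counts) != len(feature_lengths):
        raise ValueError("read_counts and feature_lengths must have the same length")
    # Reads per base for each feature (any constant length unit cancels in the ratio)
    rates = [count / length for count, length in zip(read_counts, feature_lengths)]
    total_rate = sum(rates)
    if total_rate == 0:
        return [0.0 for _ in rates]
    return [rate / total_rate * 1_000_000 for rate in rates]

# Hypothetical raw read counts and feature lengths (in bases) for three genes
values = tpm(read_counts=[100, 200, 300], feature_lengths=[1000, 2000, 1000])
print(values)
```

Because the values within a sample always sum to one million, TPM values are often preferred when comparing relative abundance across samples.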
28	BAMscale: quantification of next-generation sequencing peaks and generation of scaled coverage tracks. Epigenetics Chromatin 2020;13:21. [PMID: 32321568 PMCID: PMC7175505 DOI: 10.1186/s13072-020-00343-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2020] [Accepted: 04/11/2020] [Indexed: 12/12/2022] Open Abstract Background Next-generation sequencing allows genome-wide analysis of changes in chromatin states and gene expression. Data analysis of these increasingly used methods either requires multiple analysis steps, or extensive computational time. We sought to develop a tool for rapid quantification of sequencing peaks from diverse experimental sources and an efficient method to produce coverage tracks for accurate visualization that can be intuitively displayed and interpreted by experimentalists with minimal bioinformatics background. We demonstrate its strength and usability by integrating data from several types of sequencing approaches. Results We have developed BAMscale, a one-step tool that processes a wide set of sequencing datasets. To demonstrate the usefulness of BAMscale, we analyzed multiple sequencing datasets from chromatin immunoprecipitation sequencing data (ChIP-seq), chromatin state change data (assay for transposase-accessible chromatin using sequencing: ATAC-seq, DNA double-strand break mapping sequencing: END-seq), DNA replication data (Okazaki fragments sequencing: OK-seq, nascent-strand sequencing: NS-seq, single-cell replication timing sequencing: scRepli-seq) and RNA-seq data. The outputs consist of raw and normalized peak scores (multiple normalizations) in text format and scaled bigWig coverage tracks that are directly accessible to data visualization programs. 
BAMscale also includes a visualization module facilitating direct, on-demand quantitative peak comparisons that can be used by experimentalists. Our tool can effectively analyze large sequencing datasets (~100 Gb) in minutes, outperforming currently available tools. Conclusions BAMscale accurately quantifies and normalizes identified peaks directly from BAM files, and creates coverage tracks for visualization in genome browsers. BAMscale can be implemented for a wide set of methods for calculating coverage tracks, including ChIP-seq and ATAC-seq, as well as methods that currently require specialized, separate tools for analyses, such as splice-aware RNA-seq, END-seq and OK-seq for which no dedicated software is available. BAMscale is freely available on GitHub (https://github.com/ncbi/BAMscale). Key Words: ATAC-seq; ChIP-seq; Expression; Histone modifications; NS-seq; RNA-seq; Replication origins; Replication timing; SLFN11.
29	Exploring Interactions of Nucleosome via Interactome Analysis and Integrative Modeling. Biophys J 2020. [DOI: 10.1016/j.bpj.2019.11.2166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
30	Overdispersion models for correlated multinomial data: Applications to blinding assessment. Stat Med 2019;38:4963-4976. [PMID: 31460677 DOI: 10.1002/sim.8344] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Revised: 05/17/2019] [Accepted: 07/21/2019] [Indexed: 11/08/2022] Abstract Overdispersion models have been extensively studied for correlated normal and binomial data but much less so for correlated multinomial data. In this work, we describe a multinomial overdispersion model that leads to the specification of the first two moments of the outcome and allows the estimation of the global parameters using generalized estimating equations (GEE). We introduce a Global Blinding Index as a target parameter and illustrate the application of the GEE method to its estimation from (1) a clinical trial with clustering by practitioner and (2) a meta-analysis on psychiatric disorders. We examine the impact of a small number of clusters, high variability in cluster sizes, and the magnitude of the intraclass correlation on the performance of the GEE estimators of the Global Blinding Index using the data simulated from different models. We compare these estimators with the inverse-variance weighted estimators and a maximum-likelihood estimator, derived under the Dirichlet-multinomial model. Our results indicate that the performance of the GEE estimators was satisfactory even in situations with a small number of clusters, whereas the inverse-variance weighted estimators performed poorly, especially for larger values of the intraclass correlation coefficient. Our findings and illustrations may be instrumental for practitioners who analyze clustered multinomial data from clinical trials and/or meta-analysis. 
Key Words: Dirichlet-multinomial; GEE; blinding index; meta-analysis.
31	Banana (Musa acuminata) transcriptome profiling in response to rhizobacteria: Bacillus amyloliquefaciens Bs006 and Pseudomonas fluorescens Ps006. BMC Genomics 2019;20:378. [PMID: 31088352 PMCID: PMC6518610 DOI: 10.1186/s12864-019-5763-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2018] [Accepted: 05/02/2019] [Indexed: 12/19/2022] Open Abstract Background Banana is one of the most important crops in tropical and sub-tropical regions. To meet the demands of international markets, banana plantations require high amounts of chemical fertilizers which translate into high farming costs and are hazardous to the environment when used excessively. Beneficial free-living soil bacteria that colonize the rhizosphere are known as plant growth-promoting rhizobacteria (PGPR). PGPR affect plant growth in direct or indirect ways and hold great promise for sustainable agriculture. Results PGPR of the genera Bacillus and Pseudomonas in banana cv. Williams were evaluated. These plants were produced through in vitro culture and inoculated individually with two rhizobacteria, Bacillus amyloliquefaciens strain Bs006 and Pseudomonas fluorescens strain Ps006. Control plants without microbial inoculum were also evaluated. These plants were kept in a controlled climate growth room with conditions required to favor plant-microorganism interactions. These interactions were evaluated at 1-, 48- and 96-h using transcriptome sequencing after inoculation to establish differentially expressed genes (DEGs) in plants elicited by the interaction with the two rhizobacteria. Additionally, droplet digital PCR was performed at 1, 48, 96 h, and also at 15 and 30 days to validate the expression patterns of selected DEGs. The banana cv. 
Williams transcriptome reported differential expression in a large number of genes of which 22 were experimentally validated. Genes validated experimentally correspond to growth promotion and regulation of specific functions (flowering, photosynthesis, glucose catabolism and root growth) as well as plant defense genes. This study focused on the analysis of 18 genes involved in growth promotion, defense and response to biotic or abiotic stress. Conclusions Differences in banana gene expression profiles in response to the rhizobacteria evaluated here (Bacillus amyloliquefaciens Bs006 and Pseudomonas fluorescens Ps006) are influenced by separate bacterial colonization processes and levels that stimulate distinct groups of genes at various points in time. Electronic supplementary material The online version of this article (10.1186/s12864-019-5763-5) contains supplementary material, which is available to authorized users. Key Words: Bacillus amyloliquefaciens Bs006; Banana cv. Williams; Genes; Musa acuminata; Plant growth promoting rhizobacteria (PGPR); Pseudomonas fluorescens Ps006; Transcriptome.
32	Molecular recognition of nucleosomes by binding partners. Curr Opin Struct Biol 2019;56:164-170. [PMID: 30991239 DOI: 10.1016/j.sbi.2019.03.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/03/2018] [Revised: 03/01/2019] [Accepted: 03/07/2019] [Indexed: 12/20/2022] Abstract Nucleosomes represent the elementary units of chromatin packing and hubs in epigenetic signaling pathways. Across the chromatin and over the lifetime of the eukaryotic cell, nucleosomes experience a broad repertoire of alterations that affect their structure and binding with various chromatin factors. Dynamics of the histone core, nucleosomal and linker DNA, and intrinsic disorder of histone tails add further complexity to the nucleosome interaction landscape. In light of our understanding through the growing number of experimental and computational studies, we review the emerging patterns of molecular recognition of nucleosomes by their binding partners and assess the basic mechanisms of its regulation. MeSH Headings: Humans; Intrinsically Disordered Proteins/metabolism; Nucleosomes/metabolism. Grants: ZIA LM090313-01 Intramural NIH HHS.
33	Role of the pre-initiation complex in Mediator recruitment and dynamics. eLife 2018;7:39633. [PMID: 30540252 PMCID: PMC6322861 DOI: 10.7554/elife.39633] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2018] [Accepted: 12/12/2018] [Indexed: 12/19/2022] Open Abstract The Mediator complex stimulates the cooperative assembly of a pre-initiation complex (PIC) and recruitment of RNA Polymerase II (Pol II) for gene activation. The core Mediator complex is organized into head, middle, and tail modules, and in budding yeast (Saccharomyces cerevisiae), Mediator recruitment has generally been ascribed to sequence-specific activators engaging the tail module triad of Med2-Med3-Med15 at upstream activating sequences (UASs). We show that yeast lacking Med2-Med3-Med15 are viable and that Mediator and PolII are recruited to promoters genome-wide in these cells, albeit at reduced levels. To test whether Mediator might alternatively be recruited via interactions with the PIC, we examined Mediator association genome-wide after depleting PIC components. We found that depletion of Taf1, Rpb3, and TBP profoundly affected Mediator association at active gene promoters, with TBP being critical for transit of Mediator from UAS to promoter, while Pol II and Taf1 stabilize Mediator association at proximal promoters. Collapse Key Words ChIP-seq S. cerevisiae TATA-binding protein chromosomes gene expression genetics genomics mediator pre-initiation complex transcription factors yeast Collapse MESH Headings Collapse Grants Collapse
34	Binding of HMGN proteins to cell specific enhancers stabilizes cell identity. Nat Commun 2018;9:5240. [PMID: 30532006 PMCID: PMC6286339 DOI: 10.1038/s41467-018-07687-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 05/14/2018] [Accepted: 11/15/2018] [Indexed: 01/10/2023] Abstract: The dynamic nature of the chromatin epigenetic landscape plays a key role in the establishment and maintenance of cell identity, yet the factors that affect the dynamics of the epigenome are not fully known. Here we find that the ubiquitous nucleosome binding proteins HMGN1 and HMGN2 preferentially colocalize with epigenetic marks of active chromatin, and with cell-type specific enhancers. Loss of HMGNs enhances the rate of OSKM induced reprogramming of mouse embryonic fibroblasts (MEFs) into induced pluripotent stem cells (iPSCs), and the ASCL1 induced conversion of fibroblast into neurons. During transcription factor induced reprogramming to pluripotency, loss of HMGNs accelerates the erasure of the MEF-specific epigenetic landscape and the establishment of an iPSCs-specific chromatin landscape, without affecting the pluripotency potential and the differentiation potential of the reprogrammed cells. Thus, HMGN proteins modulate the plasticity of the chromatin epigenetic landscape thereby stabilizing, rather than determining cell identity. HMGN1 and HMGN2 are ubiquitous nucleosome binding proteins. Here the authors provide evidence that HMGN proteins preferentially localize to chromatin regulatory sites to modulate the plasticity of the epigenetic landscape, proposing that HMGNs stabilize, rather than determine, cell identity.
35	Structural interpretation of DNA-protein hydroxyl-radical footprinting experiments with high resolution using HYDROID. Nat Protoc 2018;13:2535-2556. [PMID: 30341436 PMCID: PMC6322412 DOI: 10.1038/s41596-018-0048-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/22/2022] Abstract: Hydroxyl-radical footprinting (HRF) is a powerful method for probing structures of nucleic acid-protein complexes with single-nucleotide resolution in solution. To tap the full quantitative potential of HRF, we describe a protocol, hydroxyl-radical footprinting interpretation for DNA (HYDROID), to quantify HRF data and integrate them with atomistic structural models. The stages of the HYDROID protocol are extraction of the lane profiles from gel images, quantification of the DNA cleavage frequency at each nucleotide and theoretical estimation of the DNA cleavage frequency from atomistic structural models, followed by comparison of experimental and theoretical results. Example scripts for each step of HRF data analysis and interpretation are provided for several nucleosome systems; they can be easily adapted to analyze user data. As input, HYDROID requires polyacrylamide gel electrophoresis (PAGE) images of HRF products and optionally can use a molecular model of the DNA-protein complex. The HYDROID protocol can be used to quantify HRF over DNA regions of up to 100 nucleotides per gel image. In addition, it can be applied to the analysis of RNA-protein complexes and free RNA or DNA molecules in solution. Compared with other methods reported to date, HYDROID is unique in its ability to simultaneously integrate HRF data with the analysis of atomistic structural models. HYDROID is freely available. The complete protocol takes ~3 h. 
Users should be familiar with the command-line interface, the Python scripting language and Protein Data Bank (PDB) file formats. A graphical user interface (GUI) with basic functionality (HYDROID_GUI) is also available. Key Words: nucleic acid footprinting; hydroxyl radicals; DNA-protein complexes; nucleic acids cleavage; PAGE gel image quantification; molecular modeling; solvent accessible surface area. MESH Headings: DNA/chemistry; DNA/metabolism; DNA Cleavage; DNA Footprinting/methods; DNA Footprinting/statistics & numerical data; Electrophoresis, Polyacrylamide Gel/statistics & numerical data; Humans; Hydroxyl Radical/chemistry; Models, Molecular; Nucleosomes/chemistry; Nucleosomes/metabolism; Protein Footprinting/methods; Protein Footprinting/statistics & numerical data; Proteins/chemistry; Proteins/metabolism; Software; Solutions. Grants: R01 GM119398 NIGMS NIH HHS; Z01 LM000071-13 Intramural NIH HHS; R21 CA220151 NCI NIH HHS; P50 DE019032 NIDCR NIH HHS; R21 DE025398 NIDCR NIH HHS.
36	The Mediator co-activator complex regulates Ty1 retromobility by controlling the balance between Ty1i and Ty1 promoters. PLoS Genet 2018;14:e1007232. [PMID: 29462141 PMCID: PMC5834202 DOI: 10.1371/journal.pgen.1007232] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/19/2017] [Revised: 03/02/2018] [Accepted: 01/30/2018] [Indexed: 12/24/2022] Abstract: The Ty1 retrotransposons present in the genome of Saccharomyces cerevisiae belong to the large class of mobile genetic elements that replicate via an RNA intermediary and constitute a significant portion of most eukaryotic genomes. The retromobility of Ty1 is regulated by numerous host factors, including several subunits of the Mediator transcriptional co-activator complex. In spite of its known function in the nucleus, previous studies have implicated Mediator in the regulation of post-translational steps in Ty1 retromobility. To resolve this paradox, we systematically examined the effects of deleting non-essential Mediator subunits on the frequency of Ty1 retromobility and levels of retromobility intermediates. Our findings reveal that loss of distinct Mediator subunits alters Ty1 retromobility positively or negatively over a >10,000-fold range by regulating the ratio of an internal transcript, Ty1i, to the genomic Ty1 transcript. Ty1i RNA encodes a dominant negative inhibitor of Ty1 retromobility that blocks virus-like particle maturation and cDNA synthesis. These results resolve the conundrum of Mediator exerting sweeping control of Ty1 retromobility with only minor effects on the levels of Ty1 genomic RNA and the capsid protein, Gag. 
Since the majority of characterized intrinsic and extrinsic regulators of Ty1 retromobility do not appear to affect genomic Ty1 RNA levels, Mediator could play a central role in integrating signals that influence Ty1i expression to modulate retromobility. Retrotransposons are mobile genetic elements that copy their RNA genomes into DNA and insert the DNA copies into the host genome. These elements contribute to genome instability, control of host gene expression and adaptation to changing environments. Retrotransposons depend on numerous host factors for their own propagation and control. The retrovirus-like retrotransposon, Ty1, in the yeast Saccharomyces cerevisiae has been an invaluable model for retrotransposon research, and hundreds of host factors that regulate Ty1 retrotransposition have been identified. Non-essential subunits of the Mediator transcriptional co-activator complex have been identified as one set of host factors implicated in Ty1 regulation. Here, we report a systematic investigation of the effects of loss of these non-essential subunits of Mediator on Ty1 retrotransposition. Our findings reveal a heretofore unknown mechanism by which Mediator influences the balance between transcription from two promoters in Ty1 to modulate expression of an autoinhibitory transcript known as Ty1i RNA. Our results provide new insights into host control of retrotransposon activity via promoter choice and elucidate a novel mechanism by which the Mediator co-activator governs this choice. 
MESH Headings: Gene Expression Regulation; Gene Products, gag/genetics; Homeostasis/genetics; Mediator Complex/physiology; Mutagenesis, Insertional/genetics; Organisms, Genetically Modified; Promoter Regions, Genetic/genetics; Recombination, Genetic/genetics; Retroelements/genetics; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics. Grants: R01 GM052072 NIGMS NIH HHS; R29 GM052072 NIGMS NIH HHS; GM52072 NIH HHS; National Science Foundation; National Institutes of Health; U.S. National Library of Medicine.
37	Workflow and web application for annotating NCBI BioProject transcriptome data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017;2017:3737827. [PMID: 28605765 PMCID: PMC5467576 DOI: 10.1093/database/bax008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/26/2016] [Accepted: 01/24/2017] [Indexed: 01/08/2023] Abstract: The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. 
We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/
38	Hydroxyl-radical footprinting combined with molecular modeling identifies unique features of DNA conformation and nucleosome positioning. Nucleic Acids Res 2017;45:9229-9243. [PMID: 28934480 PMCID: PMC5765820 DOI: 10.1093/nar/gkx616] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 04/21/2017] [Accepted: 07/05/2017] [Indexed: 01/08/2023] Abstract: Nucleosomes are the most abundant protein–DNA complexes in eukaryotes that provide compaction of genomic DNA and are implicated in regulation of transcription, DNA replication and repair. The details of DNA positioning on the nucleosome and the DNA conformation can provide key regulatory signals. Hydroxyl-radical footprinting (HRF) of protein–DNA complexes is a chemical technique that probes nucleosome organization in solution with a high precision unattainable by other methods. In this work we propose an integrative modeling method for constructing high-resolution atomistic models of nucleosomes based on HRF experiments. Our method precisely identifies DNA positioning on nucleosome by combining HRF data for both DNA strands with the pseudo-symmetry constraints. We performed high-resolution HRF for Saccharomyces cerevisiae centromeric nucleosome of unknown structure and characterized it using our integrative modeling approach. Our model provides the basis for further understanding the cooperative engagement and interplay between Cse4p protein and the A-tracts important for centromere function.
39	Molecular basis of CENP-C association with the CENP-A nucleosome at yeast centromeres. Genes Dev 2017;31:1958-1972. [PMID: 29074736 PMCID: PMC5710141 DOI: 10.1101/gad.304782.117] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 07/13/2017] [Accepted: 10/05/2017] [Indexed: 12/16/2022] Abstract: Histone CENP-A-containing nucleosomes play an important role in nucleating kinetochores at centromeres for chromosome segregation. However, the molecular mechanisms by which CENP-A nucleosomes engage with kinetochore proteins are not well understood. Here, we report the finding of a new function for the budding yeast Cse4/CENP-A histone-fold domain interacting with inner kinetochore protein Mif2/CENP-C. Strikingly, we also discovered that AT-rich centromere DNA has an important role for Mif2 recruitment. Mif2 contacts one side of the nucleosome dyad, engaging with both Cse4 residues and AT-rich nucleosomal DNA. Both interactions are directed by a contiguous DNA- and histone-binding domain (DHBD) harboring the conserved CENP-C motif, an AT hook, and RK clusters (clusters enriched for arginine-lysine residues). Human CENP-C has two related DHBDs that bind preferentially to DNA sequences of higher AT content. Our findings suggest that a DNA composition-based mechanism together with residues characteristic for the CENP-A histone variant contribute to the specification of centromere identity. 
Key Words: AT-rich DNA; Cse4/CENP-A; Mif2/CENP-C; budding yeast; centromere; nucleosome. MESH Headings: AT Rich Sequence; Centromere/chemistry; Centromere/metabolism; Centromere Protein A/chemistry; Centromere Protein A/metabolism; Chromosomal Proteins, Non-Histone/chemistry; Chromosomal Proteins, Non-Histone/metabolism; DNA, Satellite/metabolism; DNA-Binding Proteins/metabolism; Dimerization; Humans; Models, Molecular; Nucleosomes/chemistry; Nucleosomes/metabolism; Protein Binding; Protein Structure, Tertiary; Saccharomyces cerevisiae/chemistry; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism. Grants: Howard Hughes Medical Institute; National Cancer Institute; National Institute of Diabetes and Digestive and Kidney Diseases; Howard Hughes Medical Institute Janelia Research Campus; Bloomberg Distinguished Professorship, Johns Hopkins University.
40	SNPDelScore: combining multiple methods to score deleterious effects of noncoding mutations in the human genome. Bioinformatics 2017;34:289-291. [PMID: 28968739 DOI: 10.1093/bioinformatics/btx583] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/29/2017] [Revised: 09/11/2017] [Accepted: 09/13/2017] [Indexed: 11/12/2022] Abstract: SUMMARY: Addressing deleterious effects of noncoding mutations is an essential step towards the identification of disease-causal mutations of gene regulatory elements. Several methods for quantifying the deleteriousness of noncoding mutations using artificial intelligence, deep learning and other approaches have been recently proposed. Although the majority of the proposed methods have demonstrated excellent accuracy on different test sets, there is rarely a consensus. In addition, advanced statistical and artificial learning approaches used by these methods make it difficult to port these methods outside of the labs that have developed them. To address these challenges and to transform the methodological advances in predicting deleterious noncoding mutations into a practical resource available for the broader functional genomics and population genetics communities, we developed SNPDelScore, which uses a panel of proposed methods for quantifying deleterious effects of noncoding mutations to precompute and compare the deleteriousness scores of all common SNPs in the human genome in 44 cell lines. The panel of deleteriousness scores of a SNP computed using different methods is supplemented by functional information from the GWAS Catalog, libraries of transcription factor-binding sites, and genic characteristics of mutations. 
SNPDelScore comes with a genome browser capable of displaying and comparing large sets of SNPs in a genomic locus and rapidly identifying consensus SNPs with the highest deleteriousness scores, making those prime candidates for phenotype-causal polymorphisms. AVAILABILITY AND IMPLEMENTATION: https://www.ncbi.nlm.nih.gov/research/snpdelscore/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
41	Quantifying deleterious effects of regulatory variants. Nucleic Acids Res 2017;45:2307-2317. [PMID: 27980060 PMCID: PMC5389506 DOI: 10.1093/nar/gkw1263] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 06/21/2016] [Accepted: 12/05/2016] [Indexed: 12/13/2022] Abstract: The majority of genome-wide association study (GWAS) risk variants reside in non-coding DNA sequences. Understanding how these sequence modifications lead to transcriptional alterations and cell-to-cell variability can help unravel genotype-phenotype relationships. Here, we describe a computational method, dubbed CAPE, which calculates the likelihood of a genetic variant deactivating enhancers by disrupting the binding of transcription factors (TFs) in a given cellular context. CAPE learns sequence signatures associated with putative enhancers originating from large-scale sequencing experiments (such as ChIP-seq or DNase-seq) and models the change in enhancer signature upon a single nucleotide substitution. CAPE accurately identifies causative cis-regulatory variation including expression quantitative trait loci (eQTLs) and DNase I sensitivity quantitative trait loci (dsQTLs) in a tissue-specific manner with precision superior to several currently available methods. The presented method can be trained on any tissue-specific dataset of enhancers and known functional variants and applied to prioritize disease-associated variants in the corresponding tissue.
42	MS_HistoneDB, a manually curated resource for proteomic analysis of human and mouse histones. Epigenetics Chromatin 2017;10:2. [PMID: 28096900 PMCID: PMC5223428 DOI: 10.1186/s13072-016-0109-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/07/2016] [Accepted: 12/14/2016] [Indexed: 12/13/2022] Abstract: BACKGROUND: Histones and histone variants are essential components of the nuclear chromatin. While mass spectrometry has opened a large window to their characterization and functional studies, their identification from proteomic data remains challenging. Indeed, the current interpretation of mass spectrometry data relies on public databases which are either not exhaustive (Swiss-Prot) or contain many redundant entries (UniProtKB or NCBI). Currently, no protein database is ideally suited for the analysis of histones and the complex array of mammalian histone variants. RESULTS: We propose two proteomics-oriented manually curated databases for mouse and human histone variants. We manually curated >1700 gene, transcript and protein entries to produce a non-redundant list of 83 mouse and 85 human histones. These entries were annotated in accordance with the current nomenclature and unified with the "HistoneDB2.0 with Variants" database. This resource is provided in a format that can be directly read by programs used for mass spectrometry data interpretation. In addition, it was used to interpret mass spectrometry data acquired on histones extracted from mouse testis. Several histone variants, which had so far only been inferred by homology or detected at the RNA level, were detected by mass spectrometry, confirming the existence of their protein form. 
CONCLUSIONS: Mouse and human histone entries were collected from different databases and subsequently curated to produce a non-redundant protein-centric resource, MS_HistoneDB. It is dedicated to the proteomic study of histones in mouse and human and will hopefully facilitate the identification and functional study of histone variants. Key Words: Chromatin; Histone; Histone variants; Mass spectrometry; Proteomics.
43	Most of the tight positional conservation of transcription factor binding sites near the transcription start site reflects their co-localization within regulatory modules. BMC Bioinformatics 2016;17:479. [PMID: 27871221 PMCID: PMC5117513 DOI: 10.1186/s12859-016-1354-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2016] [Accepted: 11/11/2016] [Indexed: 11/24/2022] Abstract: Background: Transcription factors (TFs) form complexes that bind regulatory modules (RMs) within DNA, to control specific sets of genes. Some transcription factor binding sites (TFBSs) near the transcription start site (TSS) display tight positional preferences relative to the TSS. Furthermore, near the TSS, RMs can co-localize TFBSs with each other and the TSS. The proportion of TFBS positional preferences due to TFBS co-localization within RMs is unknown, however. ChIP experiments confirm co-localization of some TFBSs genome-wide, including near the TSS, but they typically examine only a few TFs at a time, using non-physiological conditions that can vary from lab to lab. In contrast, sequence analysis can examine many TFs uniformly and methodically, broadly surveying the co-localization of TFBSs with tight positional preferences relative to the TSS. Results: Our statistics found 43 significant sets of human motifs in the JASPAR TF Database with positional preferences relative to the TSS, with 38 preferences tight (±5 bp). Each set of motifs corresponded to a gene group of 135 to 3304 genes, with 42/43 (98%) gene groups independently validated by DAVID, a gene ontology database, with FDR < 0.05. Motifs corresponding to two TFBSs in a RM should co-occur more than by chance alone, enriching the intersection of the gene groups corresponding to the two TFs. 
Thus, a gene-group intersection systematically enriched beyond chance alone provides evidence that the two TFs participate in an RM. Of the 903 = 43 × 42/2 intersections of the 43 significant gene groups, we found 768/903 (85%) pairs of gene groups with significantly enriched intersections, with 564/768 (73%) intersections independently validated by DAVID with FDR < 0.05. A user-friendly web site at http://go.usa.gov/3kjsH permits biologists to explore the interaction network of our TFBSs to identify candidate subunit RMs. Conclusions: Gene duplication and convergent evolution within a genome provide obvious biological mechanisms for replicating an RM near the TSS that binds a particular TF subunit. Of all intersections of our 43 significant gene groups, 85% were significantly enriched, with 73% of the significant enrichments independently validated by gene ontology. The co-localization of TFBSs within RMs therefore likely explains much of the tight TFBS positional preferences near the TSS. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1354-5) contains supplementary material, which is available to authorized users. Key Words: Positional preference; Transcription factor binding site; Transcription start site.
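The pair count and percentages quoted in the abstract of entry 43 can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are our own, and the input numbers (43 gene groups, 768 enriched pairs, 564 validated pairs) are taken directly from the abstract.

```python
from math import comb

# Numbers quoted in the abstract; variable names are illustrative only.
n_groups = 43     # significant gene groups
enriched = 768    # pairs with significantly enriched intersections
validated = 564   # enriched pairs independently validated by DAVID

# Unordered pairs of 43 groups: 43 * 42 / 2
n_pairs = comb(n_groups, 2)

# Percentages rounded to the nearest integer, as in the abstract
pct_enriched = round(100 * enriched / n_pairs)
pct_validated = round(100 * validated / enriched)

print(n_pairs, pct_enriched, pct_validated)
```

The count is the binomial coefficient C(43, 2), since each intersection involves an unordered pair of distinct gene groups.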
44	Dataset of Arabidopsis plants that overexpress FT driven by a meristem-specific KNAT1 promoter. Data Brief 2016;8:520-8. [PMID: 27366785 PMCID: PMC4919726 DOI: 10.1016/j.dib.2016.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/11/2016] [Revised: 05/25/2016] [Accepted: 06/02/2016] [Indexed: 11/18/2022] Abstract: In this dataset we integrated figures comparing leaf number and rosette diameter in three Arabidopsis FT overexpressor lines (AtFTOE) driven by KNAT1 promoter, “A member of the KNOTTED class of homeodomain proteins encoded by the STM gene of Arabidopsis” [5], vs Wild Type (WT) Arabidopsis plants. Also, presented in the tables are some transcriptomic data obtained by RNA-seq Illumina HiSeq from rosette leaves of Arabidopsis plants of AtFTOE 2.1 line vs WT with accession numbers SRR2094583 and SRR2094587 for AtFTOE replicates 1–3 and AtWT for control replicates 1–2 respectively. Raw data of paired-end sequences are located in the public repository of the National Center for Biotechnology Information of the National Library of Medicine, National Institutes of Health, United States of America, Bethesda, MD, USA as Sequence Read Archive (SRA). Analyses of differentially expressed genes are visualized by Mapman and presented in figures. “Transcriptomic analysis of Arabidopsis overexpressing flowering locus T driven by a meristem-specific promoter that induces early flowering” [2], described the interpretation and discussion of the obtained data. Key Words: Bioinformatics; Differential expression; Flowering. Grants: Z99 LM999999 Intramural NIH HHS; ZIA LM082713-04 Intramural NIH HHS.
45	Trajectories of microsecond molecular dynamics simulations of nucleosomes and nucleosome core particles. Data Brief 2016;7:1678-81. [PMID: 27222871 PMCID: PMC4872717 DOI: 10.1016/j.dib.2016.04.073] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/15/2016] [Revised: 03/22/2016] [Accepted: 04/29/2016] [Indexed: 10/29/2022] Abstract: We present here raw trajectories of molecular dynamics simulations for a nucleosome with linker DNA strands as well as a minimalistic nucleosome core particle model. The simulations were done in explicit solvent using the CHARMM36 force field. We used these data in the research article Shaytan et al., 2016 [1]. The trajectory files are supplemented by TCL scripts providing advanced visualization capabilities. Key Words: Histone; Histone tails; Linker DNA; Molecular dynamics; Nucleosome.
46	Transcriptomic analysis of Arabidopsis overexpressing flowering locus T driven by a meristem-specific promoter that induces early flowering. Gene 2016;587:120-31. [PMID: 27154816 DOI: 10.1016/j.gene.2016.04.060] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/18/2015] [Revised: 04/18/2016] [Accepted: 04/25/2016] [Indexed: 01/09/2023] Abstract: Here we analyzed in leaves the effect of FT overexpression driven by meristem-specific KNAT1 gene homolog of Arabidopsis thaliana (Lincoln et al., 1994; Long et al., 1996) on the transcriptomic response during plant development. Our results demonstrated that meristematic FT overexpression generates an early-flowering phenotype that is independent of photoperiod when compared with wild type (WT) plants. Arabidopsis FT-overexpressor lines (AtFTOE) did not show significant differences compared with WT lines in either leaf number or rosette diameter up to day 21, when AtFTOE flowered. After this period AtFTOE plants started flower production and no new rosette leaves were produced. In contrast, WT plants remained in the vegetative stage up to day 40, producing 12-14 rosette leaves before flowering. Transcriptomic analysis of rosette leaves by Illumina RNA-seq allowed us to determine the differential expression in mature leaf rosette of 3652 genes, of which 626 were up-regulated and 3026 down-regulated. Overexpressed genes related with flowering showed up-regulated transcription factors such as MADS-box that are known as flowering markers in meristem and whose overexpression has been related with meristem identity preservation and the transition from vegetative to floral stage. 
Genes related with sugar transport have shown a higher demand of monosaccharides derived from the hydrolysis of sucrose to glucose and probably fructose, which can also be influenced by reproductive stage of AtFTOE plants. Key Words: Arabidopsis; Flowering locus T; Plant development; RNA-seq analysis; Transgenic plants. MESH Headings: Arabidopsis/genetics; Arabidopsis/growth & development; Arabidopsis Proteins/genetics; Biological Transport; Carbohydrate Metabolism; Flowers/growth & development; Gene Expression Profiling; Gene Expression Regulation, Plant; Gene Ontology; Meristem/metabolism; Promoter Regions, Genetic. Grants: Z99 LM999999 Intramural NIH HHS; ZIA LM082713-03 Intramural NIH HHS.
47	HMGN proteins modulate chromatin regulatory sites and gene expression during activation of naïve B cells. Nucleic Acids Res 2016;44:7144-58. [PMID: 27112571 PMCID: PMC5009722 DOI: 10.1093/nar/gkw323] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 01/28/2016] [Accepted: 04/14/2016] [Indexed: 12/18/2022] Abstract: The activation of naïve B lymphocytes involves rapid and major changes in chromatin organization and gene expression; however, the complete repertoire of nuclear factors affecting these genomic changes is not known. We report that HMGN proteins, which bind to nucleosomes and affect chromatin structure and function, co-localize with, and maintain the intensity of, DNase I hypersensitive sites genome wide, in resting but not in activated B cells. Transcription analyses of resting and activated B cells from wild-type and Hmgn^−/− mice show that loss of HMGNs dampens the magnitude of the transcriptional response and alters the pattern of gene expression during the course of B-cell activation; defense response genes are most affected at the onset of activation. Our study provides insights into the biological function of the ubiquitous HMGN chromatin binding proteins and into epigenetic processes that affect the fidelity of the transcriptional response during the activation of B cell lymphocytes.
48	HistoneDB 2.0: a histone database with variants--an integrated resource to explore histones and their variants. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016;2016:baw014. [PMID: 26989147 PMCID: PMC4795928 DOI: 10.1093/database/baw014] [Citation(s) in RCA: 74] [Impact Index Per Article: 9.3] [Received: 09/29/2015] [Accepted: 02/01/2016] [Indexed: 12/15/2022] Abstract: Compaction of DNA into chromatin is a characteristic feature of eukaryotic organisms. The core (H2A, H2B, H3, H4) and linker (H1) histone proteins are responsible for this compaction through the formation of nucleosomes and higher order chromatin aggregates. Moreover, histones are intricately involved in chromatin functioning and provide a means for genome dynamic regulation through specific histone variants and histone post-translational modifications. ‘HistoneDB 2.0 – with variants’ is a comprehensive database of histone protein sequences, classified by histone types and variants. All entries in the database are supplemented by rich sequence and structural annotations with many interactive tools to explore and compare sequences of different variants from various organisms. The core of the database is a manually curated set of histone sequences grouped into 30 different variant subsets with variant-specific annotations. The curated set is supplemented by an automatically extracted set of histone sequences from the non-redundant protein database using algorithms trained on the curated set. The interactive web site supports various searching strategies in both datasets: browsing of phylogenetic trees; on-demand generation of multiple sequence alignments with feature annotations; classification of histone-like sequences and browsing of the taxonomic diversity for every histone variant. 
HistoneDB 2.0 is a resource for the interactive comparative analysis of histone protein sequences and their implications for chromatin function. Database URL: http://www.ncbi.nlm.nih.gov/projects/HistoneDB2.0
49	Nucleosome Dynamics at Microsecond Timescale: DNA-Protein Interactions, Water-Mediated Interactions and Nucleosome Formation. Biophys J 2016. [DOI: 10.1016/j.bpj.2015.11.2189] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
50	Coupling between Histone Conformations and DNA Geometry in Nucleosomes on a Microsecond Timescale: Atomistic Insights into Nucleosome Functions. J Mol Biol 2015;428:221-237. [PMID: 26699921 DOI: 10.1016/j.jmb.2015.12.004] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Received: 07/06/2015] [Revised: 12/04/2015] [Accepted: 12/07/2015] [Indexed: 12/16/2022] Abstract An octamer of histone proteins wraps about 200bp of DNA into two superhelical turns to form nucleosomes found in chromatin. Although the static structure of the nucleosomal core particle has been solved, details of the dynamic interactions between histones and DNA remain elusive. We performed extensively long unconstrained, all-atom microsecond molecular dynamics simulations of nucleosomes including linker DNA segments and full-length histones in explicit solvent. For the first time, we were able to identify and characterize the rearrangements in nucleosomes on a microsecond timescale including the coupling between the conformation of the histone tails and the DNA geometry. We found that certain histone tail conformations promoted DNA bulging near its entry/exit sites, resulting in the formation of twist defects within the DNA. This led to a reorganization of histone-DNA interactions, suggestive of the formation of initial nucleosome sliding intermediates. We characterized the dynamics of the histone tails upon their condensation on the core and linker DNA and showed that tails may adopt conformationally constrained positions due to the insertion of "anchoring" lysines and arginines into the DNA minor grooves. 
Potentially, these phenomena affect the accessibility of post-translationally modified histone residues that serve as important sites for epigenetic marks (e.g., at H3K9, H3K27, H4K16), suggesting that interactions of the histone tails with the core and linker DNA modulate the processes of histone tail modifications and binding of the effector proteins. We discuss the implications of the observed results on the nucleosome function and compare our results to different experimental studies. Key Words: chromatin; epigenetics; molecular dynamics simulations; nucleosome dynamics; protein–DNA interactions