Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Setty M, Leslie CS. SeqGL Identifies Context-Dependent Binding Signals in Genome-Wide Regulatory Element Maps. PLoS Comput Biol 2015;11:e1004271. [PMID: 26016777 DOI: 10.1371/journal.pcbi.1004271] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 09/08/2014] [Accepted: 04/03/2015] [Indexed: 11/23/2022]

Cited by Other Article(s)

Schroeder JW, Wolfe MB, Freddolino L. ShapeME: A tool and web front-end for de novo discovery of structural motifs underpinning protein-DNA interactions. bioRxiv 2025:2025.01.28.635290. [PMID: 39975017 PMCID: PMC11838363 DOI: 10.1101/2025.01.28.635290] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]

Abstract

Determining where transcriptional regulators bind within a genome is paramount to understanding how gene expression is regulated. Historically, position weight matrices (PWMs) have been used to define the binding preferences of DNA binding proteins [1]. However, PWMs treat the identity of each base in a sequence as an independent and additive measure of binding preference, which can limit their utility [2]. Models that consider higher order interactions between nearby bases yield greater success in predicting proteins' binding to DNA, but for many proteins there is still substantial room for improvement in predicting and understanding the determinants of proteins' binding to DNA [3]. In addition to DNA sequence motifs, structural motifs (e.g., a narrow minor groove width) are important determinants of binding for some DNA-binding proteins [4]. Despite the initial success of algorithms using structural features of DNA to predict binding properties of proteins from either ChIP-seq or SELEX data [5-8], there remains a need for a de novo structural motif discovery framework which can be applied to data from a variety of experimental designs. Here, we present a unified workflow, capable of utilizing virtually any type of data representing sequence coverage or enrichment (e.g. ChIP-seq, RNA-seq, SELEX, etc.), to discover short structural motifs with explanatory power for a protein's DNA binding preference. We couple the DNAshapeR algorithm [9] with our own information-theoretic approach to de novo motif discovery, and wrap shape and sequence motif inference and model selection into a single tool called ShapeME. Application of our structural motif discovery algorithm to proteins with ChIP-seq data in ENCODE datasets reveals a subset of proteins where short structural motifs outperform the best PWM for that protein as determined from the JASPAR database, or as identified by the sequence motif elicitation tool STREME.
Our approach offers a powerful and versatile framework for inferring structural DNA binding motifs, and will complement current sequence-based motif elicitation tools in discovery of protein-DNA interaction principles. A web-based interface to ShapeME is available at https://seq2fun.dcmb.med.umich.edu/shapeme, with full source code available at https://github.com/freddolino-lab/ShapeME.
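
The abstract's remark that PWMs treat each base as an independent, additive contribution can be made concrete with a small sketch. This is not ShapeME code: the matrix values, the uniform background, and the function names `pwm_score` and `best_window` are all invented for illustration.

```python
import math

# Hypothetical 4-position probability matrix (one dict per motif position).
# The numbers are made up; a real matrix would come from e.g. JASPAR.
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def pwm_score(site, ppm=PPM, background=BACKGROUND):
    """Log-odds score of one site: a plain sum of per-position terms."""
    if len(site) != len(ppm):
        raise ValueError("site length must equal motif width")
    return sum(math.log2(column[base] / background)
               for column, base in zip(ppm, site))


def best_window(sequence, ppm=PPM):
    """Return (score, start) of the highest-scoring window in a sequence."""
    width = len(ppm)
    if len(sequence) < width:
        raise ValueError("sequence shorter than motif")
    return max((pwm_score(sequence[i:i + width], ppm), i)
               for i in range(len(sequence) - width + 1))
```

Because the score is a sum over positions, changing one base shifts the score by the same amount whatever the neighbouring bases are. That is the limitation the abstract refers to: dependencies between nearby bases, or shape features such as minor groove width, cannot be expressed in this form.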

Yue T, Wang Y, Zhang L, Gu C, Xue H, Wang W, Lyu Q, Dun Y. Deep Learning for Genomics: From Early Neural Nets to Modern Large Language Models. Int J Mol Sci 2023;24:15858. [PMID: 37958843 PMCID: PMC10649223 DOI: 10.3390/ijms242115858] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2023] [Revised: 10/24/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]

Cazares TA, Rizvi FW, Iyer B, Chen X, Kotliar M, Bejjani AT, Wayman JA, Donmez O, Wronowski B, Parameswaran S, Kottyan LC, Barski A, Weirauch MT, Prasath VBS, Miraldi ER. maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLoS Comput Biol 2023;19:e1010863. [PMID: 36719906 PMCID: PMC9917285 DOI: 10.1371/journal.pcbi.1010863] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/06/2022] [Revised: 02/10/2023] [Accepted: 01/10/2023] [Indexed: 02/01/2023]

Affiliation(s)

Tareian A. Cazares: Immunology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
Faiz W. Rizvi: Systems Biology and Physiology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
Balaji Iyer: Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
Xiaoting Chen: The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Michael Kotliar: Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Anthony T. Bejjani: Molecular and Developmental Biology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
Joseph A. Wayman: Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Omer Donmez: The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Benjamin Wronowski: Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Sreeja Parameswaran: The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Leah C. Kottyan: The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America; Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Artem Barski: Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America; Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
Matthew T. Weirauch: Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America; Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
V. B. Surya Prasath: Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
Emily R. Miraldi: Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America; Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America

Kshirsagar M, Yuan H, Ferres JL, Leslie C. BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin. Genome Biol 2022;23:174. [PMID: 35971180 PMCID: PMC9380350 DOI: 10.1186/s13059-022-02723-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Accepted: 06/28/2022] [Indexed: 11/10/2022]

Siahpirani AF, Knaack S, Chasman D, Seirup M, Sridharan R, Stewart R, Thomson J, Roy S. Dynamic regulatory module networks for inference of cell type-specific transcriptional networks. Genome Res 2022;32:1367-1384. [PMID: 35705328 PMCID: PMC9341506 DOI: 10.1101/gr.276542.121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/28/2021] [Accepted: 06/02/2022] [Indexed: 11/25/2022]

Lai B, Qian S, Zhang H, Zhang S, Kozlova A, Duan J, Xu J, He X. Annotating functional effects of non-coding variants in neuropsychiatric cell types by deep transfer learning. PLoS Comput Biol 2022;18:e1010011. [PMID: 35576194 PMCID: PMC9135341 DOI: 10.1371/journal.pcbi.1010011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/16/2021] [Revised: 05/26/2022] [Accepted: 03/11/2022] [Indexed: 12/02/2022]

Morrow A, Hughes J, Singh J, Joseph A, Yosef N. Epitome: predicting epigenetic events in novel cell types with multi-cell deep ensemble learning. Nucleic Acids Res 2021;49:e110. [PMID: 34379786 PMCID: PMC8565335 DOI: 10.1093/nar/gkab676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2021] [Revised: 07/19/2021] [Accepted: 07/25/2021] [Indexed: 01/04/2023]

Zhang Q, Wang D, Han K, Huang DS. Predicting TF-DNA Binding Motifs from ChIP-seq Datasets Using the Bag-Based Classifier Combined With a Multi-Fold Learning Scheme. IEEE/ACM Trans Comput Biol Bioinform 2021;18:1743-1751. [PMID: 32946398 DOI: 10.1109/tcbb.2020.3025007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]

A machine learning-based framework for modeling transcription elongation. Proc Natl Acad Sci U S A 2021;118:2007450118. [PMID: 33526657 DOI: 10.1073/pnas.2007450118] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/15/2022]

Powell SK, O'Shea C, Brennand KJ, Akbarian S. Parsing the Functional Impact of Noncoding Genetic Variants in the Brain Epigenome. Biol Psychiatry 2021;89:65-75. [PMID: 33131715 PMCID: PMC7718420 DOI: 10.1016/j.biopsych.2020.06.033] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/05/2020] [Revised: 05/29/2020] [Accepted: 06/01/2020] [Indexed: 12/31/2022]

Vangala P, Murphy R, Quinodoz SA, Gellatly K, McDonel P, Guttman M, Garber M. High-Resolution Mapping of Multiway Enhancer-Promoter Interactions Regulating Pathogen Detection. Mol Cell 2020;80:359-373.e8. [PMID: 32991830 PMCID: PMC7572724 DOI: 10.1016/j.molcel.2020.09.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 02/22/2020] [Revised: 06/04/2020] [Accepted: 09/04/2020] [Indexed: 11/19/2022]

Hammelman J, Krismer K, Banerjee B, Gifford DK, Sherwood RI. Identification of determinants of differential chromatin accessibility through a massively parallel genome-integrated reporter assay. Genome Res 2020;30:1468-1480. [PMID: 32973041 PMCID: PMC7605270 DOI: 10.1101/gr.263228.120] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/05/2020] [Accepted: 08/26/2020] [Indexed: 12/20/2022]

Beer MA, Shigaki D, Huangfu D. Enhancer Predictions and Genome-Wide Regulatory Circuits. Annu Rev Genomics Hum Genet 2020;21:37-54. [PMID: 32443951 DOI: 10.1146/annurev-genom-121719-010946] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/11/2022]

Leão FB, Vaughn LS, Bhatt D, Liao W, Maloney D, Carvalho BC, Oliveira L, Ghosh S, Silva AM. Toll-like Receptor (TLR)-induced Rasgef1b expression in macrophages is regulated by NF-κB through its proximal promoter. Int J Biochem Cell Biol 2020;127:105840. [PMID: 32866686 DOI: 10.1016/j.biocel.2020.105840] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/11/2020] [Revised: 07/31/2020] [Accepted: 08/21/2020] [Indexed: 12/21/2022]

Srivastava D, Mahony S. Sequence and chromatin determinants of transcription factor binding and the establishment of cell type-specific binding patterns. Biochim Biophys Acta Gene Regul Mech 2020;1863:194443. [PMID: 31639474 PMCID: PMC7166147 DOI: 10.1016/j.bbagrm.2019.194443] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/30/2019] [Revised: 09/21/2019] [Accepted: 10/06/2019] [Indexed: 12/14/2022]

Tripodi IJ, Chowdhury M, Gruca M, Dowell RD. Combining signal and sequence to detect RNA polymerase initiation in ATAC-seq data. PLoS One 2020;15:e0232332. [PMID: 32353042 PMCID: PMC7192442 DOI: 10.1371/journal.pone.0232332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/13/2019] [Accepted: 04/13/2020] [Indexed: 01/12/2023]

Peng H. CFSP: a collaborative frequent sequence pattern discovery algorithm for nucleic acid sequence classification. PeerJ 2020;8:e8965. [PMID: 32341900 PMCID: PMC7179567 DOI: 10.7717/peerj.8965] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/03/2020] [Accepted: 03/24/2020] [Indexed: 12/19/2022]

Abstract

BACKGROUND

Conserved nucleic acid sequences play an essential role in transcriptional regulation. The motifs/templates derived from nucleic acid sequence datasets are usually used as biomarkers to predict biochemical properties such as protein binding sites or to identify specific non-coding RNAs. In many cases, template-based nucleic acid sequence classification performs better than some feature extraction methods, such as N-gram and k-spaced pairs classification. The availability of large-scale experimental data provides an unprecedented opportunity to improve motif extraction methods. The process for pattern extraction from large-scale data is crucial for the creation of predictive models.

METHODS

In this article, a Teiresias-like feature extraction algorithm to discover frequent sub-sequences (CFSP) is proposed. Although gaps are allowed in some motif discovery algorithms, the distance and number of gaps are limited. The proposed algorithm can find frequent sequence pairs with a larger gap. The combinations of frequent sub-sequences in given protracted sequences capture the long-distance correlation, which implies a specific molecular biological property. Hence, the proposed algorithm intends to discover the combinations. A set of frequent sub-sequences derived from nucleic acid sequences with order is used as a base frequent sub-sequence array. The mutation information is attached to each sub-sequence array to implement fuzzy matching. Thus, a mutate records a single nucleotide variant or nucleotides insertion/deletion (indel) to encode a slight difference between frequent sequences and a matched subsequence of a sequence under investigation.

CONCLUSIONS

The proposed algorithm has been validated with several nucleic acid sequence prediction case studies. On experimental data sets such as miRNA, piRNA, and Sigma 54 promoters, it gives better results than recently available methods based on feature descriptors. CFSP is implemented in C++ and shell script; the source code and related data are available at https://github.com/HePeng2016/CFSP.
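
The idea of frequent sub-sequence pairs separated by a gap, as described in the Methods above, can be illustrated with a toy sketch. This is not the CFSP algorithm or its C++ implementation: it uses fixed-length k-mers, exact matching with no mutation records, and brute-force enumeration. The function names and all parameter defaults are assumptions chosen for the example.

```python
from collections import Counter


def kmer_positions(seq, k):
    """Map each k-mer in seq to the list of its start positions."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    return positions


def frequent_gapped_pairs(sequences, k=3, min_gap=0, max_gap=20, min_support=2):
    """Count ordered k-mer pairs (first, second) where `second` starts
    between min_gap and max_gap bases after `first` ends, and keep the
    pairs that occur in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        positions = kmer_positions(seq, k)
        seen = set()  # each pair counts once per sequence
        for first, starts_a in positions.items():
            for second, starts_b in positions.items():
                if any(min_gap <= b - (a + k) <= max_gap
                       for a in starts_a for b in starts_b):
                    seen.add((first, second))
        support.update(seen)
    return {pair: n for pair, n in support.items() if n >= min_support}
```

For instance, `frequent_gapped_pairs(["ACGTTTTTGGA", "ACGCCCCCCGGA", "TTTTTTTT"])` reports the ordered pair `("ACG", "GGA")` with support 2, even though the gap between the two k-mers differs in the two sequences. The quadratic pair enumeration is only practical for short inputs.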

Yang J, Ma A, Hoppe AD, Wang C, Li Y, Zhang C, Wang Y, Liu B, Ma Q. Prediction of regulatory motifs from human Chip-sequencing data using a deep learning framework. Nucleic Acids Res 2019;47:7809-7824. [PMID: 31372637 PMCID: PMC6735894 DOI: 10.1093/nar/gkz672] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/12/2019] [Accepted: 07/23/2019] [Indexed: 11/24/2022]

Condition-Specific Modeling of Biophysical Parameters Advances Inference of Regulatory Networks. Cell Rep 2019;23:376-388. [PMID: 29641998 PMCID: PMC5987223 DOI: 10.1016/j.celrep.2018.03.048] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/25/2017] [Revised: 01/12/2018] [Accepted: 03/12/2018] [Indexed: 12/31/2022]

Yuan H, Kshirsagar M, Zamparo L, Lu Y, Leslie CS. BindSpace decodes transcription factor binding signals by large-scale sequence embedding. Nat Methods 2019;16:858-861. [PMID: 31406384 PMCID: PMC6717532 DOI: 10.1038/s41592-019-0511-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 01/13/2019] [Accepted: 07/10/2019] [Indexed: 01/04/2023]

Lai X, Stigliani A, Vachon G, Carles C, Smaczniak C, Zubieta C, Kaufmann K, Parcy F. Building Transcription Factor Binding Site Models to Understand Gene Regulation in Plants. Mol Plant 2019;12:743-763. [PMID: 30447332 DOI: 10.1016/j.molp.2018.10.010] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Received: 06/29/2018] [Revised: 09/20/2018] [Accepted: 10/30/2018] [Indexed: 06/09/2023]

Epigenomic analysis reveals DNA motifs regulating histone modifications in human and mouse. Proc Natl Acad Sci U S A 2019;116:3668-3677. [PMID: 30755522 DOI: 10.1073/pnas.1813565116] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 11/18/2022]

Samee MAH, Bruneau BG, Pollard KS. A De Novo Shape Motif Discovery Algorithm Reveals Preferences of Transcription Factors for DNA Shape Beyond Sequence Motifs. Cell Syst 2019;8:27-42.e6. [PMID: 30660610 PMCID: PMC6368855 DOI: 10.1016/j.cels.2018.12.001] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 03/09/2018] [Revised: 08/18/2018] [Accepted: 12/03/2018] [Indexed: 12/17/2022]

Xu W, Zhu L, Huang DS. DCDE: An Efficient Deep Convolutional Divergence Encoding Method for Human Promoter Recognition. IEEE Trans Nanobioscience 2019;18:136-145. [PMID: 30624223 DOI: 10.1109/tnb.2019.2891239] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/09/2022]

Specificity landscapes unmask submaximal binding site preferences of transcription factors. Proc Natl Acad Sci U S A 2018;115:E10586-E10595. [PMID: 30341220 DOI: 10.1073/pnas.1811431115] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 01/08/2023]

Hughes AEO, Myers CA, Corbo JC. A massively parallel reporter assay reveals context-dependent activity of homeodomain binding sites in vivo. Genome Res 2018;28:1520-1531. [PMID: 30158147 PMCID: PMC6169884 DOI: 10.1101/gr.231886.117] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 11/01/2017] [Accepted: 08/27/2018] [Indexed: 12/20/2022]

de Boer CG, Regev A. BROCKMAN: deciphering variance in epigenomic regulators by k-mer factorization. BMC Bioinformatics 2018;19:253. [PMID: 29970004 PMCID: PMC6029352 DOI: 10.1186/s12859-018-2255-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 09/06/2017] [Accepted: 06/20/2018] [Indexed: 12/31/2022]

Guo Y, Tian K, Zeng H, Guo X, Gifford DK. A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction. Genome Res 2018;28:891-900. [PMID: 29654070 PMCID: PMC5991515 DOI: 10.1101/gr.226852.117] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 06/29/2017] [Accepted: 04/04/2018] [Indexed: 12/15/2022]

Li Y, Shi W, Wasserman WW. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods. BMC Bioinformatics 2018;19:202. [PMID: 29855387 PMCID: PMC5984344 DOI: 10.1186/s12859-018-2187-1] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Received: 04/13/2017] [Accepted: 05/04/2018] [Indexed: 01/07/2023]

Abstract

Background

In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The development of high-throughput sequencing and machine learning technologies makes it possible to predict cis-regulatory regions genome wide.

Results

Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome).

Conclusion

The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates the potential of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

Zhu L, Zhang HB, Huang DS. Direct AUC optimization of regulatory motifs. Bioinformatics 2018;33:i243-i251. [PMID: 28881989 PMCID: PMC5870558 DOI: 10.1093/bioinformatics/btx255] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/04/2022]

Toenhake CG, Fraschka SAK, Vijayabaskar MS, Westhead DR, van Heeringen SJ, Bártfai R. Chromatin Accessibility-Based Characterization of the Gene Regulatory Network Underlying Plasmodium falciparum Blood-Stage Development. Cell Host Microbe 2018;23:557-569.e9. [PMID: 29649445 PMCID: PMC5899830 DOI: 10.1016/j.chom.2018.03.007] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Received: 09/22/2017] [Revised: 02/05/2018] [Accepted: 03/05/2018] [Indexed: 02/07/2023]

Cusanovich DA, Reddington JP, Garfield DA, Daza RM, Aghamirzaie D, Marco-Ferreres R, Pliner HA, Christiansen L, Qiu X, Steemers FJ, Trapnell C, Shendure J, Furlong EEM. The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 2018. [PMID: 29539636 PMCID: PMC5866720 DOI: 10.1038/nature25981] [Citation(s) in RCA: 235] [Impact Index Per Article: 33.6] [Indexed: 12/12/2022]

Abstract

Understanding how gene regulatory networks control the progressive restriction of cell fates is a long-standing challenge. Recent advances in measuring single cell gene expression are providing new insights into lineage commitment. However, the regulatory events underlying these changes remain elusive. Here we investigate the dynamics of chromatin regulatory landscapes during embryogenesis at single cell resolution. Using single cell combinatorial indexing assay for transposase accessible chromatin (sci-ATAC-seq) [1], we profiled chromatin accessibility in over 20,000 single nuclei from fixed Drosophila embryos spanning three landmark embryonic stages: 2-4 hours (hrs) after egg laying (predominantly stage 5 blastoderm nuclei), when each embryo comprises ~6,000 multipotent cells; 6-8 hrs (predominantly stage 10-11), to capture a midpoint in embryonic development when major lineages in the mesoderm and ectoderm are specified; and 10-12 hrs (predominantly stage 13), when each of the embryo’s >20,000 cells are undergoing terminal differentiation. Our results reveal spatial heterogeneity in the usage of the regulatory genome prior to gastrulation, a feature that aligns with future cell fate, and nuclei can be temporally ordered along developmental trajectories. During mid-embryogenesis, tissue granularity emerges such that individual cell types can be inferred by their chromatin accessibility, while maintaining a signature of their germ layer of origin. The data reveal overlapping usage of regulatory elements between cells of the endoderm and non-myogenic mesoderm, suggesting a common developmental program reminiscent of the mesendoderm lineage in other species [2–4]. Altogether, we identify over 30,000 distal regulatory elements exhibiting tissue-specific accessibility. We validated the germ layer specificity of a subset of these predicted enhancers in transgenic embryos, achieving 90% accuracy.
Overall, our results demonstrate the power of shotgun single cell profiling of embryos to resolve dynamic changes in the chromatin landscape during development, and to uncover the cis-regulatory programs of metazoan germ layers and cell types.

Kakumanu A, Velasco S, Mazzoni E, Mahony S. Deconvolving sequence features that discriminate between overlapping regulatory annotations. PLoS Comput Biol 2017;13:e1005795. [PMID: 29049320 PMCID: PMC5663517 DOI: 10.1371/journal.pcbi.1005795] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/09/2017] [Revised: 10/31/2017] [Accepted: 09/26/2017] [Indexed: 11/19/2022]

Abstract

Genomic loci with regulatory potential can be annotated with various properties. For example, genomic sites bound by a given transcription factor (TF) can be divided according to whether they are proximal or distal to known promoters. Sites can be further labeled according to the cell types and conditions in which they are active. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between the labels; e.g. if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or associated with promoters. In order to meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. Firstly, SeqUnwinder is able to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Secondly, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines.

Transcription factor proteins control gene expression by recognizing and interacting with short DNA sequence patterns in regulatory regions on the genome. Current genomics experiments allow us to find regulatory regions associated with a particular biochemical activity over the entire genome; for example, all regions where a particular transcription factor interacts with the genome in a given cell type. Given a collection of regulatory regions, we often aim to discover short DNA sequence patterns that are more common in the collection than in other regions. Performing such “DNA motif-finding” analysis can give us hints about the patterns that determine gene regulation in the analyzed cell type.

Here we describe a new method for DNA motif-finding called SeqUnwinder. Our approach analyzes collections of regulatory regions where each has been labeled according to various biological properties. For example, the labels could correspond to various cell types in which the regulatory region is active. SeqUnwinder then performs machine-learning analysis to unravel DNA sequence features that are characteristic of each label (e.g. features that distinguish regulatory regions in each cell type from other cell types). SeqUnwinder is the first method to enable analysis of regulatory region collections that contain several overlapping labels.
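The confounding problem described in the abstract above can be illustrated with a toy k-mer example. This is a minimal sketch and not SeqUnwinder's actual algorithm: the sequences, the labels (`cell`, `proximal`), and the helper names are hypothetical, and the stratified comparison is only a simple stand-in for the method's machine-learning model.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def fraction_with_kmer(sites, kmer):
    """Fraction of sites whose sequence contains the k-mer at least once."""
    if not sites:
        return 0.0
    hits = sum(1 for s in sites if kmer_counts(s["seq"], len(kmer))[kmer] > 0)
    return hits / len(sites)

def select(sites, **labels):
    """Keep sites whose annotation labels match every given key/value pair."""
    return [s for s in sites if all(s[key] == value for key, value in labels.items())]

# Hypothetical toy sites: cell type A is over-represented among promoter-proximal sites.
SITES = [
    {"seq": "ACGCGCGT", "cell": "A", "proximal": True},
    {"seq": "TTCGCGAA", "cell": "A", "proximal": True},
    {"seq": "ATATTTAA", "cell": "A", "proximal": False},
    {"seq": "GGCGCGTA", "cell": "B", "proximal": True},
    {"seq": "TTTAAATA", "cell": "B", "proximal": False},
    {"seq": "AATTATAT", "cell": "B", "proximal": False},
]
KMER = "CGCG"

# Naive comparison: the k-mer looks enriched in cell type A (2/3 vs 1/3 of sites).
naive = {c: fraction_with_kmer(select(SITES, cell=c), KMER) for c in ("A", "B")}

# Stratified comparison: within each proximity class the two cell types agree,
# so in this toy data the k-mer tracks promoter proximity, not cell type.
stratified = {
    (c, p): fraction_with_kmer(select(SITES, cell=c, proximal=p), KMER)
    for c in ("A", "B")
    for p in (True, False)
}

print(naive)
print(stratified)
```

In the toy data the apparent cell-type enrichment disappears once sites are compared within the same proximity class, which is the kind of label overlap the abstract says must be deconvolved.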


Mariani L, Weinand K, Vedenko A, Barrera LA, Bulyk ML. Identification of Human Lineage-Specific Transcriptional Coregulators Enabled by a Glossary of Binding Modules and Tunable Genomic Backgrounds. Cell Syst 2017;5:187-201.e7. [PMID: 28957653 PMCID: PMC5657590 DOI: 10.1016/j.cels.2017.06.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 02/09/2017] [Revised: 06/03/2017] [Accepted: 06/29/2017] [Indexed: 01/08/2023]

Oh H, Grinberg-Bleyer Y, Liao W, Maloney D, Wang P, Wu Z, Wang J, Bhatt DM, Heise N, Schmid RM, Hayden MS, Klein U, Rabadan R, Ghosh S. An NF-κB Transcription-Factor-Dependent Lineage-Specific Transcriptional Program Promotes Regulatory T Cell Identity and Function. Immunity 2017;47:450-465.e5. [PMID: 28889947 DOI: 10.1016/j.immuni.2017.08.010] [Citation(s) in RCA: 164] [Impact Index Per Article: 20.5] [Received: 05/26/2016] [Revised: 07/03/2017] [Accepted: 08/17/2017] [Indexed: 01/30/2023]

Affiliation(s)

Hyunju Oh Department of Microbiology & Immunology, Columbia University College of Physicians & Surgeons, New York, NY 10032, USA
Yenkel Grinberg-Bleyer Department of Microbiology & Immunology, Columbia University College of Physicians & Surgeons, New York, NY 10032, USA
Will Liao New York Genome Center, New York, NY 10013, USA
Dillon Maloney New York Genome Center, New York, NY 10013, USA
Pingzhang Wang Department of Systems Biology and Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA
Zikai Wu Department of Systems Biology and Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA
Jiguang Wang Department of Systems Biology and Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA
Dev M Bhatt Department of Microbiology & Immunology, Columbia University College of Physicians & Surgeons, New York, NY 10032, USA
Nicole Heise Herbert Irving Comprehensive Cancer Center, College of Physicians & Surgeons, Columbia University, New York, NY 10032, USA
Roland M Schmid II Medizinische Klinik, Klinikum Rechts der Isar, Technische Universität Munich, Munich, Germany
Matthew S Hayden Department of Microbiology & Immunology, Columbia University College of Physicians & Surgeons, New York, NY 10032, USA; Section of Dermatology, Department of Surgery, Dartmouth-Hitchcock Medical Center, Lebanon, New Hampshire, 03756, USA
Ulf Klein Department of Microbiology & Immunology, Columbia University College of Physicians & Surgeons, New York, NY 10032, USA; Herbert Irving Comprehensive Cancer Center, College of Physicians & Surgeons, Columbia University, New York, NY 10032, USA; Department of Pathology & Cell Biology, College of Physicians & Surgeons, Columbia University, New York, NY 10032, USA
Raul Rabadan Department of Systems Biology and Department of Biomedical Informatics, Columbia University College of Physicians and Surgeons, New York, NY 10032, USA
Sankar Ghosh Department of Microbiology & Immunology, Columbia University College of Physicians & Surgeons, New York, NY 10032, USA.


Lu R, Mucaki EJ, Rogan PK. Discovery and validation of information theory-based transcription factor and cofactor binding site motifs. Nucleic Acids Res 2017;45:e27. [PMID: 27899659 PMCID: PMC5389469 DOI: 10.1093/nar/gkw1036] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 04/07/2016] [Accepted: 10/19/2016] [Indexed: 02/06/2023] Open

Zhang H, Zhu L, Huang DS. WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data. Sci Rep 2017;7:3217. [PMID: 28607381 PMCID: PMC5468353 DOI: 10.1038/s41598-017-03554-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 12/15/2016] [Accepted: 05/02/2017] [Indexed: 01/24/2023] Open

Chen X, Yu B, Carriero N, Silva C, Bonneau R. Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility. Nucleic Acids Res 2017;45:4315-4329. [PMID: 28334916 PMCID: PMC5416775 DOI: 10.1093/nar/gkx174] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 06/20/2016] [Revised: 02/28/2017] [Accepted: 03/06/2017] [Indexed: 12/21/2022] Open

Chasman D, Roy S. Inference of cell type specific regulatory networks on mammalian lineages. Curr Opin Syst Biol 2017;2:130-139. [PMID: 29082337 DOI: 10.1016/j.coisb.2017.04.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 12/28/2022]

Lanchantin J, Singh R, Wang B, Qi Y. Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks. Pac Symp Biocomput 2017;22:254-265. [PMID: 27896980 PMCID: PMC5787355 DOI: 10.1142/9789813207813_0025] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Indexed: 05/22/2023]

Hashimoto T, Sherwood RI, Kang DD, Rajagopal N, Barkal AA, Zeng H, Emons BJM, Srinivasan S, Jaakkola T, Gifford DK. A synergistic DNA logic predicts genome-wide chromatin accessibility. Genome Res 2016;26:1430-1440. [PMID: 27456004 PMCID: PMC5052050 DOI: 10.1101/gr.199778.115] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 10/05/2015] [Accepted: 07/20/2016] [Indexed: 01/27/2023]

Kelley DR, Snoek J, Rinn JL. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res 2016;26:990-9. [PMID: 27197224 PMCID: PMC4937568 DOI: 10.1101/gr.200535.115] [Citation(s) in RCA: 553] [Impact Index Per Article: 61.4] [Received: 10/04/2015] [Accepted: 04/26/2016] [Indexed: 12/22/2022]

Lee D. LS-GKM: a new gkm-SVM for large-scale datasets. Bioinformatics 2016;32:2196-8. [PMID: 27153584 DOI: 10.1093/bioinformatics/btw142] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 12/10/2015] [Accepted: 03/09/2016] [Indexed: 11/12/2022] Open

González AJ, Setty M, Leslie CS. Early enhancer establishment and regulatory locus complexity shape transcriptional programs in hematopoietic differentiation. Nat Genet 2015;47:1249-59. [PMID: 26390058 PMCID: PMC4626279 DOI: 10.1038/ng.3402] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 02/26/2015] [Accepted: 08/19/2015] [Indexed: 12/23/2022]