1
Zhang C, An H, Hu J, Li J, Zhang W, Lan X, Deng H, Zhang JR. MetR is a molecular adaptor for pneumococcal carriage in the healthy upper airway. Mol Microbiol 2021; 116:438-458. [PMID: 33811693 DOI: 10.1111/mmi.14724] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/27/2019] [Revised: 03/29/2021] [Accepted: 03/30/2021] [Indexed: 11/26/2022]
Abstract
Streptococcus pneumoniae resides in the human upper airway as a commensal but also causes pneumonia, bacteremia, meningitis, and otitis media. It remains unclear how pneumococci adapt to nutritional conditions of various host niches. We here show that MetR, a LysR family transcriptional regulator, serves as a molecular adaptor for pneumococcal fitness, particularly in the upper airway. The metR mutant of strain D39 rapidly disappeared from the nasopharynx but was marginally attenuated in the lungs and bloodstream of mice. RNA-seq and ChIP-seq analyses showed that MetR broadly regulates transcription of the genes involved in methionine synthesis and other functions under methionine starvation. Genetic and biochemical analyses confirmed that MetR is essential for the activation of methionine synthesis but not uptake. Co-infection of influenza virus partially restored the colonization defect of the metR mutant. These results strongly suggest that MetR is particularly evolved for pneumococcal carriage in the upper airway of healthy individuals where free methionine is severely limited, but it becomes dispensable where environmental methionine is relatively more abundant (e.g., inflamed upper airway and sterile sites). To the best of our knowledge, MetR represents the first known regulator particularly for pneumococcal carriage in healthy individuals.
Affiliation(s)
- Chengwang Zhang
  - Center for Infectious Disease Research, School of Medicine, Tsinghua University, Beijing, China
- Haoran An
  - Center for Infectious Disease Research, School of Medicine, Tsinghua University, Beijing, China
  - Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing, China
- Jiao Hu
  - Center for Infectious Disease Research, School of Medicine, Tsinghua University, Beijing, China
- Jing Li
  - Center for Infectious Disease Research, School of Medicine, Tsinghua University, Beijing, China
- Wenhao Zhang
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Xun Lan
  - Center for Infectious Disease Research, School of Medicine, Tsinghua University, Beijing, China
  - Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing, China
- Haiteng Deng
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Jing-Ren Zhang
  - Center for Infectious Disease Research, School of Medicine, Tsinghua University, Beijing, China
  - Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing, China
2
Zhang X, Wang Y, Chiang HC, Hsieh YP, Lu C, Park BH, Jatoi I, Jin VX, Hu Y, Li R. BRCA1 mutations attenuate super-enhancer function and chromatin looping in haploinsufficient human breast epithelial cells. Breast Cancer Res 2019; 21:51. [PMID: 30995943 PMCID: PMC6472090 DOI: 10.1186/s13058-019-1132-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 10/29/2018] [Accepted: 03/27/2019] [Indexed: 01/07/2023] Open
Abstract
Background: BRCA1-associated breast cancer originates from luminal progenitor cells. BRCA1 functions in multiple biological processes, including double-strand break repair, replication stress suppression, transcriptional regulation, and chromatin reorganization. While non-malignant cells carrying cancer-predisposing BRCA1 mutations exhibit increased genomic instability, it remains unclear whether BRCA1 haploinsufficiency affects transcription and chromatin dynamics in breast epithelial cells.
Methods: H3K27ac-associated super-enhancers were compared in primary breast epithelial cells from BRCA1 mutation carriers (BRCA1mut/+) and non-carriers (BRCA1+/+). Non-tumorigenic MCF10A breast epithelial cells with engineered BRCA1 haploinsufficiency were used to confirm the H3K27ac changes. The impact of BRCA1 mutations on enhancer function and enhancer-promoter looping was assessed in MCF10A cells.
Results: Here, we show that primary mammary epithelial cells from women with BRCA1 mutations display significant loss of H3K27ac-associated super-enhancers. These BRCA1-dependent super-enhancers are enriched with binding motifs for the GATA family. Non-tumorigenic BRCA1mut/+ MCF10A cells recapitulate the H3K27ac loss. Attenuated histone mark and enhancer activity in these BRCA1mut/+ MCF10A cells can be partially restored with wild-type BRCA1. Furthermore, chromatin conformation analysis demonstrates impaired enhancer-promoter looping in BRCA1mut/+ MCF10A cells.
Conclusions: H3K27ac-associated super-enhancer loss is a previously unappreciated functional deficiency in ostensibly normal BRCA1 mutation-carrying breast epithelium. Our findings offer new mechanistic insights into BRCA1 mutation-associated transcriptional and epigenetic abnormality in breast epithelial cells and tissue/cell lineage-specific tumorigenesis.
Affiliation(s)
- Xiaowen Zhang
  - Department of Biochemistry & Molecular Medicine, School of Medicine & Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Yao Wang
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
- Huai-Chin Chiang
  - Department of Biochemistry & Molecular Medicine, School of Medicine & Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Yuan-Pang Hsieh
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, VA, 24061, USA
- Chang Lu
  - Department of Chemical Engineering, Virginia Tech, Blacksburg, VA, 24061, USA
- Ben Ho Park
  - Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Ismail Jatoi
  - Department of Surgery, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
- Victor X Jin
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio, San Antonio, TX, 78229, USA
- Yanfen Hu
  - Department of Anatomy & Cell Biology, School of Medicine & Health Sciences, The George Washington University, Washington, DC, 20037, USA
- Rong Li
  - Department of Biochemistry & Molecular Medicine, School of Medicine & Health Sciences, The George Washington University, Washington, DC, 20037, USA
3
Tran L, Hamp T, Rost B. ProfPPIdb: Pairs of physical protein-protein interactions predicted for entire proteomes. PLoS One 2018; 13:e0199988. [PMID: 30020956 PMCID: PMC6051629 DOI: 10.1371/journal.pone.0199988] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/08/2018] [Accepted: 06/17/2018] [Indexed: 01/05/2023] Open
Abstract
Motivation: Protein-protein interactions (PPIs) play a key role in many cellular processes. Most annotations of PPIs mix experimental and computational data. The mix optimizes coverage, but obfuscates the annotation origin. Some resources excel at focusing on reliable experimental data. Here, we focused on new pairs of interacting proteins for several model organisms based solely on sequence-based prediction methods.
Results: We extracted reliable experimental data about which proteins interact (binary) for eight diverse model organisms from public databases, namely from Escherichia coli, Schizosaccharomyces pombe, Plasmodium falciparum, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, and for the previously used Homo sapiens and Saccharomyces cerevisiae. Those data were the base to develop a PPI prediction method for each model organism. The method used evolutionary information through a profile-kernel Support Vector Machine (SVM). With the resulting eight models, we predicted all possible protein pairs in each organism and made the top predictions available through a web application. Almost all of the PPIs made available were predicted between proteins that have not been observed in any interaction, in particular for less well-studied organisms. Thus, our work complements existing resources and is particularly helpful for designing experiments because of its uniqueness. Experimental annotations and computational predictions are strongly influenced by the fact that some proteins have many partners and others few. To optimize machine learning, recent methods explicitly ignored such a network-structure and rely either on domain knowledge or sequence-only methods. Our approach is independent of domain-knowledge and leverages evolutionary information. The database interface representing our results is accessible from https://rostlab.org/services/ppipair/. The data can also be downloaded from https://figshare.com/collections/ProfPPI-DB/4141784.
Affiliation(s)
- Linh Tran
  - Imperial College London (ICL), Department of Computing, United Kingdom
  - Technical University of Munich (TUM), Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr, Germany
- Tobias Hamp
  - Technical University of Munich (TUM), Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr, Germany
- Burkhard Rost
  - Technical University of Munich (TUM), Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr, Germany
  - Technical University of Munich (TUM), Institute for Advanced Study (TUM-IAS), Lichtenbergstr, Germany
4
Li ZW, You ZH, Chen X, Li LP, Huang DS, Yan GY, Nie R, Huang YA. Accurate prediction of protein-protein interactions by integrating potential evolutionary information embedded in PSSM profile and discriminative vector machine classifier. Oncotarget 2017; 8:23638-23649. [PMID: 28423569 PMCID: PMC5410333 DOI: 10.18632/oncotarget.15564] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 11/30/2016] [Accepted: 01/11/2017] [Indexed: 11/25/2022] Open
Abstract
Identification of protein-protein interactions (PPIs) is of critical importance for deciphering the underlying mechanisms of almost all biological processes of the cell and for providing insight into the study of human disease. Although much effort has been devoted to identifying PPIs from various organisms, existing high-throughput biological techniques are time-consuming and expensive, and yield high false-positive and false-negative rates. It is therefore urgent to develop in silico methods to predict PPIs efficiently and accurately in this post-genomic era. In this article, we report a novel computational model combining our newly developed discriminative vector machine classifier (DVM) and an improved Weber local descriptor (IWLD) for the prediction of PPIs. Two components, differential excitation and orientation, are exploited to build evolutionary features for each protein sequence. The main characteristic of the proposed method lies in introducing an effective feature descriptor, IWLD, which can capture highly discriminative evolutionary information from position-specific scoring matrices (PSSM) of protein data, and in employing the powerful and robust DVM classifier. When applying the proposed method to Yeast and H. pylori data sets, we obtained excellent prediction accuracies as high as 96.52% and 91.80%, respectively, which are significantly better than the previous methods. Extensive experiments were then performed for predicting cross-species PPIs, and the predictive results were also promising. To further validate the performance of the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier on the Human data set. The experimental results obtained indicate that our method is highly effective for PPI prediction and can be taken as a supplementary tool for future proteomics research.
Affiliation(s)
- Zheng-Wei Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Li-Ping Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- De-Shuang Huang
  - School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Gui-Ying Yan
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Ru Nie
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Yu-An Huang
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
5
Tang B. Genomic feature extraction and comparison based on global alignment of ChIP-sequencing data. Bioengineered 2017; 8:248-255. [PMID: 27690208 PMCID: PMC5470523 DOI: 10.1080/21655979.2016.1226714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022] Open
Abstract
Enhanced accuracy and high-throughput capability in capturing genetic activities have led ChIP-sequencing technology to be widely applied in diverse studies of DNA-protein interactions. Questions such as choosing suitable ChIP-seq arguments and comparing sample quality still trouble biologists. We propose methods for deciding optimal argument pairs in the global alignment of ChIP-sequencing data; we then employ a modern signal processing approach to extract inherent genomic features from the global alignments of transcriptional binding activities, together with pairwise comparison from intra- and inter-sample perspectives. We can thus further determine alignment quality and decide the optimal candidate for multi-source heterogeneous high-throughput sequences. The work provides a practical approach to quantitatively compare the alignment quality of heterogeneous sequencing data, especially in determining the efficiency of transcriptional binding from replicate samples, and thus helps exploit the potential of ChIP-seq for a deeper understanding of the biological meaning of high-throughput genomic sequences.
Affiliation(s)
- Binhua Tang
  - Epigenetics & Function Group, College of the Internet of Things, Hohai University, Jiangsu, China
  - School of Public Health, Shanghai Jiao Tong University, Shanghai, China
6
Thomas R, Thomas S, Holloway AK, Pollard KS. Features that define the best ChIP-seq peak calling algorithms. Brief Bioinform 2017; 18:441-450. [PMID: 27169896 PMCID: PMC5429005 DOI: 10.1093/bib/bbw035] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 01/20/2016] [Revised: 03/01/2016] [Indexed: 12/20/2022] Open
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and identified 12 features of the two sub-problems that distinguish methods from each other. We picked six methods [GEM, MACS2, MUSIC, BCP, Threshold-based method (TM) and ZINBA] that span this feature space and used a combination of 300 simulated ChIP-seq data sets, 3 real data sets and mathematical analyses to identify features of methods that allow some to perform better than the others. We prove that methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes are more powerful than the ones that do not. For statistical testing of candidate peaks, methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test. BCP and MACS2 have the best operating characteristics on simulated transcription factor binding data. GEM has the highest fraction of the top 500 peaks containing the binding motif of the immunoprecipitated factor, with 50% of its peaks within 10 base pairs of a motif. BCP and MUSIC perform best on histone data. These findings provide guidance and rationale for selecting the best peak caller for a given application.
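The two sub-problems named in this abstract (identifying candidate peaks, then testing them for significance with a Poisson test) can be illustrated with a toy sketch. This is not the implementation of any surveyed tool: the fixed-size windows, the `min_reads` threshold, the constant background rate, and every function name below are assumptions made purely for illustration.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow in lam**k and k!.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Add P(X = k), P(X = k + 1), ... until further terms no longer matter.
    while term > 0.0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def candidate_windows(counts, min_reads):
    """Sub-problem 1 (hypothetical rule): keep windows with enough reads."""
    return [i for i, c in enumerate(counts) if c >= min_reads]


def rank_candidates(counts, candidates, background_rate):
    """Sub-problem 2: Poisson p-value per candidate, most significant first."""
    scored = [
        (i, counts[i], poisson_upper_tail(counts[i], background_rate))
        for i in candidates
    ]
    return sorted(scored, key=lambda item: item[2])


if __name__ == "__main__":
    # Made-up read counts per window; background of 2 reads per window.
    counts = [2, 3, 15, 2, 1, 12, 3]
    for index, reads, p_value in rank_candidates(
        counts, candidate_windows(counts, min_reads=5), background_rate=2.0
    ):
        print(f"window {index}: {reads} reads, p = {p_value:.3g}")
```

Real peak callers estimate the background locally, often from an input sample, and correct for multiple testing; both steps are omitted here to keep the sketch short.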
Affiliation(s)
- Sean Thomas
  - Gladstone Institutes, San Francisco, CA, USA
  - Division of Biostatistics, University of California, San Francisco, CA, USA
- Alisha K Holloway
  - Gladstone Institutes, San Francisco, CA, USA
  - Division of Biostatistics, University of California, San Francisco, CA, USA
  - Phylos Biosciences, Portland, OR, USA
- Katherine S Pollard
  - Gladstone Institutes, San Francisco, CA, USA
  - Division of Biostatistics, University of California, San Francisco, CA, USA
  - Institute for Human Genetics and Institute for Computational Health Sciences, University of California, San Francisco, CA, USA
7
Gan Y, Tao H, Guan J, Zhou S. iHMS: a database integrating human histone modification data across developmental stages and tissues. BMC Bioinformatics 2017; 18:103. [PMID: 28187703 PMCID: PMC5303264 DOI: 10.1186/s12859-017-1461-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/09/2016] [Accepted: 01/03/2017] [Indexed: 11/17/2022] Open
Abstract
Background: Differences in chromatin states are critical to the multiplicity of cell states. Recently, genome-wide histone modification maps of diverse human developmental stages and tissues have been charted.
Description: To facilitate the investigation of epigenetic dynamics and regulatory mechanisms in cellular differentiation processes, we developed iHMS, an integrated human histone modification database that incorporates massive histone modification maps spanning different developmental stages, lineages and tissues (http://www.tongjidmb.com/human/index.html). It also includes genome-wide expression data of different conditions, reference gene annotations, GC content and CpG island information. By providing an intuitive and user-friendly query interface, iHMS enables comprehensive query and comparative analysis based on gene names, genomic region locations, histone modification marks and cell types. Moreover, it offers an efficient browser that allows users to visualize and compare multiple genome-wide histone modification maps and related expression profiles across different developmental stages and tissues.
Conclusion: iHMS helps in understanding how global histone modification state transitions impact cellular phenotypes across different developmental stages and tissues in the human genome. This extensive catalog of histone modification states thus presents an important resource for epigenetic and developmental studies.
Affiliation(s)
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University, Shanghai, China
- Han Tao
  - Department of Computer Science and Technology, Tongji University, Shanghai, China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai, China
- Shuigeng Zhou
  - Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, China
8
COPAR: A ChIP-Seq Optimal Peak Analyzer. Biomed Res Int 2017; 2017:5346793. [PMID: 28357402 PMCID: PMC5357551 DOI: 10.1155/2017/5346793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2016] [Accepted: 02/14/2017] [Indexed: 11/17/2022]
Abstract
Sequencing data quality and peak alignment efficiency of ChIP-sequencing profiles are directly related to the reliability and reproducibility of NGS experiments. Until now, there has been no tool specifically designed for optimal peak alignment estimation and quality-related genomic feature extraction for ChIP-sequencing profiles. We developed COPAR, an open-source, user-friendly package, to statistically investigate, quantify, and visualize the optimal peak alignment and inherent genomic features using ChIP-seq data from NGS experiments. It provides a versatile perspective for biologists to perform quality checks for high-throughput experiments and optimize their experiment design. The package COPAR can process mapped ChIP-seq read files in BED format and output statistically sound results for multiple high-throughput experiments. Together with three public ChIP-seq data sets verified with the developed package, we have deposited COPAR on GitHub under a GNU GPL license.
9
Ambrosini G, Dreos R, Kumar S, Bucher P. The ChIP-Seq tools and web server: a resource for analyzing ChIP-seq and other types of genomic data. BMC Genomics 2016; 17:938. [PMID: 27863463 PMCID: PMC5116162 DOI: 10.1186/s12864-016-3288-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 09/01/2016] [Accepted: 11/15/2016] [Indexed: 01/21/2023] Open
Abstract
Background: ChIP-seq and related high-throughput chromatin profiling assays generate ever-increasing volumes of highly valuable biological data. To make sense of it, biologists need versatile, efficient and user-friendly tools for access, visualization and integrative analysis of such data.
Results: Here we present the ChIP-Seq command line tools and web server, implementing basic algorithms for ChIP-seq data analysis starting with a read alignment file. The tools are optimized for memory efficiency and speed, thus allowing for processing of large data volumes on inexpensive hardware. The web interface provides access to a large database of public data. The ChIP-Seq tools have a modular and interoperable design in that the output from one application can serve as input to another one. Complex and innovative tasks can thus be achieved by running several tools in a cascade.
Conclusions: The various ChIP-Seq command line tools and web services either complement or compare favorably to related bioinformatics resources in terms of computational efficiency, ease of access to public data and interoperability with other web-based tools. The ChIP-Seq server is accessible at http://ccg.vital-it.ch/chipseq/.
Affiliation(s)
- Giovanna Ambrosini
  - School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- René Dreos
  - School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Sunil Kumar
  - School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Philipp Bucher
  - School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
  - Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
10
Jadhav RR, Wang YV, Hsu YT, Liu J, Garcia D, Lai Z, Huang THM, Jin VX. Methyl-binding DNA capture Sequencing for Patient Tissues. J Vis Exp 2016. [PMID: 27842364 DOI: 10.3791/54131] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022] Open
Abstract
Methylation is one of the essential epigenetic modifications to the DNA, which is responsible for the precise regulation of genes required for stable development and differentiation of different tissue types. Dysregulation of this process is often the hallmark of various diseases like cancer. Here, we outline one of the recent sequencing techniques, Methyl-Binding DNA Capture sequencing (MBDCap-seq), used to quantify methylation in various normal and disease tissues for large patient cohorts. We describe a detailed protocol of this affinity enrichment approach along with a bioinformatics pipeline to achieve optimal quantification. This technique has been used to sequence hundreds of patients across various cancer types as a part of the 1,000 methylome project (Cancer Methylome System).
Affiliation(s)
- Rohit R Jadhav
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio
- Yao V Wang
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio
- Ya-Ting Hsu
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio
- Joseph Liu
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio
- Dawn Garcia
  - Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio
- Zhao Lai
  - Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio
- Tim H M Huang
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio
- Victor X Jin
  - Department of Molecular Medicine, University of Texas Health Science Center at San Antonio
11
Hsu YT, Osmulski P, Wang Y, Huang YW, Liu L, Ruan J, Jin VX, Kirma NB, Gaczynska ME, Huang THM. EpCAM-Regulated Transcription Exerts Influences on Nanomechanical Properties of Endometrial Cancer Cells That Promote Epithelial-to-Mesenchymal Transition. Cancer Res 2016; 76:6171-6182. [PMID: 27569206 DOI: 10.1158/0008-5472.can-16-0752] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Received: 03/14/2016] [Accepted: 08/15/2016] [Indexed: 12/24/2022]
Abstract
Overexpression of epithelial cell adhesion molecule (EpCAM) has been implicated in advanced endometrial cancer, but its roles in this progression remain to be elucidated. In addition to its structural role in modulating cell-surface adhesion, here we demonstrate that EpCAM is a regulatory molecule in which its internalization into the nucleus turns on a transcription program. Activation of EGF/EGFR signal transduction triggered cell-surface cleavage of EpCAM, leading to nuclear internalization of its cytoplasmic domain EpICD. ChIP-seq analysis identified target genes that are coregulated by EpICD and its transcription partner, LEF-1. Network enrichment analysis further uncovered a group of 105 genes encoding functions for tight junction, adherent, and cell migration. Furthermore, nanomechanical analysis by atomic force microscopy revealed increased softness and decreased adhesiveness of EGF-stimulated cancer cells, implicating acquisition of an epithelial-mesenchymal transition (EMT) phenotype. Thus, genome editing of EpCAM could be associated with altering these nanomechanical properties towards a less aggressive phenotype. Using this integrative genomic-biophysical approach, we demonstrate for the first time an intricate relationship between EpCAM-regulated transcription and altered biophysical properties of cells that promote EMT in advanced endometrial cancer. Cancer Res; 76(21); 6171-82. ©2016 AACR.
Affiliation(s)
- Ya-Ting Hsu
  - Departments of Molecular Medicine/Institute of Biotechnology, University of Texas Health Science Center at San Antonio, San Antonio, Texas
- Pawel Osmulski
  - Departments of Molecular Medicine/Institute of Biotechnology, University of Texas Health Science Center at San Antonio, San Antonio, Texas
- Yao Wang
  - Departments of Molecular Medicine/Institute of Biotechnology, University of Texas Health Science Center at San Antonio, San Antonio, Texas
- Yi-Wen Huang
  - Department of Obstetrics and Gynecology, Medical College of Wisconsin, Milwaukee, Wisconsin
- Lu Liu
  - Department of Computer Science, University of Texas at San Antonio, San Antonio, Texas
- Jianhua Ruan
  - Department of Computer Science, University of Texas at San Antonio, San Antonio, Texas
- Victor X Jin
  - Departments of Molecular Medicine/Institute of Biotechnology, University of Texas Health Science Center at San Antonio, San Antonio, Texas
- Nameer B Kirma
  - Departments of Molecular Medicine/Institute of Biotechnology, University of Texas Health Science Center at San Antonio, San Antonio, Texas
- Maria E Gaczynska
  - Departments of Molecular Medicine/Institute of Biotechnology, University of Texas Health Science Center at San Antonio, San Antonio, Texas
- Tim Hui-Ming Huang
  - Departments of Molecular Medicine/Institute of Biotechnology, University of Texas Health Science Center at San Antonio, San Antonio, Texas
12
Abstract
Background: Peak calling is a fundamental step in the analysis of data generated by ChIP-seq or similar techniques to acquire epigenetics information. Current peak callers are often hard to parameterise and may therefore be difficult to use for non-bioinformaticians. In this paper, we present the ChIP-seq analysis tool available in CLC Genomics Workbench and CLC Genomics Server (version 7.5 and up), a user-friendly peak-caller designed to be not specific to a particular *-seq protocol.
Results: We illustrate the advantages of a shape-based approach and describe the algorithmic principles underlying the implementation. Thanks to the generality of the idea and the fact that the algorithm is able to learn the peak shape from the data, the implementation requires only minimal user input, while still being applicable to a range of *-seq protocols. Using independently validated benchmark datasets, we compare our implementation to other state-of-the-art algorithms explicitly designed to analyse ChIP-seq data and provide an evaluation in terms of receiver-operator characteristic (ROC) plots. In order to show the applicability of the method to similar *-seq protocols, we also investigate algorithmic performance on DNase-seq data.
Conclusions: The results show that the CLC shape-based peak caller ranks well among popular state-of-the-art peak callers while providing flexibility and ease-of-use.
Affiliation(s)
| | - Michael Lappe
- Qiagen Aarhus, Silkeborgvej 2, Aarhus, 8000, DK, Denmark.
|
13
|
Using the Relevance Vector Machine Model Combined with Local Phase Quantization to Predict Protein-Protein Interactions from Protein Sequences. Biomed Res Int 2016; 2016:4783801. [PMID: 27314023 PMCID: PMC4893571 DOI: 10.1155/2016/4783801] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 03/05/2016] [Accepted: 04/12/2016] [Indexed: 01/08/2023]
Abstract
We propose a novel computational method known as RVM-LPQ that combines the Relevance Vector Machine (RVM) model and Local Phase Quantization (LPQ) to predict PPIs from protein sequences. The main improvements result from representing protein sequences using the LPQ feature representation on a Position Specific Scoring Matrix (PSSM), reducing the influence of noise using Principal Component Analysis (PCA), and using an RVM-based classifier. We perform 5-fold cross-validation experiments on Yeast and Human datasets and achieve accuracies of 92.65% and 97.62%, respectively, which is significantly better than previous work. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the Yeast dataset. The experimental results demonstrate that our RVM-LPQ method clearly outperforms the SVM-based method. These promising results show the efficiency and simplicity of the proposed method, which can serve as an automatic decision support tool for future proteomics research.
|
14
|
Wang Y, Wang R, Jin VX. Inference of hierarchical regulatory network of TCF7L2 binding sites in MCF7 cell line. Int J Comput Biol Drug Des 2016; 9:25-53. [PMID: 28066512 DOI: 10.1504/ijcbdd.2016.074990] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
Abstract
The TCF7L2 transcription factor (TF) is a member of the Wnt signalling pathway and may influence transcription of several genes by binding to distinct regulatory regions. Genome-wide studies have identified thousands of TCF7L2 binding sites and have revealed some associated TF partners. However, there is still a large uncharted region in the hierarchical regulatory network for TCF7L2 and its partner TFs in MCF7 cells. We analysed ChIP-seq data by searching for motifs in the enriched peak regions based on TF-specific position weight matrices (PWMs). We found association of FOXO1 and CAD with up-regulated genes, and of AP2α, PBF and AP1 with down-regulated genes. TCF7L2 and GATA3 were found to be associated with both up- and down-regulated genes. Our study uncovers new TCF7L2-associated regulatory networks by mining ChIP-seq data in MCF7 cells, which may contribute to further study of the mechanisms related to the Wnt pathway in breast cancer or other diseases.
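The PWM-based motif search mentioned in this abstract can be illustrated with a minimal sketch. The count matrix, the threshold, and the function names below are hypothetical and are not taken from the paper; the sketch assumes log-odds scoring against a uniform background.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# Illustrative values only; this is NOT a real TCF7L2 matrix.
COUNTS = [
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]

def to_log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (offset, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits

PWM = to_log_odds(COUNTS)
# Scan a made-up peak sequence; the threshold of 4.0 is an arbitrary choice.
hits = scan("TTACGTGGACGTA", PWM, threshold=4.0)
```

In practice the reverse strand would be scanned as well, and the threshold would be calibrated against background sequences rather than fixed by hand.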
Affiliation(s)
- Yao Wang
- Departments of Molecular Medicine and Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA,
| | - Rui Wang
- School of Chemical and Environment Science, Shaanxi University of Technology, Hanzhong, Shaanxi 723000, China,
| | - Victor X Jin
- Departments of Molecular Medicine and Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA,
|
15
|
Ligand-dependent genomic function of glucocorticoid receptor in triple-negative breast cancer. Nat Commun 2015; 6:8323. [PMID: 26374485 PMCID: PMC4573460 DOI: 10.1038/ncomms9323] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 11/09/2014] [Accepted: 08/11/2015] [Indexed: 01/07/2023] Open
Abstract
Glucocorticoids (GCs) have been widely used as coadjuvants in the treatment of solid tumours, but GC treatment may be associated with poor pharmacotherapeutic response or prognosis. The genomic action of GC in these tumours is largely unknown. Here we find that dexamethasone (Dex, a synthetic GC)-regulated genes in triple-negative breast cancer (TNBC) cells are associated with drug resistance. Importantly, these GC-regulated genes are aberrantly expressed in TNBC patients and are associated with unfavourable clinical outcomes. Interestingly, in TNBC cells, Compound A (CpdA, a selective GR modulator) only regulates a small number of genes not involved in carcinogenesis and therapy resistance. Mechanistic studies using a ChIP-exo approach reveal that Dex- but not CpdA-liganded glucocorticoid receptor (GR) binds to a single glucocorticoid response element (GRE), which drives the expression of pro-tumorigenic genes. Our data suggest that development of safe coadjuvant therapy should consider the distinct genomic function between Dex- and CpdA-liganded GR.
|
16
|
Chen Z, Lan X, Thomas-Ahner JM, Wu D, Liu X, Ye Z, Wang L, Sunkel B, Grenade C, Chen J, Zynger DL, Yan PS, Huang J, Nephew KP, Huang THM, Lin S, Clinton SK, Li W, Jin VX, Wang Q. Agonist and antagonist switch DNA motifs recognized by human androgen receptor in prostate cancer. EMBO J 2014; 34:502-16. [PMID: 25535248 DOI: 10.15252/embj.201490306] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Indexed: 12/22/2022] Open
Abstract
Human transcription factors recognize specific DNA sequence motifs to regulate transcription. It is unknown whether a single transcription factor is able to bind to distinctly different motifs on chromatin, and if so, what determines the usage of specific motifs. By using a motif-resolution chromatin immunoprecipitation-exonuclease (ChIP-exo) approach, we find that agonist-liganded human androgen receptor (AR) and antagonist-liganded AR bind to two distinctly different motifs, leading to distinct transcriptional outcomes in prostate cancer cells. Further analysis on clinical prostate tissues reveals that the binding of AR to these two distinct motifs is involved in prostate carcinogenesis. Together, these results suggest that unique ligands may switch DNA motifs recognized by ligand-dependent transcription factors in vivo. Our findings also provide a broad mechanistic foundation for understanding ligand-specific induction of gene expression profiles.
Affiliation(s)
- Zhong Chen
- Department of Molecular Virology, Immunology and Medical Genetics and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Xun Lan
- Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Jennifer M Thomas-Ahner
- Division of Medical Oncology, Department of Internal Medicine and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Dayong Wu
- Department of Molecular Virology, Immunology and Medical Genetics and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Xiangtao Liu
- Department of Molecular Virology, Immunology and Medical Genetics and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Zhenqing Ye
- Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, USA Department of Molecular Medicine, Cancer Therapy and Research Center, University of Texas Health Science Center, San Antonio, TX, USA
| | - Liguo Wang
- Division of Biostatistics, Dan L. Duncan Cancer Center Baylor College of Medicine, Houston, TX, USA Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
| | - Benjamin Sunkel
- Department of Molecular Virology, Immunology and Medical Genetics and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Cassandra Grenade
- Department of Molecular Virology, Immunology and Medical Genetics and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Junsheng Chen
- Division of Biostatistics, Dan L. Duncan Cancer Center Baylor College of Medicine, Houston, TX, USA Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
| | - Debra L Zynger
- Department of Pathology, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Pearlly S Yan
- Department of Molecular Virology, Immunology and Medical Genetics and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Jiaoti Huang
- Departments of Pathology and Urology, Jonsson Comprehensive Cancer Center, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA Broad Center for Regenerative Medicine and Stem Cell Research, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
| | - Kenneth P Nephew
- Medical Sciences Program, Department of Cellular and Integrative Physiology, Indiana University School of Medicine, Bloomington, IN, USA
| | - Tim H-M Huang
- Department of Molecular Medicine, Cancer Therapy and Research Center, University of Texas Health Science Center, San Antonio, TX, USA
| | - Shili Lin
- Department of Statistics, The Ohio State University, Columbus, OH, USA
| | - Steven K Clinton
- Division of Medical Oncology, Department of Internal Medicine and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
| | - Wei Li
- Division of Biostatistics, Dan L. Duncan Cancer Center Baylor College of Medicine, Houston, TX, USA Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
| | - Victor X Jin
- Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH, USA Department of Molecular Medicine, Cancer Therapy and Research Center, University of Texas Health Science Center, San Antonio, TX, USA
| | - Qianben Wang
- Department of Molecular Virology, Immunology and Medical Genetics and the Comprehensive Cancer Center, The Ohio State University College of Medicine, Columbus, OH, USA
|
17
|
Transcriptional regulation and spatial interactions of head-to-head genes. BMC Genomics 2014; 15:519. [PMID: 24962804 PMCID: PMC4089025 DOI: 10.1186/1471-2164-15-519] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/27/2014] [Accepted: 06/19/2014] [Indexed: 11/10/2022] Open
Abstract
Background In eukaryotic genomes, about 10% of genes are arranged in a head-to-head (H2H) orientation, with the transcription start sites of each gene pair less than 1 kb apart. Two genes in an H2H pair are prone to co-express and co-function. There have been many studies on bidirectional promoters. However, the mechanism by which H2H genes are regulated at the transcriptional level still needs further clarification, especially with regard to the co-regulation of H2H pairs. In this study, we first used Hi-C data of chromatin linkages to identify spatially interacting H2H pairs, and then integrated ChIP-seq data to compare H2H gene pairs with and without evidence of spatial interactions in terms of their binding transcription factors (TFs). Using ChIP-seq and DNase-seq data, histones and DNase associated with H2H pairs were identified. Furthermore, we looked into the connections between H2H genes in a human co-expression network. Results We found that i) similar to the behaviour of two genes within an H2H pair (intra-H2H pair), a gene pair involving two distinct H2H pairs (inter-H2H pair) that interact with each other spatially shares common TFs; ii) TFs of intra- and inter-H2H pairs are distributed differently. Factors such as HEY1, GABP, Sin3Ak-20, POL2, E2F6, and c-MYC are essential for the bidirectional transcription of intra-H2H pairs, while factors like CTCF, BDP1, GATA2, RAD21, and POL3 play important roles in coherently regulating inter-H2H pairs; iii) H2H gene blocks are enriched with DNase hypersensitive sites and modified histones, which participate in active transcription; and iv) H2H genes tend to be highly connected compared with non-H2H genes in the human co-expression network. Conclusions Our findings shed new light on the mechanism of the transcriptional regulation of H2H genes through their linear and spatial interactions. For intra-H2H gene pairs, transcription factors regulate transcription through bidirectional promoters, whereas for inter-H2H gene pairs, transcription factors are likely to regulate their activities depending on the spatial interaction of H2H gene pairs. In this way, two distinct groups of transcription factors mediate intra- and inter-H2H gene transcription respectively, resulting in a highly compact gene regulatory network.
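The H2H definition used in this abstract (divergently transcribed neighbours whose transcription start sites lie less than 1 kb apart) can be sketched as follows. The `Gene` record, the gene names, and the restriction to TSS-adjacent neighbours are simplifying assumptions made for illustration, not the authors' pipeline.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    strand: str  # "+" or "-"
    tss: int     # transcription start site coordinate

def head_to_head_pairs(genes, max_distance=1000):
    """Find neighbouring, divergently transcribed gene pairs with close TSSs.

    Only genes adjacent in TSS order are compared (a simplification).
    """
    pairs = []
    ordered = sorted(genes, key=lambda g: (g.chrom, g.tss))
    for left, right in zip(ordered, ordered[1:]):
        same_chrom = left.chrom == right.chrom
        # Divergent: the upstream gene is on "-" and the downstream one on "+".
        divergent = left.strand == "-" and right.strand == "+"
        if same_chrom and divergent and right.tss - left.tss < max_distance:
            pairs.append((left.name, right.name))
    return pairs

# Hypothetical annotation records.
genes = [
    Gene("g1", "chr1", "-", 1000),
    Gene("g2", "chr1", "+", 1600),
    Gene("g3", "chr1", "+", 5000),
    Gene("g4", "chr2", "-", 5200),
    Gene("g5", "chr2", "+", 7000),
]
pairs = head_to_head_pairs(genes)
```

Here only `g1` and `g2` qualify: `g4` and `g5` are divergent but their start sites are 1.8 kb apart.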
|
18
|
Tran NTL, Huang CH. A survey of motif finding Web tools for detecting binding site motifs in ChIP-Seq data. Biol Direct 2014; 9:4. [PMID: 24555784 PMCID: PMC4022013 DOI: 10.1186/1745-6150-9-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 11/07/2013] [Revised: 01/08/2014] [Accepted: 02/11/2014] [Indexed: 12/24/2022] Open
Abstract
ChIP-Seq (chromatin immunoprecipitation sequencing) offers an advantage for motif finding because ChIP-Seq experiments narrow the search down to binding site locations. Recent motif finding tools facilitate motif detection by providing user-friendly Web interfaces. In this work, we reviewed nine motif finding Web tools that are capable of detecting binding site motifs in ChIP-Seq data. We showed that each motif finding Web tool has its own advantages for detecting motifs that other tools may not discover. We recommend that users apply multiple motif finding Web tools that implement different algorithms to obtain significant motifs, overlapping similar motifs, and non-overlapping motifs. Finally, we provide suggestions for the future development of motif finding Web tools that better assist researchers in finding motifs in ChIP-Seq data. Reviewers This article was reviewed by Prof. Sandor Pongor, Dr. Yuriy Gusev, and Dr. Shyam Prabhakar (nominated by Prof. Limsoon Wong).
Affiliation(s)
- Ngoc Tam L Tran
- Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Unit 4155, Storrs, CT 06269, USA.
|
19
|
Abstract
BACKGROUND Modern genomic technologies produce large amounts of data that can be mapped to specific regions in the genome. Among the first steps in interpreting the results is annotation of genomic regions with known features such as genes, promoters, CpG islands etc. Several tools have been published to perform this task. However, using these tools often requires a significant amount of bioinformatics skills and/or downloading and installing dedicated software. RESULTS Here we present AnnotateGenomicRegions, a web application that accepts genomic regions as input and outputs a selection of overlapping and/or neighboring genome annotations. Supported organisms include human (hg18, hg19), mouse (mm8, mm9, mm10), zebrafish (danRer7), and Saccharomyces cerevisiae (sacCer2, sacCer3). AnnotateGenomicRegions is accessible online on a public server or can be installed locally. Some frequently used annotations and genomes are embedded in the application while custom annotations may be added by the user. CONCLUSIONS The increasing spread of genomic technologies generates the need for a simple-to-use annotation tool for genomic regions that can be used by biologists and bioinformaticians alike. AnnotateGenomicRegions meets this demand. AnnotateGenomicRegions is an open-source web application that can be installed on any personal computer or institute server. AnnotateGenomicRegions is available at: http://cru.genomics.iit.it/AnnotateGenomicRegions.
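The core operation described here, reporting annotations that overlap or neighbour an input region, can be sketched in a few lines. This is an illustrative sketch only, not the AnnotateGenomicRegions implementation; it assumes half-open, BED-style coordinates, and the feature names are invented.

```python
def annotate(regions, features):
    """Annotate (chrom, start, end) regions with overlapping feature names,
    falling back to the nearest feature on the same chromosome.

    Features are (chrom, start, end, name); coordinates are half-open.
    """
    results = []
    for chrom, start, end in regions:
        same_chrom = [f for f in features if f[0] == chrom]
        overlapping = [f[3] for f in same_chrom if f[1] < end and start < f[2]]
        if overlapping:
            results.append((chrom, start, end, "overlap", overlapping))
            continue
        if not same_chrom:
            results.append((chrom, start, end, "none", []))
            continue

        def gap(feature):
            # Distance between the region and a non-overlapping feature.
            return max(feature[1] - end, start - feature[2])

        nearest = min(same_chrom, key=gap)
        results.append((chrom, start, end, "nearest", [nearest[3]]))
    return results

# Hypothetical annotation track and query regions.
features = [("chr1", 100, 200, "geneA"), ("chr1", 500, 600, "geneB")]
regions = [("chr1", 150, 250), ("chr1", 420, 450), ("chr2", 10, 20)]
annotated = annotate(regions, features)
```

The linear scan is adequate for small inputs; for genome-scale tracks an interval tree or a sorted sweep would be the usual choice.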
Affiliation(s)
- Luca Zammataro
- Computational Research, Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139 Milan, Italy
| | - Rita DeMolfetta
- Computational Research, Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139 Milan, Italy
- European School of Molecular Medicine (SEMM), Via Adamello 16, 20139 Milan, Italy
| | - Gabriele Bucci
- Computational Research, Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139 Milan, Italy
| | - Arnaud Ceol
- Computational Research, Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139 Milan, Italy
| | - Heiko Muller
- Computational Research, Center for Genomic Science of IIT@SEMM, Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139 Milan, Italy
|
20
|
Liu B, Yi J, Sv A, Lan X, Ma Y, Huang THM, Leone G, Jin VX. QChIPat: a quantitative method to identify distinct binding patterns for two biological ChIP-seq samples in different experimental conditions. BMC Genomics 2013; 14 Suppl 8:S3. [PMID: 24564479 PMCID: PMC4042236 DOI: 10.1186/1471-2164-14-s8-s3] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 11/10/2022] Open
Abstract
Background Many computational programs have been developed to identify enriched regions for a single biological ChIP-seq sample. Because many biological questions involve comparing two different conditions, it is important to develop new programs that address the comparison of two biological ChIP-seq samples. Although several programs have been designed to address this question, they suffer from drawbacks such as the inability to determine whether the identified differentially enriched regions are indeed significantly enriched, the lack of distinct binding patterns, and neglect of normalization between samples. Results In this study, we developed a novel quantitative method for comparing two biological ChIP-seq samples, called QChIPat. Our method employs a new global normalization method, nonparametric empirical Bayes (NEB) correction normalization; utilizes pre-defined enriched regions identified by single-sample peak calling programs; uses statistical methods to define differentially enriched regions; and then defines binding (histone modification) pattern information for those differentially enriched regions. Our program was tested on a benchmark dataset: the histone modification data used by ChIPDiffs. It was then applied to two case studies: one to identify differential histone modification sites for ChIP-seq of H3K27me3 and H3K9me2 data in AKT1-transfected MCF10A cells; the other to identify differential binding sites for ChIP-seq of TCF7L2 data in MCF7 and PANC1 cells. Conclusions Advantages of our program include: 1) it considers a control (or input) experiment; 2) it incorporates a novel global normalization strategy, nonparametric empirical Bayes correction normalization; 3) it provides the binding pattern information among different enriched regions. QChIPat is implemented in R, Perl and C++, and has been tested under Linux. The R package is available at http://motif.bmi.ohio-state.edu/QChIPat.
|
21
|
Wang R, Hsu HK, Blattler A, Wang Y, Lan X, Wang Y, Hsu PY, Leu YW, Huang THM, Farnham PJ, Jin VX. LOcating non-unique matched tags (LONUT) to improve the detection of the enriched regions for ChIP-seq data. PLoS One 2013; 8:e67788. [PMID: 23825685 PMCID: PMC3692479 DOI: 10.1371/journal.pone.0067788] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 08/10/2012] [Accepted: 05/23/2013] [Indexed: 12/21/2022] Open
Abstract
A major limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs) that match the human genome, even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow a more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. Our LONUT algorithm applies a linear and polynomial regression model to establish an empirical score (ES) formula by considering two influential factors: the distance of NUTs to peaks identified using uniquely matched tags (UMTs), and the enrichment score for those peaks. As a result, each NUT is assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.
Affiliation(s)
- Rui Wang
- Department of Chemistry, Lanzhou University, Lanzhou, China
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Hang-Kai Hsu
- Department of Molecular Medicine, Institute of Biotechnology, University of Texas Health Science Center, San Antonio, Texas, United States of America
| | - Adam Blattler
- Department of Biochemistry and Molecular Biology, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Genetic Graduate Group, University of California-Davis, Davis, California, United States of America
| | - Yisong Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Xun Lan
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Yao Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Pei-Yin Hsu
- Department of Molecular Medicine, Institute of Biotechnology, University of Texas Health Science Center, San Antonio, Texas, United States of America
| | - Yu-Wei Leu
- Human Epigenomics Center, Department of Life Science, Institute of Molecular Biology and Institute of Biomedical Science, National Chung Cheng University, Chia-Yi, Taiwan
| | - Tim H.-M. Huang
- Department of Molecular Medicine, Institute of Biotechnology, University of Texas Health Science Center, San Antonio, Texas, United States of America
| | - Peggy J. Farnham
- Department of Biochemistry and Molecular Biology, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
| | - Victor X. Jin
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
|
22
|
Bao Y, Vinciotti V, Wit E, 't Hoen PAC. Accounting for immunoprecipitation efficiencies in the statistical analysis of ChIP-seq data. BMC Bioinformatics 2013; 14:169. [PMID: 23721376 PMCID: PMC3717085 DOI: 10.1186/1471-2105-14-169] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 10/22/2012] [Accepted: 05/21/2013] [Indexed: 12/25/2022] Open
Abstract
Background ImmunoPrecipitation (IP) efficiencies may vary widely between different antibodies and between repeated experiments with the same antibody. These differences have a large impact on the quality of ChIP-seq data: a more efficient experiment will necessarily lead to a higher signal-to-background ratio, and therefore to an apparently larger number of enriched regions, compared to a less efficient experiment. In this paper, we show how IP efficiencies can be explicitly accounted for in the joint statistical modelling of ChIP-seq data. Results We fit a latent mixture model to eight experiments on two proteins, from two laboratories where different antibodies are used for the two proteins. We use the model parameters to estimate the efficiencies of individual experiments, and find that these are clearly different for the different laboratories, and amongst technical replicates from the same lab. When we account for ChIP efficiency, we find more regions bound in the more efficient experiments than in the less efficient ones, at the same false discovery rate. A priori knowledge of the same number of binding sites across experiments can also be included in the model for a more robust detection of differentially bound regions between two different proteins. Conclusions We propose a statistical model for the detection of enriched and differentially bound regions from multiple ChIP-seq data sets. The framework that we present accounts explicitly for IP efficiencies in ChIP-seq data, and allows replicates and experiments from different proteins to be modelled jointly rather than individually, leading to more robust biological conclusions.
Affiliation(s)
- Yanchun Bao
- School of Information Systems, Computing and Mathematics, Brunel University, London, UK
|
23
|
Blattler A, Yao L, Wang Y, Ye Z, Jin VX, Farnham PJ. ZBTB33 binds unmethylated regions of the genome associated with actively expressed genes. Epigenetics Chromatin 2013; 6:13. [PMID: 23693142 PMCID: PMC3663758 DOI: 10.1186/1756-8935-6-13] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.4] [Received: 03/18/2013] [Accepted: 04/16/2013] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND DNA methylation and repressive histone modifications cooperate to silence promoters. One mechanism by which regions of methylated DNA could acquire repressive histone modifications is via methyl DNA-binding transcription factors. The zinc finger protein ZBTB33 (also known as Kaiso) has been shown in vitro to bind preferentially to methylated DNA and to interact with the SMRT/NCoR histone deacetylase complexes. We have performed bioinformatic analyses of Kaiso ChIP-seq and DNA methylation datasets to test a model whereby binding of Kaiso to methylated CpGs leads to loss of acetylated histones at target promoters. RESULTS Our results suggest that, contrary to expectations, Kaiso does not bind to methylated DNA in vivo but instead binds to highly active promoters that are marked with high levels of acetylated histones. In addition, our studies suggest that DNA methylation and nucleosome occupancy patterns restrict access of Kaiso to potential binding sites and influence cell type-specific binding. CONCLUSIONS We propose a new model for the genome-wide binding and function of Kaiso whereby Kaiso binds to unmethylated regulatory regions and contributes to the active state of target promoters.
Affiliation(s)
- Adam Blattler
- Department of Biochemistry & Molecular Biology, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, 90089, USA
- Genetics Graduate Group, University of California-Davis, Davis, CA, 95616, USA
| | - Lijing Yao
- Department of Biochemistry & Molecular Biology, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, 90089, USA
| | - Yao Wang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Zhenqing Ye
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Victor X Jin
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Peggy J Farnham
- Department of Biochemistry & Molecular Biology, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, 90089, USA
|
24
|
van den Oord EJCG, Bukszar J, Rudolf G, Nerella S, McClay JL, Xie LY, Aberg KA. Estimation of CpG coverage in whole methylome next-generation sequencing studies. BMC Bioinformatics 2013; 14:50. [PMID: 23398781 PMCID: PMC3599116 DOI: 10.1186/1471-2105-14-50] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 07/25/2012] [Accepted: 02/08/2013] [Indexed: 01/28/2023] Open
Abstract
Background Methylation studies are a promising complement to genetic studies of DNA sequence. However, detailed prior biological knowledge is typically lacking, so methylome-wide association studies (MWAS) will be critical to detect disease-relevant sites. A cost-effective approach involves the next-generation sequencing (NGS) of single-end libraries created from samples that are enriched for methylated DNA fragments. A limitation of single-end libraries is that the fragment size distribution is not observed. This hampers several aspects of the data analysis, such as the calculation of enrichment measures that are based on the number of fragments covering the CpGs. Results We developed a non-parametric method that uses isolated CpGs to estimate sample-specific fragment size distributions from the empirical sequencing data. Through simulations we show that our method is highly accurate. While the traditional (extended) read count methods resulted in severely biased coverage estimates and introduced artificial inter-individual differences, through the use of the estimated fragment size distributions we could remove these biases almost entirely. Furthermore, we found correlations of 0.999 between coverage estimates obtained using fragment size distributions that were estimated with our method versus those that were “observed” in paired-end sequencing data. Conclusions We propose a non-parametric method for estimating fragment size distributions that is highly precise and can improve the analysis of cost-effective MWAS studies that sequence single-end libraries created from samples that are enriched for methylated DNA fragments.
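The contrast drawn in this abstract, between extending every read to one fixed length and weighting reads by a fragment size distribution, can be made concrete with a small sketch. It assumes the distribution is already available as a `{length: probability}` table and considers forward-strand reads only; the paper's estimation of that distribution from isolated CpGs is not reproduced here, and all names and numbers are hypothetical.

```python
def survival(length_probs):
    """Return a function d -> P(fragment length > d) from a {length: prob} table."""
    def prob_longer_than(distance):
        return sum(p for length, p in length_probs.items() if length > distance)
    return prob_longer_than

def expected_coverage(cpg, read_starts, length_probs):
    """Expected number of fragments covering position `cpg`.

    A fragment starting at s with length L covers s .. s + L - 1,
    so it covers the CpG when L > cpg - s.
    """
    covers = survival(length_probs)
    return sum(covers(cpg - s) for s in read_starts if s <= cpg)

def fixed_extension_coverage(cpg, read_starts, extension):
    """Traditional count: extend every read to one fixed fragment length."""
    return sum(1 for s in read_starts if s <= cpg < s + extension)

# Hypothetical inputs: half the fragments are 100 bp, half are 200 bp.
length_probs = {100: 0.5, 200: 0.5}
read_starts = [0, 50, 120, 400]

weighted = expected_coverage(150, read_starts, length_probs)
extended = fixed_extension_coverage(150, read_starts, extension=200)
```

With these inputs the fixed 200 bp extension counts three covering reads, while the distribution-weighted estimate is 2.0, which shows how a single extension length can overstate coverage when many fragments are shorter.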
Affiliation(s)
- Edwin J C G van den Oord
- Center for Biomarker Research and Personalized Medicine, School of Pharmacy, Virginia Commonwealth University, 1112 East Clay Street, P.O. Box 980533, Richmond, VA 23298, USA.
|
25
|
Wang J, Lan X, Hsu PY, Hsu HK, Huang K, Parvin J, Huang THM, Jin VX. Genome-wide analysis uncovers high frequency, strong differential chromosomal interactions and their associated epigenetic patterns in E2-mediated gene regulation. BMC Genomics 2013; 14:70. [PMID: 23368971 PMCID: PMC3599885 DOI: 10.1186/1471-2164-14-70] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 09/14/2012] [Accepted: 01/26/2013] [Indexed: 01/07/2023] Open
Abstract
Background An emerging Hi-C protocol has the ability to probe three-dimensional (3D) architecture and capture chromatin interactions on a genome-wide scale. It provides informative results to address how chromatin organization changes contribute to disease/tumor occurrence and progression in response to stimulation of environmental chemicals or hormones. Results In this study, using MCF7 cells as a model system, we found that estrogen stimulation significantly impacts chromatin interactions, leading to alteration of gene regulation and the associated histone modification states. Many chromosomal interaction regions at different levels of interaction frequency were identified. In particular, the top 10 hot regions with the highest interaction frequency are enriched with breast cancer specific genes. Furthermore, four types of E2-mediated strong differential (gain- or loss-) chromosomal (intra- or inter-) interactions were classified, in which the number of gain-chromosomal interactions is less than the number of loss-chromosomal interactions upon E2 stimulation. Finally, by integrating with eight histone modification marks, DNA methylation, regulatory element regions, ERα and Pol-II binding activities, associations between epigenetic patterns and high chromosomal interaction frequency were revealed in E2-mediated gene regulation. Conclusions The work provides insight into the effect of chromatin interaction on E2/ERα regulated downstream genes in breast cancer cells.
Affiliation(s)
- Junbai Wang
- Department of Pathology, Oslo University Hospital - Norwegian Radium Hospital, Montebello, 0310 Oslo, Norway.
|
26
|
Taskesen E, Wouters B, Delwel R. HAT: a novel statistical approach to discover functional regions in the genome. Methods Mol Biol 2013; 1067:125-141. [PMID: 23975790 DOI: 10.1007/978-1-62703-607-8_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Tiling arrays are useful for exploring local functions of regions of the genome in an unbiased fashion. The exact determination of those genomic regions based on tiling-array data, e.g., generated by means of hybridization of immunoprecipitated DNA fragments to the arrays, is a challenge. Many different statistical methodologies have been developed to find biologically relevant regions-of-interest (ROI) by using the quantitative signal intensity of each probe. We previously developed a method called Hypergeometric Analysis of Tiling arrays (HAT) for the analysis of tiling-array data, but it was developed such that it can also be used to study data derived by genome-wide deep sequencing approaches. Here we applied HAT to analyze two publicly available tiling-array data sets. After the detection of statistically significant ROI, these are often used in additional analysis for hypothesis testing. We therefore discuss, by using the results of the tiling-array experiment, pathway and motif analyses.
Affiliation(s)
- Erdogan Taskesen
- Department of Hematology, Erasmus University Medical Center, Rotterdam, The Netherlands
|
27
|
Abstract
A current limitation in cancer genomic studies is the lack of integration of the various omics data generated through next-generation sequencing technologies, as well as the lack of sound and comprehensive epigenomic and genomic information about a particular cancer cell type. In this review, we discuss the main aspects of current genomics research and its application to cancer. We first give an overview of next-generation sequencing technologies, then outline the major computational approaches, focusing particularly on ChIP-based omics data, and list several remaining open questions facing computational biologists. We further present regulatory network analysis inferred from ChIP-based omics data and, finally, relate the network and pathway analysis to clinical outcomes.
Affiliation(s)
- Binhua Tang
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
|
28
|
Bonneville R, Jin VX. A hidden Markov model to identify combinatorial epigenetic regulation patterns for estrogen receptor α target genes. Bioinformatics 2012; 29:22-8. [PMID: 23104890 DOI: 10.1093/bioinformatics/bts639] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
MOTIVATION Many studies have shown that epigenetic changes, such as altered DNA methylation and histone modifications, are linked to estrogen receptor α (ERα)-positive tumors and disease prognoses. Several recent studies have applied high-throughput technologies such as ChIP-seq and MBD-seq to interrogate the altered architectures of ERα regulation in tamoxifen (Tam)-resistant breast cancer cells. However, the details of combinatorial epigenetic regulation of ERα target genes in breast cancers with acquired Tam resistance have not yet been fully examined. RESULTS We developed a computational approach to identify and analyze epigenetic patterns associated with Tam resistance in the MCF7-T cell line as opposed to the Tam-sensitive MCF7 cell line, with the goal of understanding the underlying mechanisms of epigenetic regulatory influence on resistance to Tam treatment in breast cancer. In this study, we used ChIP-seq of ERα, RNA polymerase II, three histone modifications and MBD-seq data of DNA methylation in MCF7 and MCF7-T cells to train hidden Markov models (HMMs). We applied the Bayesian information criterion to determine that a 20-state HMM was best, which was reduced to a 14-state HMM with a Bayesian information criterion score of 1.21291 × 10^7. We further identified four classes of biologically meaningful states in this breast cancer cell model system, and a set of ERα combinatorial epigenetic regulated target genes. The correlated gene expression level and gene ontology analyses showed that different gene ontology terms were enriched with Tam-resistant versus sensitive breast cancer cells. Our study illustrates the applicability of HMM-based analysis of genome-wide high-throughput genomic data to study epigenetic influences on E2/ERα regulation in breast cancer.
Affiliation(s)
- Russell Bonneville
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43210, USA
|
29
|
Frietze S, Wang R, Yao L, Tak YG, Ye Z, Gaddis M, Witt H, Farnham PJ, Jin VX. Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3. Genome Biol 2012; 13:R52. [PMID: 22951069 PMCID: PMC3491396 DOI: 10.1186/gb-2012-13-9-r52] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Received: 03/09/2012] [Revised: 03/09/2012] [Accepted: 05/25/2012] [Indexed: 12/23/2022] Open
Abstract
Background The TCF7L2 transcription factor is linked to a variety of human diseases, including type 2 diabetes and cancer. One mechanism by which TCF7L2 could influence expression of genes involved in diverse diseases is by binding to distinct regulatory regions in different tissues. To test this hypothesis, we performed ChIP-seq for TCF7L2 in six human cell lines. Results We identified 116,000 non-redundant TCF7L2 binding sites, with only 1,864 sites common to the six cell lines. Using ChIP-seq, we showed that many genomic regions that are marked by both H3K4me1 and H3K27Ac are also bound by TCF7L2, suggesting that TCF7L2 plays a critical role in enhancer activity. Bioinformatic analysis of the cell type-specific TCF7L2 binding sites revealed enrichment for multiple transcription factors, including HNF4alpha and FOXA2 motifs in HepG2 cells and the GATA3 motif in MCF7 cells. ChIP-seq analysis revealed that TCF7L2 co-localizes with HNF4alpha and FOXA2 in HepG2 cells and with GATA3 in MCF7 cells. Interestingly, in MCF7 cells the TCF7L2 motif is enriched in most TCF7L2 sites but is not enriched in the sites bound by both GATA3 and TCF7L2. This analysis suggested that GATA3 might tether TCF7L2 to the genome at these sites. To test this hypothesis, we depleted GATA3 in MCF7 cells and showed that TCF7L2 binding was lost at a subset of sites. RNA-seq analysis suggested that TCF7L2 represses transcription when tethered to the genome via GATA3. Conclusions Our studies demonstrate a novel relationship between GATA3 and TCF7L2, and reveal important insights into TCF7L2-mediated gene regulation.
Affiliation(s)
- Seth Frietze
- Department of Biochemistry and Molecular Biology, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90089, USA
|
30
|
Lan X, Farnham PJ, Jin VX. Uncovering transcription factor modules using one- and three-dimensional analyses. J Biol Chem 2012; 287:30914-21. [PMID: 22952238 DOI: 10.1074/jbc.r111.309229] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023] Open
Abstract
Transcriptional regulation is a critical mediator of many normal cellular processes, as well as disease progression. Transcription factors (TFs) often co-localize at cis-regulatory elements on the DNA, form protein complexes, and collaboratively regulate gene expression. Machine learning and Bayesian approaches have been used to identify TF modules in a one-dimensional context. However, recent studies using high throughput technologies have shown that TF interactions should also be considered in three-dimensional nuclear space. Here, we describe methods for identifying TF modules and discuss how moving from a one-dimensional to a three-dimensional paradigm, along with integrated experimental and computational approaches, can lead to a better understanding of TF association networks.
Affiliation(s)
- Xun Lan
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio 43210, USA
|
31
|
Lan X, Witt H, Katsumura K, Ye Z, Wang Q, Bresnick EH, Farnham PJ, Jin VX. Integration of Hi-C and ChIP-seq data reveals distinct types of chromatin linkages. Nucleic Acids Res 2012; 40:7690-704. [PMID: 22675074 PMCID: PMC3439894 DOI: 10.1093/nar/gks501] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Indexed: 11/30/2022] Open
Abstract
We have analyzed publicly available K562 Hi-C data, which enable genome-wide unbiased capturing of chromatin interactions, using a Mixture Poisson Regression Model and a power-law decay background to define a highly specific set of interacting genomic regions. We integrated multiple ENCODE Consortium resources with the Hi-C data, using DNase-seq data and ChIP-seq data for 45 transcription factors and 9 histone modifications. We classified 12 different sets (clusters) of interacting loci that can be distinguished by their chromatin modifications and which can be categorized into two types of chromatin linkages. The different clusters of loci display very different relationships with transcription factor-binding sites. As expected, many of the transcription factors show binding patterns specific to clusters composed of interacting loci that encompass promoters or enhancers. However, cluster 9, which is distinguished by marks of open chromatin but not by active enhancer or promoter marks, was not bound by most transcription factors but was highly enriched for three transcription factors (GATA1, GATA2 and c-Jun) and three chromatin modifiers (BRG1, INI1 and SIRT6). To investigate the impact of chromatin organization on gene regulation, we performed RNA-seq analyses before and after knockdown of GATA1 or GATA2. We found that knockdown of the GATA factors not only alters the expression of genes having a nearby bound GATA but also affects expression of genes in interacting loci. Our work, in combination with previous studies linking regulation by GATA factors with c-Jun and BRG1, provides genome-wide evidence that Hi-C data identify sets of biologically relevant interacting loci.
Affiliation(s)
- Xun Lan
- Department of Biomedical Informatics, 460 W 12th Avenue, 212 BRT, The Ohio State University, Columbus, OH 43210, USA
|
32
|
Micsinai M, Parisi F, Strino F, Asp P, Dynlacht BD, Kluger Y. Picking ChIP-seq peak detectors for analyzing chromatin modification experiments. Nucleic Acids Res 2012; 40:e70. [PMID: 22307239 PMCID: PMC3351193 DOI: 10.1093/nar/gks048] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Indexed: 12/03/2022] Open
Abstract
Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads, to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.
Affiliation(s)
- Mariann Micsinai
- Yale University School of Medicine, Department of Pathology, New Haven, CT 06520, USA
|
33
|
Kennedy BA, Deatherage DE, Gu F, Tang B, Chan MWY, Nephew KP, Huang THM, Jin VX. ChIP-seq defined genome-wide map of TGFβ/SMAD4 targets: implications with clinical outcome of ovarian cancer. PLoS One 2011; 6:e22606. [PMID: 21799915 PMCID: PMC3143154 DOI: 10.1371/journal.pone.0022606] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 02/22/2011] [Accepted: 06/26/2011] [Indexed: 12/11/2022] Open
Abstract
Deregulation of the transforming growth factor-β (TGFβ) signaling pathway in epithelial ovarian cancer has been reported, but the precise mechanism underlying disrupted TGFβ signaling in the disease remains unclear. We performed chromatin immunoprecipitation followed by sequencing (ChIP-seq) to investigate genome-wide screening of TGFβ-induced SMAD4 binding in epithelial ovarian cancer. Following TGFβ stimulation of the A2780 epithelial ovarian cancer cell line, we identified 2,362 SMAD4 binding loci and 318 differentially expressed SMAD4 target genes. Comprehensive examination of SMAD4-bound loci revealed four distinct binding patterns: 1) Basal; 2) Shift; 3) Stimulated Only; 4) Unstimulated Only. TGFβ-stimulated SMAD4-bound loci were primarily classified as either Stimulated Only (74%) or Shift (25%), indicating that TGFβ stimulation alters SMAD4 binding patterns in epithelial ovarian cancer cells. Furthermore, based on gene regulatory network analysis, we determined that the TGFβ-induced, SMAD4-dependent regulatory network was strikingly different in ovarian cancer compared to normal cells. Importantly, the TGFβ/SMAD4 target genes identified in the A2780 epithelial ovarian cancer cell line were predictive of patient survival, based on in silico mining of publicly available patient databases. In conclusion, our data highlight the utility of next generation sequencing technology to identify genome-wide SMAD4 target genes in epithelial ovarian cancer and link aberrant TGFβ/SMAD signaling to ovarian tumorigenesis. Furthermore, the identified SMAD4 binding loci, combined with gene expression profiling and in silico data mining of patient cohorts, may provide a powerful approach to determine potential gene signatures with biological and future translational research in ovarian and other cancers.
Affiliation(s)
- Brian A. Kennedy
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Daniel E. Deatherage
- Human Cancer Genetics Program, The Ohio State University, Columbus, Ohio, United States of America
| | - Fei Gu
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Binhua Tang
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Michael W. Y. Chan
- Department of Life Science, National Chung Cheng University, Min-Hsiung, Chia-Yi, Taiwan, Republic of China
| | - Kenneth P. Nephew
- Medical Sciences, Indiana University School of Medicine, Bloomington, Indiana, United States of America
| | - Tim H-M. Huang
- Human Cancer Genetics Program, The Ohio State University, Columbus, Ohio, United States of America
| | - Victor X. Jin
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
|
34
|
Lan X, Adams C, Landers M, Dudas M, Krissinger D, Marnellos G, Bonneville R, Xu M, Wang J, Huang THM, Meredith G, Jin VX. High resolution detection and analysis of CpG dinucleotides methylation using MBD-Seq technology. PLoS One 2011; 6:e22226. [PMID: 21779396 PMCID: PMC3136941 DOI: 10.1371/journal.pone.0022226] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 01/27/2011] [Accepted: 06/19/2011] [Indexed: 01/22/2023] Open
Abstract
Methyl-CpG binding domain protein sequencing (MBD-seq) is widely used to survey DNA methylation patterns. However, the optimal experimental parameters for MBD-seq remain unclear and the data analysis remains challenging. In this study, we generated high depth MBD-seq data in MCF-7 cells and developed a bi-asymmetric-Laplace model (BALM) to perform data analysis. We found that optimal efficiency of MBD-seq experiments was achieved by sequencing ∼100 million unique mapped tags from a combination of 500 mM and 1000 mM salt concentration elution in MCF-7 cells. Clonal bisulfite sequencing results showed that the methylation status of each CpG dinucleotide in the tested regions was accurately detected with high resolution using the proposed model. These results demonstrated that the combination of MBD-seq and BALM could serve as a useful tool to investigate the DNA methylome due to its low cost, high specificity, efficiency and resolution.
Affiliation(s)
- Xun Lan
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | | | - Mark Landers
- Life Technologies, Carlsbad, California, United States of America
| | - Miroslav Dudas
- Life Technologies, Carlsbad, California, United States of America
| | | | - George Marnellos
- Life Technologies, Carlsbad, California, United States of America
| | - Russell Bonneville
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Maoxiong Xu
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
| | - Junbai Wang
- Department of Pathology, The Norwegian Radium Hospital, Oslo University, Oslo, Norway
| | - Tim H.-M. Huang
- Human Cancer Genetics Program, The Ohio State University, Columbus, Ohio, United States of America
| | - Gavin Meredith
- Life Technologies, Carlsbad, California, United States of America
| | - Victor X. Jin
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
|
35
|
Giannopoulou EG, Elemento O. An integrated ChIP-seq analysis platform with customizable workflows. BMC Bioinformatics 2011; 12:277. [PMID: 21736739 PMCID: PMC3145611 DOI: 10.1186/1471-2105-12-277] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 02/21/2011] [Accepted: 07/07/2011] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) enables unbiased and genome-wide mapping of protein-DNA interactions and epigenetic marks. The first step in ChIP-seq data analysis involves the identification of peaks (i.e., genomic locations with high density of mapped sequence reads). The next step consists of interpreting the biological meaning of the peaks through their association with known genes, pathways, regulatory elements, and integration with other experiments. Although several programs have been published for the analysis of ChIP-seq data, they often focus on the peak detection step and are usually not well suited for thorough, integrative analysis of the detected peaks. RESULTS To address the peak interpretation challenge, we have developed ChIPseeqer, an integrative, comprehensive, fast and user-friendly computational framework for in-depth analysis of ChIP-seq datasets. The novelty of our approach is the capability to combine several computational tools in order to create easily customized workflows that can be adapted to the user's needs and objectives. In this paper, we describe the main components of the ChIPseeqer framework, and also demonstrate the utility and diversity of the analyses offered, by analyzing a published ChIP-seq dataset. CONCLUSIONS ChIPseeqer facilitates ChIP-seq data analysis by offering a flexible and powerful set of computational tools that can be used in combination with one another. The framework is freely available as a user-friendly GUI application, but all programs are also executable from the command line, thus providing flexibility and automatability for advanced users.
Affiliation(s)
- Eugenia G Giannopoulou
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, 1305 York Avenue, New York, NY 10021, USA
|
36
|
Muiño JM, Hoogstraat M, van Ham RCHJ, van Dijk ADJ. PRI-CAT: a web-tool for the analysis, storage and visualization of plant ChIP-seq experiments. Nucleic Acids Res 2011; 39:W524-7. [PMID: 21609962 PMCID: PMC3125775 DOI: 10.1093/nar/gkr373] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022] Open
Abstract
Although several tools for the analysis of ChIP-seq data have been published recently, there is a growing demand, in particular in the plant research community, for computational resources with which such data can be processed, analyzed, stored, visualized and integrated within a single, user-friendly environment. To accommodate this demand, we have developed PRI-CAT (Plant Research International ChIP-seq analysis tool), a web-based workflow tool for the management and analysis of ChIP-seq experiments. PRI-CAT is currently focused on Arabidopsis, but will be extended with other plant species in the near future. Users can directly submit their sequencing data to PRI-CAT for automated analysis. A QuickLoad server compatible with genome browsers is implemented for the storage and visualization of DNA-binding maps. Submitted datasets and results can be made publicly available through PRI-CAT, a feature that will enable community-based integrative analysis and visualization of ChIP-seq experiments. Secondary analysis of data can be performed with the aid of GALAXY, an external framework for tool and data integration. PRI-CAT is freely available at http://www.ab.wur.nl/pricat. No login is required.
Affiliation(s)
- Jose M Muiño
- Applied Bioinformatics, Plant Research International, PO Box 619, 6700 AP Wageningen, The Netherlands.
|
37
|
Cao AR, Rabinovich R, Xu M, Xu X, Jin VX, Farnham PJ. Genome-wide analysis of transcription factor E2F1 mutant proteins reveals that N- and C-terminal protein interaction domains do not participate in targeting E2F1 to the human genome. J Biol Chem 2011; 286:11985-96. [PMID: 21310950 PMCID: PMC3069401 DOI: 10.1074/jbc.m110.217158] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 11/11/2022] Open
Abstract
Previous studies of E2F family members have suggested that protein-protein interactions may be the mechanism by which E2F proteins are recruited to specific genomic regions. We have addressed this hypothesis on a genome-wide scale using ChIP-seq analysis of MCF7 cell lines that express tagged wild type and mutant E2F1 proteins. First, we performed ChIP-seq for tagged WT E2F1. Then, we analyzed E2F1 proteins that lacked the N-terminal SP1 and cyclin A binding domains, the C-terminal transactivation and pocket protein binding domains, and the internal marked box domain. Surprisingly, we found that the ChIP-seq patterns of the mutant proteins were identical to that of WT E2F1. However, mutation of the DNA binding domain abrogated all E2F1 binding to the genome. These results suggested that the interaction between the E2F1 DNA binding domain and a consensus motif may be the primary determinant of E2F1 recruitment. To address this possibility, we analyzed the in vivo binding sites for the in vitro-derived consensus E2F1 motif (TTTSSCGC) and also performed de novo motif analysis. We found that only 12% of the ChIP-seq peaks contained the TTTSSCGC motif. De novo motif analysis indicated that most of the in vivo sites lacked the 5′ half of the in vitro-derived consensus, having instead the in vivo consensus of CGCGC. In summary, our findings do not provide support for the model that protein-protein interactions are involved in recruiting E2F1 to the genome, but rather suggest that recognition of a motif found at most human promoters is the critical determinant.
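As a rough illustration of the kind of motif scan the abstract describes, the sketch below computes the fraction of sequences that contain the consensus TTTSSCGC on either strand, expanding the IUPAC code S to C or G. The example sequences, the function names and the choice to scan both strands are assumptions made for illustration; this is not the authors' pipeline.

```python
import re

# IUPAC codes needed for this motif; S ("strong") stands for C or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_regex(motif):
    """Translate an IUPAC motif string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))

def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def fraction_with_motif(sequences, motif="TTTSSCGC"):
    """Fraction of sequences with at least one motif match on either strand."""
    if not sequences:
        return 0.0
    pattern = motif_regex(motif)
    hits = 0
    for seq in sequences:
        seq = seq.upper()
        if pattern.search(seq) or pattern.search(reverse_complement(seq)):
            hits += 1
    return hits / len(sequences)

# Hypothetical peak sequences, not real ChIP-seq data.
peaks = ["AATTTCGCGCAA", "GGGCGCGCGGG", "ACGTACGTACGT", "GCGCGAAA"]
print(fraction_with_motif(peaks))
```

In this toy input the first sequence matches on the forward strand and the last one only on the reverse strand, which is why both strands are scanned.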
Affiliation(s)
- Alina R Cao
- Genome Center, University of California, Davis, California 95616, USA
|