Qin Q, Mei S, Wu Q, Sun H, Li L, Taing L, Chen S, Li F, Liu T, Zang C, Xu H, Chen Y, Meyer CA, Zhang Y, Brown M, Long HW, Liu XS. ChiLin: a comprehensive ChIP-seq and DNase-seq quality control and analysis pipeline. BMC Bioinformatics 2016; 17:404. [PMID: 27716038 PMCID: PMC5048594 DOI: 10.1186/s12859-016-1274-4] [Received: 04/30/2016] [Accepted: 09/21/2016]
Abstract
BACKGROUND: Transcription factor binding, histone modification, and chromatin accessibility studies are important approaches to understanding the biology of gene regulation. ChIP-seq and DNase-seq have become the standard techniques for studying protein-DNA interactions and chromatin accessibility, respectively, and comprehensive quality control (QC) and analysis tools are critical to extracting the most value from these assay types. Although many analysis and QC tools have been reported, few combine ChIP-seq and DNase-seq data analysis and quality control in a unified framework with a comprehensive and unbiased reference of data quality metrics.

RESULTS: ChiLin is a computational pipeline that automates the quality control and data analysis of ChIP-seq and DNase-seq data. It is built on a flexible, modular software framework that can be easily extended and modified. ChiLin is well suited to batch processing of many datasets and to large collaborative projects involving ChIP-seq and DNase-seq experiments with different designs. ChiLin generates comprehensive quality control reports that include comparisons with historical data derived from 23,677 public ChIP-seq and DNase-seq samples (11,265 datasets) classified into eight literature-based categories. To the best of our knowledge, this atlas represents the most comprehensive ChIP-seq and DNase-seq quality metric resource currently available. These historical metrics provide useful heuristic quality references for experiments across all commonly used assay types. Using representative datasets, we demonstrate the versatility of the pipeline by applying it to different types of ChIP-seq assays. The pipeline software is available open source at https://github.com/cfce/chilin.

CONCLUSION: ChiLin is a scalable and powerful tool for processing large batches of ChIP-seq and DNase-seq datasets. The analysis output and quality metrics are organized into user-friendly directories and reports. We have compiled 23,677 profiles into a comprehensive quality atlas with fine-grained classification for users.
Affiliation(s)
- Qian Qin
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Shenglin Mei
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Qiu Wu
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Hanfei Sun
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Lewyn Li
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- Len Taing
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- Sujun Chen
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Fugen Li
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- Tao Liu
  - Department of Biochemistry, University at Buffalo, Buffalo, NY USA
- Chongzhi Zang
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Han Xu
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Yiwen Chen
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Clifford A. Meyer
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Yong Zhang
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Myles Brown
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
  - Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute and Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA USA
- Henry W. Long
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- X. Shirley Liu
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA