1
Bang I, Nong LK, Park JY, Le HT, Lee SM, Kim D. ChEAP: ChIP-exo analysis pipeline and the investigation of Escherichia coli RpoN protein-DNA interactions. Comput Struct Biotechnol J 2022; 21:99-104. [PMID: 36544470 PMCID: PMC9735260 DOI: 10.1016/j.csbj.2022.11.053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/19/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022]
Abstract
Genome-scale studies of the bacterial regulatory network have been leveraged by declining sequencing cost and advances in ChIP (chromatin immunoprecipitation) methods. Of which, ChIP-exo has proven competent with its near-single base-pair resolution. While several algorithms and programs have been developed for different analytical steps in ChIP-exo data processing, there is a lack of effort in incorporating them into a convenient bioinformatics pipeline that is intuitive and publicly available. In this paper, we developed ChIP-exo Analysis Pipeline (ChEAP) that executes the one-step process, starting from trimming and aligning raw sequencing reads to visualization of ChIP-exo results. The pipeline was implemented on the interactive web-based Python development environment - Jupyter Notebook, which is compatible with the Google Colab cloud platform to facilitate the sharing of codes and collaboration among researchers. Additionally, users could exploit the free GPU and CPU resources allocated by Colab to carry out computing tasks regardless of the performance of their local machines. The utility of ChEAP was demonstrated with the ChIP-exo datasets of RpoN sigma factor in E. coli K-12 MG1655. To analyze two raw data files, ChEAP runtime was 2 min and 25 s. Subsequent analyses identified 113 RpoN binding sites showing a conserved RpoN binding pattern in the motif search. ChEAP application in ChIP-exo data analysis is extensive and flexible for the parallel processing of data from various organisms.
Affiliation(s)
- Ina Bang
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Linh Khanh Nong
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Hoa Thi Le
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Sang-Mok Lee
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Schools of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Corresponding author at: School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
2
Adetunji MO, Abraham BJ. SEAseq: a portable and cloud-based chromatin occupancy analysis suite. BMC Bioinformatics 2022; 23:77. [PMID: 35193506 PMCID: PMC8864840 DOI: 10.1186/s12859-022-04588-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 01/28/2022] [Indexed: 11/26/2022]
Abstract
Background Genome-wide protein-DNA binding is popularly assessed using specific antibody pulldown in Chromatin Immunoprecipitation Sequencing (ChIP-Seq) or Cleavage Under Targets and Release Using Nuclease (CUT&RUN) sequencing experiments. These technologies generate high-throughput sequencing data that necessitate the use of multiple sophisticated, computationally intensive genomic tools to make discoveries, but these genomic tools often have a high barrier to use because of computational resource constraints. Results We present a comprehensive, infrastructure-independent, computational pipeline called SEAseq, which leverages field-standard, open-source tools for processing and analyzing ChIP-Seq/CUT&RUN data. SEAseq performs extensive analyses from the raw output of the experiment, including alignment, peak calling, motif analysis, promoters and metagene coverage profiling, peak annotation distribution, clustered/stitched peaks (e.g. super-enhancer) identification, and multiple relevant quality assessment metrics, as well as automatic interfacing with data in GEO/SRA. SEAseq enables rapid and cost-effective resource for analysis of both new and publicly available datasets as demonstrated in our comparative case studies. Conclusions The easy-to-use and versatile design of SEAseq makes it a reliable and efficient resource for ensuring high quality analysis. Its cloud implementation enables a broad suite of analyses in environments with constrained computational resources. SEAseq is platform-independent and is aimed to be usable by everyone with or without programming skills. It is available on the cloud at https://platform.stjude.cloud/workflows/seaseq and can be locally installed from the repository at https://github.com/stjude/seaseq. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04588-z.
Affiliation(s)
- Modupeore O Adetunji
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
- Brian J Abraham
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, 38105, USA
3
Jiang M, Xu SF, Tang TS, Miao L, Luo BZ, Ni Y, Kong FD, Liu C. Development and evaluation of a meat mitochondrial metagenomic (3MG) method for composition determination of meat from fifteen mammalian and avian species. BMC Genomics 2022; 23:36. [PMID: 34996352 PMCID: PMC8742424 DOI: 10.1186/s12864-021-08263-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Accepted: 12/17/2021] [Indexed: 11/25/2022]
Abstract
BACKGROUND Bioassessment and biomonitoring of meat products are aimed at identifying and quantifying adulterants and contaminants, such as meat from unexpected sources and microbes. Several methods for determining the biological composition of mixed samples have been used, including metabarcoding, metagenomics and mitochondrial metagenomics. In this study, we aimed to develop a method based on next-generation DNA sequencing to estimate samples that might contain meat from 15 mammalian and avian species that are commonly related to meat bioassessment and biomonitoring. RESULTS In this project, we found the meat composition from 15 species could not be identified with the metabarcoding approach because of the lack of universal primers or insufficient discrimination power. Consequently, we developed and evaluated a meat mitochondrial metagenomics (3MG) method. The 3MG method has four steps: (1) extraction of sequencing reads from mitochondrial genomes (mitogenomes); (2) assembly of mitogenomes; (3) mapping of mitochondrial reads to the assembled mitogenomes; and (4) biomass estimation based on the number of uniquely mapped reads. The method was implemented in a python script called 3MG. The analysis of simulated datasets showed that the method can determine contaminant composition at a proportion of 2% and the relative error was < 5%. To evaluate the performance of 3MG, we constructed and analysed mixed samples derived from 15 animal species in equal mass. Then, we constructed and analysed mixed samples derived from two animal species (pork and chicken) in different ratios. DNAs were extracted and used in constructing 21 libraries for next-generation sequencing. The analysis of the 15 species mix with the method showed the successful identification of 12 of the 15 (80%) animal species tested. 
The analysis of the mixed samples of the two species revealed correlation coefficients of 0.98 for pork and 0.98 for chicken between the number of uniquely mapped reads and the mass proportion. CONCLUSION To the best of our knowledge, this study is the first to demonstrate the potential of the non-targeted 3MG method as a tool for accurately estimating biomass in meat mix samples. The method has potential broad applications in meat product safety.
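The fourth step of the 3MG method (biomass estimation from the number of uniquely mapped reads) and the reported read-count versus mass-proportion correlation can be illustrated with a minimal Python sketch. This is not the authors' 3MG script: the function names, the plain proportional normalisation, and the example numbers are assumptions made for illustration, and a real analysis may need further corrections that the abstract does not describe.

```python
from math import sqrt


def estimate_proportions(unique_read_counts):
    """Turn per-species uniquely mapped read counts into relative proportions.

    Assumption: biomass share is taken as directly proportional to read share.
    """
    total = sum(unique_read_counts.values())
    if total == 0:
        raise ValueError("no uniquely mapped reads")
    return {species: n / total for species, n in unique_read_counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical two-species mix (read counts are invented for the example).
counts = {"pork": 750, "chicken": 250}
proportions = estimate_proportions(counts)

# Hypothetical dilution series: known mass proportion vs. uniquely mapped reads.
mass_proportion = [0.1, 0.5, 0.9]
mapped_reads = [100, 500, 900]
r = pearson_r(mass_proportion, mapped_reads)
```

In this sketch `proportions` gives each species' share of the uniquely mapped reads, and `r` plays the role of the correlation coefficient that the abstract reports for pork and chicken.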
Affiliation(s)
- Mei Jiang
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, 100193 Beijing, PR China
- Shu-Fei Xu
  - Technology Center of Xiamen Entry-exit Inspection and Quarantine Bureau, Xiamen, Fujian 361026 PR China
- Tai-Shan Tang
  - Technology Center of Jiangsu Entry-exit Inspection and Quarantine Bureau, Nanjing, Jiangsu 210009 PR China
- Li Miao
  - Technology Center of Henan Entry-exit Inspection and Quarantine Bureau, Zhengzhou, Henan 450003 PR China
- Bao-Zheng Luo
  - Technology Center of Zhuhai Entry-exit Inspection and Quarantine Bureau, Zhuhai, Guangdong 519000 PR China
- Yang Ni
  - College of Agriculture, Fujian Agriculture and Forestry University, Fuzhou, Fujian Province 350002 PR China
- Fan-De Kong
  - Technology Center of Xiamen Entry-exit Inspection and Quarantine Bureau, Xiamen, Fujian 361026 PR China
- Chang Liu
  - Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, 100193 Beijing, PR China
4
Dall'Olio D, Curti N, Fonzi E, Sala C, Remondini D, Castellani G, Giampieri E. Impact of concurrency on the performance of a whole exome sequencing pipeline. BMC Bioinformatics 2021; 22:60. [PMID: 33563206 PMCID: PMC7874478 DOI: 10.1186/s12859-020-03780-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/28/2020] [Accepted: 09/24/2020] [Indexed: 11/12/2022]
Abstract
Background Current high-throughput technologies—i.e. whole genome sequencing, RNA-Seq, ChIP-Seq, etc.—generate huge amounts of data and their usage gets more widespread with each passing year. Complex analysis pipelines involving several computationally-intensive steps have to be applied on an increasing number of samples. Workflow management systems allow parallelization and a more efficient usage of computational power. Nevertheless, this mostly happens by assigning the available cores to a single or few samples’ pipeline at a time. We refer to this approach as naive parallel strategy (NPS). Here, we discuss an alternative approach, which we refer to as concurrent execution strategy (CES), which equally distributes the available processors across every sample’s pipeline. Results Theoretically, we show that the CES results, under loose conditions, in a substantial speedup, with an ideal gain range spanning from 1 to the number of samples. Also, we observe that the CES yields even faster executions since parallelly computable tasks scale sub-linearly. Practically, we tested both strategies on a whole exome sequencing pipeline applied to three publicly available matched tumour-normal sample pairs of gastrointestinal stromal tumour. The CES achieved speedups in latency up to 2–2.4 compared to the NPS. Conclusions Our results hint that if resources distribution is further tailored to fit specific situations, an even greater gain in performance of multiple samples pipelines execution could be achieved. For this to be feasible, a benchmarking of the tools included in the pipeline would be necessary. It is our opinion these benchmarks should be consistently performed by the tools’ developers. Finally, these results suggest that concurrent strategies might also lead to energy and cost savings by making feasible the usage of low power machine clusters.
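The claim that the ideal gain of the CES over the NPS ranges from 1 to the number of samples can be reproduced with a small toy model. The sketch below assumes each sample's pipeline follows Amdahl's law with a fixed serial fraction, that the NPS processes one sample at a time on all cores, and that cores divide evenly among samples under the CES. These modelling choices and all function names are assumptions for illustration, not the paper's own derivation.

```python
def pipeline_time(cores, serial_fraction, single_core_time=1.0):
    """Runtime of one sample's pipeline on `cores` processors (Amdahl's law)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    parallel_fraction = 1.0 - serial_fraction
    return single_core_time * (serial_fraction + parallel_fraction / cores)


def nps_latency(n_samples, total_cores, serial_fraction):
    """Naive parallel strategy: samples run one after another on all cores."""
    return n_samples * pipeline_time(total_cores, serial_fraction)


def ces_latency(n_samples, total_cores, serial_fraction):
    """Concurrent execution strategy: all samples run at once, cores split equally."""
    return pipeline_time(total_cores / n_samples, serial_fraction)


def ces_gain(n_samples, total_cores, serial_fraction):
    """Speedup of CES over NPS in total latency under this toy model."""
    return (nps_latency(n_samples, total_cores, serial_fraction)
            / ces_latency(n_samples, total_cores, serial_fraction))


# Six samples on twelve cores, for three hypothetical serial fractions.
for f in (0.0, 0.3, 1.0):
    print(f"serial fraction {f:.1f}: CES gain = {ces_gain(6, 12, f):.2f}")
```

Under these assumptions a perfectly parallel pipeline (serial fraction 0) gains nothing from the CES, a fully serial one gains a factor equal to the number of samples, and anything in between falls inside that range, which matches the bounds stated in the abstract.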
Affiliation(s)
- Daniele Dall'Olio
  - Department of Physics and Astronomy, University of Bologna, 40127, Bologna, BO, Italy
- Nico Curti
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, BO, Italy
- Eugenio Fonzi
  - Istituto Scientifico Romagnolo per lo Studio e la Cura dei Tumori (IRST) IRCCS, 47014, Meldola, Italy
- Claudia Sala
  - Department of Physics and Astronomy, University of Bologna, 40127, Bologna, BO, Italy
- Daniel Remondini
  - Department of Physics and Astronomy, University of Bologna, 40127, Bologna, BO, Italy
- Gastone Castellani
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, BO, Italy
- Enrico Giampieri
  - Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, 40138, Bologna, BO, Italy
5
Abstract
Translation is a central biological process in living cells. Ribosome profiling approach enables assessing translation on a global, cell-wide level. Extracting versatile information from the ribosome profiling data usually requires specialized expertise for handling the sequencing data that is not available to the broad community of experimentalists. Here, we provide an easy-to-use and modifiable workflow that uses a small set of commands and enables full data analysis in a standardized way, including precise positioning of the ribosome-protected fragments, for determining codon-specific translation features. The workflow is complemented with simple step-by-step explanations and is accessible to scientists with no computational background.
Affiliation(s)
- Zoya Ignatova
  - Institute for Biochemistry and Molecular Biology, Department of Chemistry, University of Hamburg, Hamburg, Germany
6
Wöste M, Leitão E, Laurentino S, Horsthemke B, Rahmann S, Schröder C. wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data. BMC Bioinformatics 2020; 21:169. [PMID: 32357829 PMCID: PMC7195798 DOI: 10.1186/s12859-020-3470-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/29/2019] [Accepted: 03/24/2020] [Indexed: 11/20/2022]
Abstract
Background Analysing whole genome bisulfite sequencing datasets is a data-intensive task that requires comprehensive and reproducible workflows to generate valid results. While many algorithms have been developed for tasks such as alignment, comprehensive end-to-end pipelines are still sparse. Furthermore, previous pipelines lack features or show technical deficiencies, thus impeding analyses. Results We developed wg-blimp (whole genome bisulfite sequencing methylation analysis pipeline) as an end-to-end pipeline to ease whole genome bisulfite sequencing data analysis. It integrates established algorithms for alignment, quality control, methylation calling, detection of differentially methylated regions, and methylome segmentation, requiring only a reference genome and raw sequencing data as input. Comparing wg-blimp to previous end-to-end pipelines reveals similar setups for common sequence processing tasks, but shows differences for post-alignment analyses. We improve on previous pipelines by providing a more comprehensive analysis workflow as well as an interactive user interface. To demonstrate wg-blimp’s ability to produce correct results we used it to call differentially methylated regions for two publicly available datasets. We were able to replicate 112 of 114 previously published regions, and found results to be consistent with previous findings. We further applied wg-blimp to a publicly available sample of embryonic stem cells to showcase methylome segmentation. As expected, unmethylated regions were in close proximity of transcription start sites. Segmentation results were consistent with previous analyses, despite different reference genomes and sequencing techniques. Conclusions wg-blimp provides a comprehensive analysis pipeline for whole genome bisulfite sequencing data as well as a user interface for simplified result inspection. We demonstrated its applicability by analysing multiple publicly available datasets. 
Thus, wg-blimp is a relevant alternative to previous analysis pipelines and may facilitate future epigenetic research.
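The replication check described in this abstract (112 of 114 previously published regions recovered) amounts to counting how many reference intervals are overlapped by at least one newly called interval. The Python sketch below shows one way to do that count. The tuple representation, the "any shared base" overlap criterion, and the coordinates are all hypothetical, and wg-blimp's own comparison may have used a different rule.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_replicated(published, called):
    """Number of published regions overlapped by at least one called region."""
    return sum(any(overlaps(p, c) for c in called) for p in published)


# Invented coordinates, for illustration only.
published_dmrs = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
called_dmrs = [("chr1", 150, 250), ("chr1", 590, 700), ("chr2", 300, 400)]

replicated = count_replicated(published_dmrs, called_dmrs)
print(f"replicated {replicated} of {len(published_dmrs)} published regions")
```

The quadratic scan is adequate for a few hundred regions; for genome-wide region sets a sorted sweep or an interval tree would be the more usual choice.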
Affiliation(s)
- Marius Wöste
  - Institute of Medical Informatics, University of Münster, Albert-Schweitzer-Campus 1, Münster, 48149, Germany
- Elsa Leitão
  - Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
- Sandra Laurentino
  - Centre of Reproductive Medicine and Andrology, Institute of Reproductive and Regenerative Biology, University Hospital Münster, Albert-Schweitzer-Campus 1, Münster, 48149, Germany
- Bernhard Horsthemke
  - Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
  - Institute of Human Genetics, University of Münster, Vesaliusweg 12-14, Münster, 48149, Germany
- Sven Rahmann
  - Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
- Christopher Schröder
  - Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
  - Genome Informatics, Institute of Human Genetics, University of Duisburg-Essen, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany
7
Kikuchi A, Nakazato T, Ito K, Nojima Y, Yokoyama T, Iwabuchi K, Bono H, Toyoda A, Fujiyama A, Sato R, Tabunoki H. Identification of functional enolase genes of the silkworm Bombyx mori from public databases with a combination of dry and wet bench processes. BMC Genomics 2017; 18:83. [PMID: 28086791 PMCID: PMC5237310 DOI: 10.1186/s12864-016-3455-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/16/2016] [Accepted: 12/22/2016] [Indexed: 11/10/2022]
Abstract
BACKGROUND Various insect species have been added to genomic databases over the years. Thus, researchers can easily obtain online genomic information on invertebrates and insects. However, many incorrectly annotated genes are included in these databases, which can prevent the correct interpretation of subsequent functional analyses. To address this problem, we used a combination of dry and wet bench processes to select functional genes from public databases. RESULTS Enolase is an important glycolytic enzyme in all organisms. We used a combination of dry and wet bench processes to identify functional enolases in the silkworm Bombyx mori (BmEno). First, we detected five annotated enolases from public databases using a Hidden Markov Model (HMM) search, and then through cDNA cloning, Northern blotting, and RNA-seq analysis, we revealed three functional enolases in B. mori: BmEno1, BmEno2, and BmEnoC. BmEno1 contained a conserved key amino acid residue for metal binding and substrate binding in other species. However, BmEno2 and BmEnoC showed a change in this key amino acid. Phylogenetic analysis showed that BmEno2 and BmEnoC were distinct from BmEno1 and other enolases, and were distributed only in lepidopteran clusters. BmEno1 was expressed in all of the tissues used in our study. In contrast, BmEno2 was mainly expressed in the testis with some expression in the ovary and suboesophageal ganglion. BmEnoC was weakly expressed in the testis. Quantitative RT-PCR showed that the mRNA expression of BmEno2 and BmEnoC correlated with testis development; thus, BmEno2 and BmEnoC may be related to lepidopteran-specific spermiogenesis. CONCLUSIONS We identified and characterized three functional enolases from public databases with a combination of dry and wet bench processes in the silkworm B. mori. In addition, we determined that BmEno2 and BmEnoC had species-specific functions. 
Our strategy could be helpful for the detection of minor genes and functional genes in non-model organisms from public databases.
Affiliation(s)
- Akira Kikuchi
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Takeru Nakazato
  - Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems (ROIS), Yata 1111, Mishima, Shizuoka, 411-8540, Japan
- Katsuhiko Ito
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Yosui Nojima
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Takeshi Yokoyama
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Kikuo Iwabuchi
  - Department of Bioregulation and Biointeraction, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
- Hidemasa Bono
  - Database Center for Life Science (DBCLS), Joint Support-Center for Data Science Research, Research Organization of Information and Systems (ROIS), Yata 1111, Mishima, Shizuoka, 411-8540, Japan
- Atsushi Toyoda
  - Center for Information Biology, National Institute of Genetics, Yata 1111, Mishima, Shizuoka, 411-8540, Japan
- Asao Fujiyama
  - Center for Information Biology, National Institute of Genetics, Yata 1111, Mishima, Shizuoka, 411-8540, Japan
- Ryoichi Sato
  - Graduate School of Bio-Applications and Systems Engineering (BASE), 2-24-16, Naka-cho, Koganei, Tokyo, 184-8588, Japan
- Hiroko Tabunoki
  - Department of Science of Biological Production, Graduate School of Agriculture, Tokyo University of Agriculture and Technology, 3-5-8 Saiwai-cho, Fuchu, Tokyo, 183-8509, Japan
8
Qin Q, Mei S, Wu Q, Sun H, Li L, Taing L, Chen S, Li F, Liu T, Zang C, Xu H, Chen Y, Meyer CA, Zhang Y, Brown M, Long HW, Liu XS. ChiLin: a comprehensive ChIP-seq and DNase-seq quality control and analysis pipeline. BMC Bioinformatics 2016; 17:404. [PMID: 27716038 PMCID: PMC5048594 DOI: 10.1186/s12859-016-1274-4] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 04/30/2016] [Accepted: 09/21/2016] [Indexed: 01/02/2023]
Abstract
BACKGROUND Transcription factor binding, histone modification, and chromatin accessibility studies are important approaches to understanding the biology of gene regulation. ChIP-seq and DNase-seq have become the standard techniques for studying protein-DNA interactions and chromatin accessibility respectively, and comprehensive quality control (QC) and analysis tools are critical to extracting the most value from these assay types. Although many analysis and QC tools have been reported, few combine ChIP-seq and DNase-seq data analysis and quality control in a unified framework with a comprehensive and unbiased reference of data quality metrics. RESULTS ChiLin is a computational pipeline that automates the quality control and data analyses of ChIP-seq and DNase-seq data. It is developed using a flexible and modular software framework that can be easily extended and modified. ChiLin is ideal for batch processing of many datasets and is well suited for large collaborative projects involving ChIP-seq and DNase-seq from different designs. ChiLin generates comprehensive quality control reports that include comparisons with historical data derived from over 23,677 public ChIP-seq and DNase-seq samples (11,265 datasets) from eight literature-based classified categories. To the best of our knowledge, this atlas represents the most comprehensive ChIP-seq and DNase-seq related quality metric resource currently available. These historical metrics provide useful heuristic quality references for experiment across all commonly used assay types. Using representative datasets, we demonstrate the versatility of the pipeline by applying it to different assay types of ChIP-seq data. The pipeline software is available open source at https://github.com/cfce/chilin . CONCLUSION ChiLin is a scalable and powerful tool to process large batches of ChIP-seq and DNase-seq datasets. The analysis output and quality metrics have been structured into user-friendly directories and reports. 
We have successfully compiled 23,677 profiles into a comprehensive quality atlas with fine classification for users.
Affiliation(s)
- Qian Qin
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Shenglin Mei
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Qiu Wu
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Hanfei Sun
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Lewyn Li
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- Len Taing
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- Sujun Chen
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Fugen Li
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- Tao Liu
  - Department of Biochemistry, University at Buffalo, Buffalo, NY USA
- Chongzhi Zang
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Han Xu
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Yiwen Chen
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Clifford A. Meyer
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
- Yong Zhang
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
- Myles Brown
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
  - Division of Molecular and Cellular Oncology, Department of Medical Oncology, Dana-Farber Cancer Institute and Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA USA
- Henry W. Long
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
- X. Shirley Liu
  - Shanghai Key laboratory of tuberculosis, Shanghai Pulmonary Hospital, Shanghai, China
  - Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, MA USA
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA USA
9
Abstract
With the development of novel assay technologies, biomedical experiments and analyses have gone through substantial evolution. Today, a typical experiment can simultaneously measure hundreds to thousands of individual features (e.g. genes) in dozens of biological conditions, resulting in gigabytes of data that need to be processed and analyzed. Because of the multiple steps involved in the data generation and analysis and the lack of details provided, it can be difficult for independent researchers to try to reproduce a published study. With the recent outrage following the halt of a cancer clinical trial due to the lack of reproducibility of the published study, researchers are now facing heavy pressure to ensure that their results are reproducible. Despite the global demand, too many published studies remain non-reproducible mainly due to the lack of availability of experimental protocol, data and/or computer code. Scientific discovery is an iterative process, where a published study generates new knowledge and data, resulting in new follow-up studies or clinical trials based on these results. As such, it is important for the results of a study to be quickly confirmed or discarded to avoid wasting time and money on novel projects. The availability of high-quality, reproducible data will also lead to more powerful analyses (or meta-analyses) where multiple data sets are combined to generate new knowledge. In this article, we review some of the recent developments regarding biomedical reproducibility and comparability and discuss some of the areas where the overall field could be improved.
Affiliation(s)
- Yunda Huang
  - Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Mailstop M2-C200, Seattle, WA 98109-1024, USA