1
Navgire GS, Goel N, Sawhney G, Sharma M, Kaushik P, Mohanta YK, Mohanta TK, Al-Harrasi A. Analysis and Interpretation of metagenomics data: an approach. Biol Proced Online 2022; 24:18. [PMID: 36402995 PMCID: PMC9675974 DOI: 10.1186/s12575-022-00179-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2022] [Accepted: 10/19/2022] [Indexed: 11/20/2022] Open
Abstract
Advances in next-generation sequencing technologies have accelerated the momentum of metagenomic studies, which is increasing yearly. The metagenomics field is one of the versatile applications in microbiology, where any interaction in the environment involving microorganisms can be the topic of study. Due to this versatility, the number of applications of this omics technology reached its horizons. Agriculture is a crucial sector involving crop plants and microorganisms interacting together. Hence, studying these interactions through the lenses of metagenomics would completely disclose a new meaning to crop health and development. The rhizosphere is an essential reservoir of the microbial community for agricultural soil. Hence, we focus on the R&D of metagenomic studies on the rhizosphere of crops such as rice, wheat, legumes, chickpea, and sorghum. These recent developments are impossible without the continuous advancement seen in the next-generation sequencing platforms; thus, a brief introduction and analysis of the available sequencing platforms are presented here to have a clear picture of the workflow. Concluding the topic is the discussion about different pipelines applied to analyze data produced by sequencing techniques and have a significant role in interpreting the outcome of a particular experiment. A plethora of different software and tools are incorporated in the automated pipelines or individually available to perform manual metagenomic analysis. Here we describe 8-10 advanced, efficient pipelines used for analysis that explain their respective workflows to simplify the whole analysis process.
Affiliation(s)
- Gauri S Navgire
  - Department of Microbiology, Savitribai Phule Pune University, Pune, Maharashtra, 411007, India
- Neha Goel
  - Department of Genetics and Tree Improvement, Forest Research Institute, 248006, Dehradun, India
- Gifty Sawhney
  - Inflammation Pharmacology Division, Academy of Scientific and Innovative Research (AcSIR), CSIR-Indian Institute of Integrative Medicine, Jammu-180001, Jammu Kashmir, India
- Mohit Sharma
  - Department of Molecular Medicine, Medical University of Warsaw and Malopolska Center of Biotechnology, Krakow, Poland
- Tapan Kumar Mohanta
  - Natural and Medical Sciences Research Center, University of Nizwa, Nizwa, 616, Oman.
- Ahmed Al-Harrasi
  - Natural and Medical Sciences Research Center, University of Nizwa, Nizwa, 616, Oman.
2
Luo L, Gribskov M, Wang S. Bibliometric review of ATAC-Seq and its application in gene expression. Brief Bioinform 2022; 23:6543486. [PMID: 35255493 PMCID: PMC9116206 DOI: 10.1093/bib/bbac061] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/17/2021] [Revised: 02/06/2022] [Accepted: 02/09/2022] [Indexed: 11/30/2022] Open
Abstract
With recent advances in high-throughput next-generation sequencing, it is possible to describe the regulation and expression of genes at multiple levels. An assay for transposase-accessible chromatin using sequencing (ATAC-seq), which uses Tn5 transposase to sequence protein-free binding regions of the genome, can be combined with chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) and ribonucleic acid sequencing (RNA-seq) to provide a detailed description of gene expression. Here, we reviewed the literature on ATAC-seq and described the characteristics of ATAC-seq publications. We then briefly introduced the principles of RNA-seq, ChIP-seq and ATAC-seq, focusing on the main features of the techniques. We built a phylogenetic tree from species that had been previously studied by using ATAC-seq. Studies of Mus musculus and Homo sapiens account for approximately 90% of the total ATAC-seq data, while other species are still in the process of accumulating data. We summarized the findings from human diseases and other species, illustrating the cutting-edge discoveries and the role of multi-omics data analysis in current research. Moreover, we collected and compared ATAC-seq analysis pipelines, which allowed biological researchers who lack programming skills to better analyze and explore ATAC-seq data. Through this review, it is clear that multi-omics analysis and single-cell sequencing technology will become the mainstream approach in future research.
Affiliation(s)
- Liheng Luo
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi, China, 710072
- Michael Gribskov
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Sufang Wang
  - School of Life Sciences, Northwestern Polytechnical University, Xi'an, Shaanxi, China, 710072
3
Tuft S, Somerville TF, Li JPO, Neal T, De S, Horsburgh MJ, Fothergill JL, Foulkes D, Kaye S. Bacterial keratitis: identifying the areas of clinical uncertainty. Prog Retin Eye Res 2021; 89:101031. [PMID: 34915112 DOI: 10.1016/j.preteyeres.2021.101031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/03/2021] [Revised: 11/24/2021] [Accepted: 11/29/2021] [Indexed: 12/12/2022]
Abstract
Bacterial keratitis is a common corneal infection that is treated with topical antimicrobials. By the time of presentation there may already be severe visual loss from corneal ulceration and opacity, which may persist despite treatment. There are significant differences in the associated risk factors and the bacterial isolates between high income and low- or middle-income countries, so that general management guidelines may not be appropriate. Although the diagnosis of bacterial keratitis may seem intuitive there are multiple uncertainties about the criteria that are used, which impacts the interpretation of investigations and recruitment to clinical studies. Importantly, the concept that bacterial keratitis can only be confirmed by culture ignores the approximately 50% of cases clinically consistent with bacterial keratitis in which investigations are negative. The aetiology of these culture-negative cases is unknown. Currently, the estimation of bacterial susceptibility to antimicrobials is based on data from systemic administration and achievable serum or tissue concentrations, rather than relevant corneal concentrations and biological activity in the cornea. The provision to the clinician of minimum inhibitory concentrations of the antimicrobials for the isolated bacteria would be an important step forward. An increase in the prevalence of antimicrobial resistance is a concern, but the effect this has on disease outcomes is yet unclear. Virulence factors are not routinely assessed although they may affect the pathogenicity of bacteria within species and affect outcomes. New technologies have been developed to detect and kill bacteria, and their application to bacterial keratitis is discussed. In this review we present the multiple areas of clinical uncertainty that hamper research and the clinical management of bacterial keratitis, and we address some of the assumptions and dogma that have become established in the literature.
Affiliation(s)
- Stephen Tuft
  - Moorfields Eye Hospital NHS Foundation Trust, 162 City Road, London, EC1V 2PD, UK.
- Tobi F Somerville
  - Department of Eye and Vision Sciences, University of Liverpool, 6 West Derby Street, Liverpool, L7 8TX, UK.
- Ji-Peng Olivia Li
  - Moorfields Eye Hospital NHS Foundation Trust, 162 City Road, London, EC1V 2PD, UK.
- Timothy Neal
  - Department of Clinical Microbiology, Liverpool Clinical Laboratories, Liverpool University Hospital NHS Foundation Trust, Prescot Street, Liverpool, L7 8XP, UK.
- Surjo De
  - Department of Clinical Microbiology, University College London Hospitals NHS Foundation Trust, 250 Euston Road, London, NW1 2PG, UK.
- Malcolm J Horsburgh
  - Department of Infection and Microbiomes, University of Liverpool, Crown Street, Liverpool, L69 7BX, UK.
- Joanne L Fothergill
  - Department of Eye and Vision Sciences, University of Liverpool, 6 West Derby Street, Liverpool, L7 8TX, UK.
- Daniel Foulkes
  - Department of Eye and Vision Sciences, University of Liverpool, 6 West Derby Street, Liverpool, L7 8TX, UK.
- Stephen Kaye
  - Department of Eye and Vision Sciences, University of Liverpool, 6 West Derby Street, Liverpool, L7 8TX, UK.
4
Smith JP, Corces MR, Xu J, Reuter VP, Chang HY, Sheffield NC. PEPATAC: an optimized pipeline for ATAC-seq data analysis with serial alignments. NAR Genom Bioinform 2021; 3:lqab101. [PMID: 34859208 PMCID: PMC8632735 DOI: 10.1093/nargab/lqab101] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 02/08/2021] [Revised: 09/30/2021] [Accepted: 11/15/2021] [Indexed: 12/18/2022] Open
Abstract
As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. BSD2-licensed code and documentation are available at https://pepatac.databio.org.
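The serial alignment idea mentioned in this abstract can be illustrated with a toy sketch. It is not PEPATAC code: exact substring matching stands in for a real aligner, and the function name `serial_align`, the reference sequences, and the reads are all hypothetical. The point is only that reads claimed by an earlier reference (here a mitochondrial stand-in) are removed before the next alignment, so a read matching both references is counted once.

```python
from typing import Dict, List, Tuple


def serial_align(
    reads: List[str], references: List[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Assign each read to the first reference, in order, that contains it.

    Substring matching is a stand-in for a real aligner (an assumption
    made purely for illustration).
    """
    assigned: Dict[str, List[str]] = {name: [] for name, _ in references}
    remaining = list(reads)
    for name, sequence in references:
        still_unaligned = []
        for read in remaining:
            if read in sequence:
                assigned[name].append(read)
            else:
                still_unaligned.append(read)
        # Only reads that failed this reference move on to the next one.
        remaining = still_unaligned
    assigned["unaligned"] = remaining
    return assigned


# Hypothetical toy sequences; "ACGTTT" occurs in both references.
mito = "ACGTTTGACCA"
nuclear = "GGGACGTTTGCCCTTAGGA"
reads = ["ACGTTT", "TTAGGA", "GACCA", "AAAAAA"]

result = serial_align(reads, [("chrM", mito), ("nuclear", nuclear)])
for name, hits in result.items():
    print(name, hits)
```

In this toy case the ambiguous read is attributed to the mitochondrial stand-in and never reaches the nuclear step, which is the kind of effect on alignment statistics the abstract refers to.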
Affiliation(s)
- Jason P Smith
  - Center for Public Health Genomics, University of Virginia, VA 22908, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, VA 22908, USA
- M Ryan Corces
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94304, USA
- Jin Xu
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94304, USA
- Vincent P Reuter
  - Genomics and Computational Biology Graduate Group, University of Pennsylvania, PA 19087, USA
- Howard Y Chang
  - Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94304, USA
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, VA 22908, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, VA 22908, USA
  - Department of Public Health Sciences, University of Virginia, VA 22908, USA
  - Department of Biomedical Engineering, University of Virginia, VA 22908, USA
5
Neubert K, Zuchantke E, Leidenfrost RM, Wünschiers R, Grützke J, Malorny B, Brendebach H, Al Dahouk S, Homeier T, Hotzel H, Reinert K, Tomaso H, Busch A. Testing assembly strategies of Francisella tularensis genomes to infer an evolutionary conservation analysis of genomic structures. BMC Genomics 2021; 22:822. [PMID: 34773979 PMCID: PMC8590783 DOI: 10.1186/s12864-021-08115-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/04/2021] [Accepted: 10/12/2021] [Indexed: 02/08/2023] Open
Abstract
Background: We benchmarked sequencing technology and assembly strategies for short-read, long-read, and hybrid assemblers with respect to correctness, contiguity, and completeness of assemblies in genomes of Francisella tularensis. Benchmarking allowed in-depth analyses of genomic structures of the Francisella pathogenicity islands and insertion sequences. Five major high-throughput sequencing technologies were applied, including next-generation “short-read” and third-generation “long-read” sequencing methods. Results: We focused on short-read assemblers, hybrid assemblers, and analysis of the genomic structure with particular emphasis on insertion sequences and the Francisella pathogenicity island. Of eight short-read assembly methods, the A5-miseq pipeline performed best for MiSeq data, Mira for Ion Torrent data, and ABySS for HiSeq data. Two approaches were applied to benchmark long-read and hybrid assembly strategies: long-read-first assembly followed by correction with short reads (Canu/Pilon, Flye/Pilon) and short-read-first assembly along with scaffolding based on long reads (Unicycler, SPAdes). Hybrid assembly can resolve large repetitive regions best with a “long-read first” approach. Conclusions: Genomic structures of the Francisella pathogenicity islands frequently showed misassembly. Insertion sequences (IS) could be used to perform an evolutionary conservation analysis. A phylogenetic structure of insertion sequences and the evolution within the clades elucidated the clade structure of the highly conservative F. tularensis.
Affiliation(s)
- Kerstin Neubert
  - Department of Mathematics and Computer Science, Algorithmic Bioinformatics, Freie Universität Berlin, Institute of Computer Science, Takustr. 9, 14195, Berlin, Germany
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Eric Zuchantke
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
- Robert Maximilian Leidenfrost
  - Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17a, 09648, Mittweida, Germany
- Röbbe Wünschiers
  - Department of Biotechnology and Chemistry, Mittweida University of Applied Sciences, Technikumplatz 17a, 09648, Mittweida, Germany
- Josephine Grützke
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Burkhard Malorny
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Holger Brendebach
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Sascha Al Dahouk
  - German Federal Institute for Risk Assessment, Diedersdorfer Weg 1, 12277, Berlin, Germany
- Timo Homeier
  - Friedrich-Loeffler-Institut, Institute of Epidemiology, Südufer 10, 17493, Greifswald, Insel Riems, Germany
- Helmut Hotzel
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, Algorithmic Bioinformatics, Freie Universität Berlin, Institute of Computer Science, Takustr. 9, 14195, Berlin, Germany
- Herbert Tomaso
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
- Anne Busch
  - Friedrich-Loeffler-Institut, Institute of Bacterial Infections and Zoonoses, Naumburger Str. 96a, 07749, Jena, Germany
  - Department of Anaesthesiology and Intensive Care Medicine, University Hospital Jena, Jena, Germany
6
Chao KH, Hsiao YW, Lee YF, Lee CY, Lai LC, Tsai MH, Lu TP, Chuang EY. RNASeqR: An R Package for Automated Two-Group RNA-Seq Analysis Workflow. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2023-2031. [PMID: 31796413 DOI: 10.1109/tcbb.2019.2956708] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]
Abstract
RNA-Seq analysis has revolutionized researchers' understanding of the transcriptome in biological research. Assessing the differences in transcriptomic profiles between tissue samples or patient groups enables researchers to explore the underlying biological impact of transcription. RNA-Seq analysis requires multiple processing steps and huge computational capabilities. There are many well-developed R packages for individual steps; however, there are few R/Bioconductor packages that integrate existing software tools into a comprehensive RNA-Seq analysis and provide fundamental end-to-end results in pure R environment so that researchers can quickly and easily get fundamental information in big sequencing data. To address this need, we have developed the open source R/Bioconductor package, RNASeqR. It allows users to run an automated RNA-Seq analysis with only six steps, producing essential tabular and graphical results for further biological interpretation. The features of RNASeqR include: six-step analysis, comprehensive visualization, background execution version, and the integration of both R and command-line software. RNASeqR provides fast, light-weight, and easy-to-run RNA-Seq analysis pipeline in pure R environment. It allows users to efficiently utilize popular software tools, including both R/Bioconductor and command-line tools, without predefining the resources or environments. RNASeqR is freely available for Linux and macOS operating systems from Bioconductor (https://bioconductor.org/packages/release/bioc/html/RNASeqR.html).
7
Meiler A, Marchiano F, Haering M, Weitkunat M, Schnorrer F, Habermann BH. AnnoMiner is a new web-tool to integrate epigenetics, transcription factor occupancy and transcriptomics data to predict transcriptional regulators. Sci Rep 2021; 11:15463. [PMID: 34326396 PMCID: PMC8322331 DOI: 10.1038/s41598-021-94805-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/16/2021] [Accepted: 07/14/2021] [Indexed: 11/23/2022] Open
Abstract
Gene expression regulation requires precise transcriptional programs, led by transcription factors in combination with epigenetic events. Recent advances in epigenomic and transcriptomic techniques provided insight into different gene regulation mechanisms. However, to date it remains challenging to understand how combinations of transcription factors together with epigenetic events control cell-type specific gene expression. We have developed the AnnoMiner web-server, an innovative and flexible tool to annotate and integrate epigenetic, and transcription factor occupancy data. First, AnnoMiner annotates user-provided peaks with gene features. Second, AnnoMiner can integrate genome binding data from two different transcriptional regulators together with gene features. Third, AnnoMiner offers to explore the transcriptional deregulation of genes nearby, or within a specified genomic region surrounding a user-provided peak. AnnoMiner’s fourth function performs transcription factor or histone modification enrichment analysis for user-provided gene lists by utilizing hundreds of public, high-quality datasets from ENCODE for the model organisms human, mouse, Drosophila and C. elegans. Thus, AnnoMiner can predict transcriptional regulators for a studied process without the strict need for chromatin data from the same process. We compared AnnoMiner to existing tools and experimentally validated several transcriptional regulators predicted by AnnoMiner to indeed contribute to muscle morphogenesis in Drosophila. AnnoMiner is freely available at http://chimborazo.ibdm.univ-mrs.fr/AnnoMiner/.
Affiliation(s)
- Arno Meiler
  - Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
- Fabio Marchiano
  - Aix-Marseille University, CNRS, IBDM UMR 7288, The Turing Centre for Living systems (CENTURI), Aix-Marseille University, Parc Scientifique de Luminy Case 907, 163, Avenue de Luminy, 13009, Marseille, France
- Margaux Haering
  - Aix-Marseille University, CNRS, IBDM UMR 7288, The Turing Centre for Living systems (CENTURI), Aix-Marseille University, Parc Scientifique de Luminy Case 907, 163, Avenue de Luminy, 13009, Marseille, France
- Manuela Weitkunat
  - Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
- Frank Schnorrer
  - Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
  - Aix-Marseille University, CNRS, IBDM UMR 7288, The Turing Centre for Living systems (CENTURI), Aix-Marseille University, Parc Scientifique de Luminy Case 907, 163, Avenue de Luminy, 13009, Marseille, France
- Bianca H Habermann
  - Max Planck Institute of Biochemistry, Am Klopferspitz 18, 82152, Martinsried, Germany
  - Aix-Marseille University, CNRS, IBDM UMR 7288, The Turing Centre for Living systems (CENTURI), Aix-Marseille University, Parc Scientifique de Luminy Case 907, 163, Avenue de Luminy, 13009, Marseille, France
8
John A, Muenzen K, Ausmees K. Evaluation of serverless computing for scalable execution of a joint variant calling workflow. PLoS One 2021; 16:e0254363. [PMID: 34242357 PMCID: PMC8270184 DOI: 10.1371/journal.pone.0254363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2020] [Accepted: 06/24/2021] [Indexed: 11/18/2022] Open
Abstract
Advances in whole-genome sequencing have greatly reduced the cost and time of obtaining raw genetic information, but the computational requirements of analysis remain a challenge. Serverless computing has emerged as an alternative to using dedicated compute resources, but its utility has not been widely evaluated for standardized genomic workflows. In this study, we define and execute a best-practice joint variant calling workflow using the SWEEP workflow management system. We present an analysis of performance and scalability, and discuss the utility of the serverless paradigm for executing workflows in the field of genomics research. The GATK best-practice short germline joint variant calling pipeline was implemented as a SWEEP workflow comprising 18 tasks. The workflow was executed on Illumina paired-end read samples from the European and African super populations of the 1000 Genomes project phase III. Cost and runtime increased linearly with increasing sample size, although runtime was driven primarily by a single task for larger problem sizes. Execution took a minimum of around 3 hours for 2 samples, up to nearly 13 hours for 62 samples, with costs ranging from $2 to $70.
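The linear scaling reported in this abstract can be made concrete with a short worked calculation. This is only a sketch under stated assumptions: the endpoints are read loosely from the abstract ("around 3 hours" and "nearly 13 hours" are approximations), and pairing the $2 and $70 costs with the 2-sample and 62-sample runs is an assumption. The helper names are hypothetical.

```python
def linear_through(p0, p1):
    """Return (slope, intercept) of the straight line through two points."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


def predict(slope, intercept, n_samples):
    """Evaluate the fitted line at a given sample count."""
    return slope * n_samples + intercept


# Approximate endpoints taken from the abstract (assumed pairing):
# 2 samples -> ~3 h and ~$2; 62 samples -> ~13 h and ~$70.
runtime_slope, runtime_intercept = linear_through((2, 3.0), (62, 13.0))
cost_slope, cost_intercept = linear_through((2, 2.0), (62, 70.0))

print(f"runtime grows by about {runtime_slope:.3f} h per added sample")
print(f"cost grows by about ${cost_slope:.2f} per added sample")

# Interpolated estimate for a hypothetical 32-sample run.
print(predict(runtime_slope, runtime_intercept, 32))
print(predict(cost_slope, cost_intercept, 32))
```

Under these assumptions each additional sample adds roughly ten minutes of runtime and a little over one dollar of cost; the abstract's remark that a single task dominates runtime at larger sizes is a reason not to extrapolate this line beyond the tested range.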
Affiliation(s)
- Aji John
  - Department of Biology, University of Washington, Seattle, Washington, United States of America
- Kathleen Muenzen
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Kristiina Ausmees
  - Department of Information Technology, Uppsala University, Uppsala, Sweden
9
Kumagai A, Dunphy WG. Binding of the Treslin-MTBP Complex to Specific Regions of the Human Genome Promotes the Initiation of DNA Replication. Cell Rep 2021; 32:108178. [PMID: 32966791 PMCID: PMC7523632 DOI: 10.1016/j.celrep.2020.108178] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/17/2020] [Revised: 06/12/2020] [Accepted: 08/31/2020] [Indexed: 12/16/2022] Open
Abstract
The processes that control where higher eukaryotic cells initiate DNA replication throughout the genome are not understood clearly. In metazoans, the Treslin-MTBP complex mediates critical final steps in formation of the activated replicative helicase prior to initiation of replication. Here, we map the genome-wide distribution of the MTBP subunit of this complex in human cells. Our results indicate that MTBP binds to at least 30,000 sites in the genome. A majority of these sites reside in regions of open chromatin that contain transcriptional-regulatory elements (e.g., promoters, enhancers, and super-enhancers), which are known to be preferred areas for initiation of replication. Furthermore, many binding sites encompass two genomic features: a nucleosome-free DNA sequence (e.g., G-quadruplex DNA or AP-1 motif) and a nucleosome bearing histone marks characteristic of open chromatin, such as H3K4me2. Taken together, these findings indicate that Treslin-MTBP associates coordinately with multiple genomic signals to promote initiation of replication. Kumagai and Dunphy show that Treslin-MTBP, activator of the replicative helicase, binds to at least 30,000 sites in the human genome. Many sites contain a nucleosome with active chromatin marks and nucleosome-free DNA (G-quadruplex or AP-1 site). Thus, Treslin-MTBP associates with multiple genomic elements to promote initiation of DNA replication.
Affiliation(s)
- Akiko Kumagai
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- William G Dunphy
  - Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA.
10
Global Analyses to Identify Direct Transcriptional Targets of p53. Methods Mol Biol 2021. [PMID: 33786783 DOI: 10.1007/978-1-0716-1217-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
The transcription factor p53 controls a gene expression program with pleiotropic effects on cell biology including cell cycle arrest and apoptosis. Identifying direct p53 target genes within this network and determining how they influence cell fate decisions downstream of p53 activation is a prerequisite for designing therapeutic approaches that target p53 to effectively kill cancer cells. Here we describe a comprehensive multi-omics approach for identifying genes that are direct transcriptional targets of p53. We provide detailed procedures for measuring global RNA polymerase activity, defining p53 binding sites across the genome, and quantifying changes in steady-state mRNA in response to p53 activation.
11
Smith JP, Sheffield NC. Analytical Approaches for ATAC-seq Data Analysis. Curr Protoc Hum Genet 2020; 106:e101. [PMID: 32543102 PMCID: PMC8191135 DOI: 10.1002/cphg.101] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
Abstract
ATAC-seq, the assay for transposase-accessible chromatin using sequencing, is a quick and efficient approach to investigating the chromatin accessibility landscape. Investigating chromatin accessibility has broad utility for answering many biological questions, such as mapping nucleosomes, identifying transcription factor binding sites, and measuring differential activity of DNA regulatory elements. Because the ATAC-seq protocol is both simple and relatively inexpensive, there has been a rapid increase in the availability of chromatin accessibility data. Furthermore, advances in ATAC-seq protocols are rapidly extending its breadth to additional experimental conditions, cell types, and species. Accompanying the increase in data, there has also been an explosion of new tools and analytical approaches for analyzing it. Here, we explain the fundamentals of ATAC-seq data processing, summarize common analysis approaches, and review computational tools to provide recommendations for different research questions. This primer provides a starting point and a reference for analysis of ATAC-seq data. © 2020 Wiley Periodicals LLC.
Affiliation(s)
- Jason P. Smith
  - Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia
- Nathan C. Sheffield
  - Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia
  - Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia
12
Yukselen O, Turkyilmaz O, Ozturk AR, Garber M, Kucukural A. DolphinNext: a distributed data processing platform for high throughput genomics. BMC Genomics 2020; 21:310. [PMID: 32306927 PMCID: PMC7168977 DOI: 10.1186/s12864-020-6714-x] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 10/22/2019] [Accepted: 04/01/2020] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND: The emergence of high throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS) is transforming biological research. The dramatic increase in the volume of data, the variety and continuous change of data processing tools, algorithms and databases make analysis the main bottleneck for scientific discovery. The processing of high throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, current platforms lack the necessary combination of parallelism, portability, flexibility and/or reproducibility that are required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify robust and reproducible workflow creation for non-technical users as well as provide a robust platform to maintain pipelines for large organizations.
RESULTS: To simplify development, maintenance, and execution of complex pipelines we created DolphinNext. DolphinNext facilitates building and deployment of complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework by providing:
1. A drag and drop user interface that visualizes pipelines and allows users to create pipelines without familiarity in underlying programming languages.
2. Modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or cloud.
3. Reproducible pipelines with version tracking and stand-alone versions that can be run independently.
4. Modular process design with process revisioning support to increase reusability and pipeline development efficiency.
5. Pipeline sharing with GitHub and automated testing.
6. Extensive reports with R-markdown and shiny support for interactive data visualization and analysis.
CONCLUSION: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines with extensive revisioning and interactive reporting to enhance reproducible results.
Affiliation(s)
- Onur Yukselen
  - Bioinformatics Core, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Osman Turkyilmaz
  - RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Ahmet Rasit Ozturk
  - RNA Therapeutics Institute, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Manuel Garber
  - Bioinformatics Core, University of Massachusetts Medical School, Worcester, MA, 01605, USA
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, 01605, USA
  - Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01605, USA
- Alper Kucukural
  - Bioinformatics Core, University of Massachusetts Medical School, Worcester, MA, 01605, USA
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, 01605, USA
  - Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, 01605, USA
|
13
|
Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker's guide to ATAC-seq data analysis. Genome Biol 2020; 21:22. [PMID: 32014034 PMCID: PMC6996192 DOI: 10.1186/s13059-020-1929-3] [Citation(s) in RCA: 182] [Impact Index Per Article: 45.5] [Received: 07/22/2019] [Accepted: 01/08/2020] [Indexed: 12/16/2022] Open
Abstract
Assay of Transposase Accessible Chromatin sequencing (ATAC-seq) is widely used in studying chromatin biology, but a comprehensive review of the analysis tools has not been completed yet. Here, we discuss the major steps in ATAC-seq data analysis, including pre-analysis (quality check and alignment), core analysis (peak calling), and advanced analysis (peak differential analysis and annotation, motif enrichment, footprinting, and nucleosome position analysis). We also review the reconstruction of transcriptional regulatory networks with multiomics data and highlight the current challenges of each step. Finally, we describe the potential of single-cell ATAC-seq and highlight the necessity of developing ATAC-seq specific analysis tools to obtain biologically meaningful insights.
Affiliation(s)
- Feng Yan
  - Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia
- David R Powell
  - Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
- David J Curtis
  - Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia
  - Department of Clinical Haematology, Alfred Health, Melbourne, VIC, Australia
- Nicholas C Wong
  - Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia
  - Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
|
14
|
Thibodeau A, Uyar A, Khetan S, Stitzel ML, Ucar D. A neural network based model effectively predicts enhancers from clinical ATAC-seq samples. Sci Rep 2018; 8:16048. [PMID: 30375457 PMCID: PMC6207744 DOI: 10.1038/s41598-018-34420-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/13/2018] [Accepted: 10/16/2018] [Indexed: 01/06/2023] Open
Abstract
Enhancers are cis-acting sequences that regulate transcription rates of their target genes in a cell-specific manner and harbor disease-associated sequence variants in cognate cell types. Many complex diseases are associated with enhancer malfunction, necessitating the discovery and study of enhancers from clinical samples. Assay for Transposase Accessible Chromatin (ATAC-seq) technology can interrogate chromatin accessibility from small cell numbers and facilitate studying enhancers in pathologies. However, on average, ~35% of open chromatin regions (OCRs) from ATAC-seq samples map to enhancers. We developed a neural network-based model, Predicting Enhancers from ATAC-Seq data (PEAS), to effectively infer enhancers from clinical ATAC-seq samples by extracting ATAC-seq data features and integrating these with sequence-related features (e.g., GC ratio). PEAS recapitulated ChromHMM-defined enhancers in CD14+ monocytes, CD4+ T cells, GM12878, peripheral blood mononuclear cells, and pancreatic islets. PEAS models trained on these 5 cell types effectively predicted enhancers in four cell types that are not used in model training (EndoC-βH1, naïve CD8+ T, MCF7, and K562 cells). Finally, PEAS inferred individual-specific enhancers from 19 islet ATAC-seq samples and revealed variability in enhancer activity across individuals, including those driven by genetic differences. PEAS is an easy-to-use tool developed to study enhancers in pathologies by taking advantage of the increasing number of clinical epigenomes.
Affiliation(s)
- Asa Thibodeau
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Asli Uyar
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Shubham Khetan
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, 06030, USA
- Michael L Stitzel
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
  - Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA
- Duygu Ucar
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
  - Institute for Systems Genomics, University of Connecticut Health Center, Farmington, CT, 06030, USA
|
15
|
Martins-Santana L, Nora LC, Sanches-Medeiros A, Lovate GL, Cassiano MHA, Silva-Rocha R. Systems and Synthetic Biology Approaches to Engineer Fungi for Fine Chemical Production. Front Bioeng Biotechnol 2018; 6:117. [PMID: 30338257 PMCID: PMC6178918 DOI: 10.3389/fbioe.2018.00117] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/09/2018] [Accepted: 08/02/2018] [Indexed: 01/16/2023] Open
Abstract
Since the advent of systems and synthetic biology, many studies have sought to harness microbes as cell factories through genetic and metabolic engineering approaches. Yeast and filamentous fungi have been successfully harnessed to produce fine and high value-added chemical products. In this review, we present some of the most promising advances from recent years in the use of fungi for this purpose, focusing on the manipulation of fungal strains using systems and synthetic biology tools to improve metabolic flow and the flow of secondary metabolites by pathway redesign. We also review the roles of bioinformatics analysis and predictions in synthetic circuits, highlighting in silico systemic approaches to improve the efficiency of synthetic modules.
Affiliation(s)
- Leonardo Martins-Santana
  - Systems and Synthetic Biology Laboratory, Cell and Molecular Biology Department, Ribeirão Preto Medical School, São Paulo University (FMRP-USP), Ribeirão Preto, Brazil
- Luisa C Nora
  - Systems and Synthetic Biology Laboratory, Cell and Molecular Biology Department, Ribeirão Preto Medical School, São Paulo University (FMRP-USP), Ribeirão Preto, Brazil
- Ananda Sanches-Medeiros
  - Systems and Synthetic Biology Laboratory, Cell and Molecular Biology Department, Ribeirão Preto Medical School, São Paulo University (FMRP-USP), Ribeirão Preto, Brazil
- Gabriel L Lovate
  - Systems and Synthetic Biology Laboratory, Cell and Molecular Biology Department, Ribeirão Preto Medical School, São Paulo University (FMRP-USP), Ribeirão Preto, Brazil
- Murilo H A Cassiano
  - Systems and Synthetic Biology Laboratory, Cell and Molecular Biology Department, Ribeirão Preto Medical School, São Paulo University (FMRP-USP), Ribeirão Preto, Brazil
- Rafael Silva-Rocha
  - Systems and Synthetic Biology Laboratory, Cell and Molecular Biology Department, Ribeirão Preto Medical School, São Paulo University (FMRP-USP), Ribeirão Preto, Brazil
|
16
|
Dual Roles of Poly(dA:dT) Tracts in Replication Initiation and Fork Collapse. Cell 2018; 174:1127-1142.e19. [PMID: 30078706 DOI: 10.1016/j.cell.2018.07.011] [Citation(s) in RCA: 133] [Impact Index Per Article: 22.2] [Received: 04/17/2018] [Revised: 05/25/2018] [Accepted: 07/06/2018] [Indexed: 12/30/2022]
Abstract
Replication origins, fragile sites, and rDNA have been implicated as sources of chromosomal instability. However, the defining genomic features of replication origins and fragile sites are among the least understood elements of eukaryote genomes. Here, we map sites of replication initiation and breakage in primary cells at high resolution. We find that replication initiates between transcribed genes within nucleosome-depleted structures established by long asymmetrical poly(dA:dT) tracts flanking the initiation site. Paradoxically, long (>20 bp) (dA:dT) tracts are also preferential sites of polar replication fork stalling and collapse within early-replicating fragile sites (ERFSs) and late-replicating common fragile sites (CFSs) and at the rDNA replication fork barrier. Poly(dA:dT) sequences are fragile because long single-strand poly(dA) stretches at the replication fork are unprotected by the replication protein A (RPA). We propose that the evolutionary expansion of poly(dA:dT) tracts in eukaryotic genomes promotes replication initiation, but at the cost of chromosome fragility.
|
17
|
Chen A, Chen D, Chen Y. Advances of DNase-seq for mapping active gene regulatory elements across the genome in animals. Gene 2018; 667:83-94. [DOI: 10.1016/j.gene.2018.05.033] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/09/2017] [Revised: 05/04/2018] [Accepted: 05/10/2018] [Indexed: 12/16/2022]
|
18
|
Visconti A, Martin TC, Falchi M. YAMP: a containerized workflow enabling reproducibility in metagenomics research. Gigascience 2018; 7:5039705. [PMID: 29917068 PMCID: PMC6047416 DOI: 10.1093/gigascience/giy072] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/04/2018] [Revised: 05/01/2018] [Accepted: 06/11/2018] [Indexed: 01/12/2023] Open
Abstract
YAMP ("Yet Another Metagenomics Pipeline") is a user-friendly workflow that enables the analysis of whole shotgun metagenomic data while using containerization to ensure computational reproducibility and facilitate collaborative research. YAMP can be executed on any UNIX-like system and offers seamless support for multiple job schedulers as well as for the Amazon AWS cloud. Although YAMP was developed to be ready to use by nonexperts, bioinformaticians will appreciate its flexibility, modularization, and simple customization.
Affiliation(s)
- Alessia Visconti
  - Department of Twin Research and Genetic Epidemiology, King’s College London, Westminster Bridge Road, SE1 7EH, London, UK
- Tiphaine C Martin
  - Department of Twin Research and Genetic Epidemiology, King’s College London, Westminster Bridge Road, SE1 7EH, London, UK
- Mario Falchi
  - Department of Twin Research and Genetic Epidemiology, King’s College London, Westminster Bridge Road, SE1 7EH, London, UK
|