1
|
Ventolero MF, Wang S, Hu H, Li X. Computational analyses of bacterial strains from shotgun reads. Brief Bioinform 2022; 23:6524011. [PMID: 35136954 DOI: 10.1093/bib/bbac013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022] Open
Abstract
Shotgun sequencing is routinely employed to study bacteria in microbial communities. With the vast amount of shotgun sequencing reads generated in a metagenomic project, it is crucial to determine the microbial composition at the strain level. This study investigated 20 computational tools that attempt to infer bacterial strain genomes from shotgun reads. For the first time, we discussed the methodology behind these tools. We also systematically evaluated six novel-strain-targeting tools on the same datasets and found that BHap, mixtureS and StrainFinder performed better than other tools. Because the performance of the best tools is still suboptimal, we discussed future directions that may address the limitations.
Affiliation(s)
| | - Saidi Wang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA; Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Xiaoman Li
- Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32816, USA
|
2
|
Wang S, Hu H, Li X. A systematic study of motif pairs that may facilitate enhancer-promoter interactions. J Integr Bioinform 2022; 19:jib-2021-0038. [PMID: 35130376 PMCID: PMC9069648 DOI: 10.1515/jib-2021-0038] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/15/2021] [Accepted: 01/20/2022] [Indexed: 01/06/2023] Open
Abstract
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and contribute to their physical interactions. However, to date, we have limited knowledge about such TF pairs. To fill this void, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer-promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched with motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of the interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions, which facilitates the generation of meaningful hypotheses for experimental validation.
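The abstract does not state which statistical test EPmotifPair uses to call a motif pair significant. As a minimal sketch of one common choice, assuming a one-sided hypergeometric test, the following scores how often a motif in enhancers co-occurs with a motif in promoters across interacting EP pairs. The function names and the input layout (one `(enhancer_motifs, promoter_motifs)` tuple per EP pair) are hypothetical and not taken from the pipeline.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def cooccurrence_pvalue(ep_pairs, enhancer_motif, promoter_motif):
    """One-sided p-value that the two motifs co-occur more often than chance.

    ep_pairs: list of (enhancer_motifs, promoter_motifs) tuples, each a set of
    motif names found in that enhancer or promoter (hypothetical layout).
    """
    N = len(ep_pairs)
    K = sum(enhancer_motif in enh for enh, _ in ep_pairs)   # enhancers with motif
    n = sum(promoter_motif in pro for _, pro in ep_pairs)   # promoters with motif
    k = sum(enhancer_motif in enh and promoter_motif in pro
            for enh, pro in ep_pairs)                       # EP pairs with both
    return hypergeom_sf(k, N, K, n)


# Toy data: motif "A" in enhancers always pairs with motif "B" in promoters.
toy_pairs = [({"A"}, {"B"})] * 3 + [({"C"}, {"B"})] + [({"C"}, {"D"})] * 6
p_value = cooccurrence_pvalue(toy_pairs, "A", "B")
```

In a real analysis the p-values for all tested motif pairs would also need a multiple-testing correction, which this sketch leaves out.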
Affiliation(s)
- Saidi Wang
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, FL, 32816, USA
| | - Xiaoman Li
- Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, FL, 32816, USA
|
3
|
Systematic discovery of cofactor motifs from ChIP-seq data by SIOMICS. Methods 2015; 79-80:47-51. [DOI: 10.1016/j.ymeth.2014.08.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 05/02/2014] [Revised: 07/19/2014] [Accepted: 08/06/2014] [Indexed: 11/19/2022] Open
|
4
|
Wang Y, Hu H, Li X. MBBC: an efficient approach for metagenomic binning based on clustering. BMC Bioinformatics 2015; 16:36. [PMID: 25652152 PMCID: PMC4339733 DOI: 10.1186/s12859-015-0473-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 07/30/2014] [Accepted: 01/22/2015] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Binning environmental shotgun reads is one of the most fundamental tasks in metagenomic studies, in which mixed reads from different species or operational taxonomical units (OTUs) are separated into different groups. While dozens of binning methods are available, there is still room for improvement. RESULTS We developed a novel taxonomy-independent approach called MBBC (Metagenomic Binning Based on Clustering) to cluster environmental shotgun reads, by considering k-mer frequency in reads and Markov properties of the inferred OTUs. Tested on twelve simulated datasets, MBBC reliably estimated the species number, the genome size, and the relative abundance of each species, independent of whether there are errors in reads. Tested on multiple experimental datasets, MBBC outperformed two state-of-the-art taxonomy-independent methods, in terms of the accuracy of the estimated species number, genome sizes, and percentages of correctly assigned reads, among other metrics. CONCLUSIONS We have developed a novel method for binning metagenomic reads based on clustering. This method is demonstrated to reliably predict species numbers, genome sizes, relative species abundances, and k-mer coverage in simple datasets. Our method also has a high accuracy in read binning. The MBBC software is freely available at http://eecs.ucf.edu/~xiaoman/MBBC/MBBC.html .
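To make the "k-mer frequency in reads" feature concrete, here is a minimal sketch that turns a read into a normalized k-mer frequency vector and compares two reads. It illustrates only the feature, not MBBC itself: the Markov modeling and the clustering procedure are not described in the abstract and are not reproduced here. The function names, the default `k=4`, and the Euclidean distance are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_profile(read, k=4):
    """Return normalized k-mer frequencies of a read over the ACGT alphabet.

    K-mers containing other characters (for example N) are ignored.
    """
    read = read.upper()
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Reads with similar composition yield nearby profiles.
profile_a = kmer_profile("ACGTACGTACGTACGT", k=2)
profile_b = kmer_profile("ACGTACGAACGTACGT", k=2)
distance = euclidean(profile_a, profile_b)
```

Profiles like these could be fed to any clustering routine; a taxonomy-independent binner groups reads by composition in this general way rather than by matching them to reference genomes.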
Affiliation(s)
- Ying Wang
- Department of Electric Engineering and Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
| | - Haiyan Hu
- Department of Electric Engineering and Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
| | - Xiaoman Li
- Department of Electric Engineering and Computer Science, University of Central Florida, Orlando, FL, 32816, USA.
- Burnett School of Biomedical Science, University of Central Florida, Orlando, FL, 32816, USA.
|
5
|
Yao Q, Gao J, Bollinger C, Thelen JJ, Xu D. Predicting and analyzing protein phosphorylation sites in plants using Musite. Frontiers in Plant Science 2012; 3:186. [PMID: 22934099 PMCID: PMC3423629 DOI: 10.3389/fpls.2012.00186] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 05/01/2012] [Accepted: 07/31/2012] [Indexed: 05/29/2023]
Abstract
Although protein phosphorylation sites can be reliably identified with high-resolution mass spectrometry, the experimental approach is time-consuming and resource-dependent. Furthermore, it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites and provide hypotheses in experimental design. Musite is a tool that we previously developed to predict phosphorylation sites based solely on protein sequence. However, it was not comprehensively applied to plants. In this study, the phosphorylation data from Arabidopsis thaliana, B. napus, G. max, M. truncatula, O. sativa, and Z. mays were collected for cross-species testing and the overall plant-specific prediction as well. The results show that the model for A. thaliana can be extended to other organisms, and the overall plant model from Musite outperforms the current plant-specific prediction tools, Plantphos, and PhosphAt, in prediction accuracy. Furthermore, a comparative study of predicted phosphorylation sites across orthologs among different plants was conducted to reveal potential evolutionary features. A bipolar distribution of isolated, non-conserved phosphorylation sites, and highly conserved ones in terms of the amino acid type was observed. It also shows that predicted phosphorylation sites conserved within orthologs do not necessarily share more sequence similarity in the flanking regions than the background, but they often inherit protein disorder, a property that does not necessitate high sequence conservation. Our analysis also suggests that the phosphorylation frequencies among serine, threonine, and tyrosine correlate with their relative proportion in disordered regions. Musite can be used as a web server (http://musite.net) or downloaded as an open-source standalone tool (http://musite.sourceforge.net/).
Affiliation(s)
- Qiuming Yao
- Department of Computer Science, University of Missouri, Columbia, MO, USA
- Bond Life Science Center, University of Missouri, Columbia, MO, USA
| | - Jianjiong Gao
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Curtis Bollinger
- Department of Computer Science, University of Missouri, Columbia, MO, USA
- Bond Life Science Center, University of Missouri, Columbia, MO, USA
| | - Jay J. Thelen
- Bond Life Science Center, University of Missouri, Columbia, MO, USA
- Department of Biochemistry, University of Missouri, Columbia, MO, USA
| | - Dong Xu
- Department of Computer Science, University of Missouri, Columbia, MO, USA
- Bond Life Science Center, University of Missouri, Columbia, MO, USA
|
6
|
Teng M, Balch C, Liu Y, Li M, Huang THM, Wang Y, Nephew KP, Li L. The influence of cis-regulatory elements on DNA methylation fidelity. PLoS One 2012; 7:e32928. [PMID: 22412954 PMCID: PMC3295790 DOI: 10.1371/journal.pone.0032928] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/09/2011] [Accepted: 02/05/2012] [Indexed: 12/22/2022] Open
Abstract
It is now established that, as compared to normal cells, the cancer cell genome has an overall inverse distribution of DNA methylation (“methylome”), i.e., predominant hypomethylation and localized hypermethylation, within “CpG islands” (CGIs). Moreover, although cancer cells have reduced methylation “fidelity” and genomic instability, accurate maintenance of aberrant methylomes that underlie malignant phenotypes remains necessary. However, the mechanism(s) of cancer methylome maintenance remains largely unknown. Here, we assessed CGI methylation patterns propagated over 1, 3, and 5 divisions of A2780 ovarian cancer cells, concurrent with exposure to the DNA cross-linking chemotherapeutic cisplatin, and observed cell generation-successive increases in total hyper- and hypo-methylated CGIs. Empirical Bayesian modeling revealed five distinct modes of methylation propagation: (1) heritable (i.e., unchanged) high-methylation (1186 probe loci in CGI microarray); (2) heritable (i.e., unchanged) low-methylation (286 loci); (3) stochastic hypermethylation (i.e., progressively increased, 243 loci); (4) stochastic hypomethylation (i.e., progressively decreased, 247 loci); and (5) considerable “random” methylation (582 loci). These results support a “stochastic model” of DNA methylation equilibrium deriving from the efficiency of two distinct processes, methylation maintenance and de novo methylation. A role for cis-regulatory elements in methylation fidelity was also demonstrated by highly significant (p < 2.2×10^−5) enrichment of transcription factor binding sites in CGI probe loci showing heritably high (118 elements) and low (47 elements) methylation, and also in loci demonstrating stochastic hyper- (30 elements) and hypo- (31 elements) methylation. Notably, loci having “random” methylation heritability displayed nearly no enrichment. These results demonstrate an influence of cis-regulatory elements on the nonrandom propagation of both strictly heritable and stochastically heritable CGIs.
Affiliation(s)
- Mingxiang Teng
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin, Heilongjiang, China
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
| | - Curt Balch
- Medical Sciences Program, Indiana University, Bloomington, Indiana, United States of America
- Indiana University Melvin and Bren Simon Cancer Center, Indianapolis, Indiana, United States of America
| | - Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Indiana University Melvin and Bren Simon Cancer Center, Indianapolis, Indiana, United States of America
| | - Meng Li
- Medical Sciences Program, Indiana University, Bloomington, Indiana, United States of America
| | - Tim H. M. Huang
- Department of Molecular Virology, Immunology and Medical Genetics, Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio, United States of America
| | - Yadong Wang
- Harbin Institute of Technology, School of Computer Science and Technology, Harbin, Heilongjiang, China
- * E-mail: (YW); (KPN); (LL)
| | - Kenneth P. Nephew
- Medical Sciences Program, Indiana University, Bloomington, Indiana, United States of America
- Indiana University Melvin and Bren Simon Cancer Center, Indianapolis, Indiana, United States of America
- Departments of Cellular and Integrative Physiology and Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- * E-mail: (YW); (KPN); (LL)
| | - Lang Li
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- Indiana University Melvin and Bren Simon Cancer Center, Indianapolis, Indiana, United States of America
- Indiana Institute of Personalized Medicine, Departments of Cellular and Integrative Physiology and Obstetrics and Gynecology, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
- * E-mail: (YW); (KPN); (LL)
|
7
|
Shen C, Huang Y, Liu Y, Wang G, Zhao Y, Wang Z, Teng M, Wang Y, Flockhart DA, Skaar TC, Yan P, Nephew KP, Huang THM, Li L. A modulated empirical Bayes model for identifying topological and temporal estrogen receptor α regulatory networks in breast cancer. BMC Systems Biology 2011; 5:67. [PMID: 21554733 PMCID: PMC3117732 DOI: 10.1186/1752-0509-5-67] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 11/24/2010] [Accepted: 05/09/2011] [Indexed: 12/27/2022]
Abstract
BACKGROUND Estrogens regulate diverse physiological processes in various tissues through genomic and non-genomic mechanisms that result in activation or repression of gene expression. Transcription regulation upon estrogen stimulation is a critical biological process underlying the onset and progression of the majority of breast cancers. Dynamic gene expression changes have been shown to characterize the breast cancer cell response to estrogens, the molecular mechanism of which is still not well understood. RESULTS We developed a modulated empirical Bayes model and constructed a novel topological and temporal transcription factor (TF) regulatory network in the MCF7 breast cancer cell line upon 17β-estradiol stimulation. In the network, significant TF genomic hubs were identified, including ER-alpha and AP-1; significant non-genomic hubs include ZFP161, TFDP1, NRF1, TFAP2A, EGR1, E2F1, and PITX2. Although the early and late networks were distinct (<5% overlap of ERα target genes between the 4 and 24 h time points), all nine hubs were significantly represented in both networks. In MCF7 cells with acquired resistance to tamoxifen, the ERα regulatory network was unresponsive to 17β-estradiol stimulation. The significant loss of hormone responsiveness was associated with marked epigenomic changes, including hyper- or hypo-methylation of promoter CpG islands and repressive histone methylations. CONCLUSIONS We identified a number of estrogen-regulated target genes and established an estrogen-regulated network that distinguishes the genomic and non-genomic actions of the estrogen receptor. Many gene targets of this network were no longer active in anti-estrogen-resistant cell lines, possibly because their DNA methylation and histone acetylation patterns have changed.
Affiliation(s)
- Changyu Shen
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana University Melvin and Bren Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Yiwen Huang
- Division of Human Cancer Genetics, Ohio State University, Columbus, OH, 43210, USA
- Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, OH, 43210, USA
- Comprehensive Cancer Center, Ohio State University, Columbus, OH, 43210, USA
| | - Yunlong Liu
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana University Melvin and Bren Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Guohua Wang
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
| | - Yuming Zhao
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Information and Computer Engineering College, Northeast Forestry University, Harbin, Heilongjiang, 150001, China
| | - Zhiping Wang
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Mingxiang Teng
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
| | - David A Flockhart
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana University Melvin and Bren Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Todd C Skaar
- Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana University Melvin and Bren Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Pearlly Yan
- Division of Human Cancer Genetics, Ohio State University, Columbus, OH, 43210, USA
- Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, OH, 43210, USA
- Comprehensive Cancer Center, Ohio State University, Columbus, OH, 43210, USA
| | - Kenneth P Nephew
- Indiana University Melvin and Bren Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Departments of Cellular and Integrative Physiology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Medical Sciences, Indiana University School of Medicine, Bloomington, IN, 47405, USA
| | - Tim HM Huang
- Division of Human Cancer Genetics, Ohio State University, Columbus, OH, 43210, USA
- Department of Molecular Virology, Immunology, and Medical Genetics, Ohio State University, Columbus, OH, 43210, USA
- Comprehensive Cancer Center, Ohio State University, Columbus, OH, 43210, USA
| | - Lang Li
- Center for Computational Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Division of Clinical Pharmacology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Indiana University Melvin and Bren Simon Cancer Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
8
|
Waclawovsky AJ, Sato PM, Lembke CG, Moore PH, Souza GM. Sugarcane for bioenergy production: an assessment of yield and regulation of sucrose content. Plant Biotechnology Journal 2010; 8:263-76. [PMID: 20388126 DOI: 10.1111/j.1467-7652.2009.00491.x] [Citation(s) in RCA: 144] [Impact Index Per Article: 9.6] [Indexed: 05/20/2023]
Abstract
An increasing number of plant scientists, including breeders, agronomists, physiologists and molecular biologists, are working towards the development of new and improved energy crops. Research is increasingly focused on how to design crops specifically for bioenergy production and increased biomass generation for biofuel purposes. The most important biofuel to date is bioethanol produced from sugars (sucrose and starch). Second generation bioethanol is also being targeted for studies to allow the use of the cell wall (lignocellulose) as a source of carbon. If a crop is to be used for bioenergy production, it should be high yielding and fast growing, have a low lignin content, and require relatively small energy inputs for its growth and harvest. Obtaining high yields on nonprime agricultural land is key for energy crop development to allow sustainability and avoid competition with food production. Sugarcane is the most efficient bioenergy crop of tropical and subtropical regions, and biotechnological tools for the improvement of this crop are advancing rapidly. We focus this review on the studies of sugarcane genes associated with sucrose content, biomass and cell wall metabolism and the preliminary physiological characterization of cultivars that contrast for sugar and biomass yield.
|
9
|
Yokoyama KD, Ohler U, Wray GA. Measuring spatial preferences at fine-scale resolution identifies known and novel cis-regulatory element candidates and functional motif-pair relationships. Nucleic Acids Res 2009; 37:e92. [PMID: 19483094 PMCID: PMC2715254 DOI: 10.1093/nar/gkp423] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 02/03/2023] Open
Abstract
Transcriptional regulation is mediated by the collective binding of proteins called transcription factors to cis-regulatory elements. A handful of factors are known to function at particular distances from the transcription start site, although the extent to which this occurs is not well understood. Spatial dependencies can also exist between pairs of binding motifs, facilitating factor-pair interactions. We sought to determine to what extent spatial preferences measured at high-scale resolution could be utilized to predict cis-regulatory elements as well as motif-pairs binding interacting proteins. We introduce the ‘motif positional function’ model which predicts spatial biases using regression analysis, differentiating noise from true position-specific overrepresentation at single-nucleotide resolution. Our method predicts 48 consensus motifs exhibiting positional enrichment within human promoters, including fourteen motifs without known binding partners. We then extend the model to analyze distance preferences between pairs of motifs. We find that motif-pairs binding interacting factors often co-occur preferentially at multiple distances, with intervals between preferred distances often corresponding to the turn of the DNA double-helix. This offers a novel means by which to predict sequence elements with a collective role in gene regulation.
Affiliation(s)
- Ken Daigoro Yokoyama
- Biology Department, Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA
|
10
|
Srinivasan BS, Chen J, Cheng C, Conti D, Duan S, Fridley BL, Gu X, Haines JL, Jorgenson E, Kraja A, Lasky-Su J, Li L, Rodin A, Wang D, Province M, Ritchie MD. Methods for analysis in pharmacogenomics: lessons from the Pharmacogenetics Research Network Analysis Group. Pharmacogenomics 2009; 10:243-51. [PMID: 19207025 PMCID: PMC2737060 DOI: 10.2217/14622416.10.2.243] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022] Open
Abstract
Each year, the Pharmacogenetics Research Network (PGRN) holds an analysis workshop for the members of the PGRN to share new methodologies, study design approaches and to discuss real data applications. This event is closed to members of the PGRN, but the methods presented are relevant to others conducting pharmacogenomics research. This special report describes many of the novel approaches discussed at the workshop and provides a resource for investigators in the field performing pharmacogenomics data analysis. While the focus is pharmacogenomics, the methods discussed are far ranging and have relevance to all types of genetic association studies: identifying noncoding variants and tag-SNPs, haplotype analysis, multivariate techniques, quantitative trait analysis, gene-gene and gene-environment interactions, and genome-wide association studies. The goal is to introduce readers to the topics discussed at the workshop and provide a direction for future development of analysis tools and methods for analysis of pharmacogenomic data.
Affiliation(s)
| | - Cheng Cheng
- St Jude Children’s Research Hospital, TN, USA
| | - Aldi Kraja
- Washington University School of Medicine, MO, USA
| | - Dai Wang
- Cedars-Sinai Medical Center, CA, USA
| | - Marylyn D Ritchie
- Vanderbilt University Medical Center, Nashville, TN, USA, Tel.: +1 615 343 5851; Fax: +1 615 343 8619
|
11
|
Feng W, Liu Y, Wu J, Nephew KP, Huang THM, Li L. A Poisson mixture model to identify changes in RNA polymerase II binding quantity using high-throughput sequencing technology. BMC Genomics 2008; 9 Suppl 2:S23. [PMID: 18831789 PMCID: PMC2559888 DOI: 10.1186/1471-2164-9-s2-s23] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open
Abstract
We present a mixture model-based analysis for identifying differences in the distribution of RNA polymerase II (Pol II) in transcribed regions, measured using ChIP-seq (chromatin immunoprecipitation followed by massively parallel sequencing). The statistical model assumes that the number of Pol II-targeted sequences contained within each genomic region follows a Poisson distribution. A Poisson mixture model was then developed to distinguish Pol II binding changes in transcribed regions using an empirical approach, and an expectation-maximization (EM) algorithm was developed for estimation and inference. In order to achieve a global maximum in the M-step, a particle swarm optimization (PSO) was implemented. We applied this model to Pol II binding data generated from hormone-dependent MCF7 breast cancer cells and antiestrogen-resistant MCF7 breast cancer cells before and after treatment with 17beta-estradiol (E2). We determined that in the hormone-dependent cells, approximately 9.9% (2527) of genes showed significant changes in Pol II binding after E2 treatment. However, only approximately 0.7% (172) of genes displayed significant Pol II binding changes in E2-treated antiestrogen-resistant cells. These results show that a Poisson mixture model can be used to analyze ChIP-seq data.
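As an illustration of the EM idea mentioned in the abstract, the following is a minimal sketch of EM for a plain two-component Poisson mixture fitted to read counts. It is not the authors' model: the abstract describes a PSO-based M-step, whereas this sketch uses the closed-form M-step of the textbook Poisson mixture. The function names, starting values, and iteration count are assumptions chosen for illustration.

```python
import math


def poisson_pmf(x, lam):
    """Poisson probability mass, computed in log space for stability."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def em_poisson_mixture(counts, lam=(1.0, 10.0), weight=0.5, n_iter=50):
    """Fit a two-component Poisson mixture by EM.

    Returns (lam1, lam2, weight), where weight is the mixing proportion
    of the first component.
    """
    lam1, lam2 = lam
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each count.
        resp = []
        for x in counts:
            a = weight * poisson_pmf(x, lam1)
            b = (1.0 - weight) * poisson_pmf(x, lam2)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: closed-form weighted means (guarded against empty components).
        n1 = max(sum(resp), 1e-12)
        n2 = max(len(counts) - n1, 1e-12)
        lam1 = max(sum(r * x for r, x in zip(resp, counts)) / n1, 1e-9)
        lam2 = max(sum((1.0 - r) * x for r, x in zip(resp, counts)) / n2, 1e-9)
        weight = n1 / len(counts)
    return lam1, lam2, weight


# Toy counts: a low-binding group and a high-binding group of regions.
toy_counts = [0, 1, 2, 1, 0, 2, 1, 20, 22, 19, 21, 18, 20]
fitted = em_poisson_mixture(toy_counts)
```

EM of this kind converges only to a local optimum that depends on the starting values, which is presumably why a global search such as PSO was used in the M-step of the published model.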
Affiliation(s)
- Weixing Feng
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- College of Automation, Harbin Engineering University, Harbin, Heilongjiang 150001 PR China
| | - Yunlong Liu
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Medical Genomics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
| | - Jiejun Wu
- Medical Sciences, Indiana University School of Medicine, Bloomington, IN 47405, USA
| | - Kenneth P Nephew
- Medical Sciences, Indiana University School of Medicine, Bloomington, IN 47405, USA
- Departments of Cellular and Integrative Physiology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- IU Simon Cancer Center, Indianapolis, IN 46202, USA
| | - Tim HM Huang
- Division of Human Cancer Genetics, Department of Molecular Virology, Immunology, and Medical Genetics, Comprehensive Cancer Center, Ohio State University, Columbus, OH 43210, USA
| | - Lang Li
- Division of Biostatistics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- IU Simon Cancer Center, Indianapolis, IN 46202, USA
|