151
|
Zhao W, Chen JJ, Perkins R, Wang Y, Liu Z, Hong H, Tong W, Zou W. Erratum to: A novel procedure on next generation sequencing data analysis using text mining algorithm. BMC Bioinformatics 2016; 17:301. [PMID: 27489012 PMCID: PMC4972985 DOI: 10.1186/s12859-016-1156-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
|
152
|
Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, Judson RS. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect 2016; 124:1023-33. [PMID: 26908244 PMCID: PMC4937869 DOI: 10.1289/ehp.1510267] [Citation(s) in RCA: 225] [Impact Index Per Article: 28.1] [Received: 05/27/2015] [Revised: 10/05/2015] [Accepted: 02/08/2016] [Indexed: 05/18/2023]
Abstract
BACKGROUND Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. OBJECTIVES We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. METHODS CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure-activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. RESULTS Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. CONCLUSION This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches. 
This concept will be applied in future projects related to other end points.
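The consensus step described in the abstract above (weighting models by scores based on their evaluated accuracies) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the score-weighted vote below, the function name `weighted_consensus`, the 0.5 cut-off, and the three example models are all assumptions made for illustration, not the CERAPP implementation.

```python
from typing import Sequence


def weighted_consensus(predictions: Sequence[int], scores: Sequence[float]) -> float:
    """Return the score-weighted fraction of models that call a chemical active.

    predictions: 1 (active) or 0 (inactive), one entry per model.
    scores: evaluation score of each model, used as its voting weight.
    """
    if len(predictions) != len(scores) or not predictions:
        raise ValueError("need exactly one score per prediction")
    total_weight = sum(scores)
    return sum(p * s for p, s in zip(predictions, scores)) / total_weight


# Three hypothetical models whose scores fall inside the reported 0.69-0.85 range.
calls = [1, 0, 1]
scores = [0.85, 0.69, 0.80]
consensus = weighted_consensus(calls, scores)
# Assumed cut-off: call the chemical active if the weighted vote reaches 0.5.
is_active = consensus >= 0.5
```

Under this scheme a low-scoring model can still be outvoted by two higher-scoring ones, which is the general motivation for weighting rather than simple majority voting.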
|
153
|
Sakkiah S, Ng HW, Tong W, Hong H. Structures of androgen receptor bound with ligands: advancing understanding of biological functions and drug discovery. Expert Opin Ther Targets 2016; 20:1267-82. [PMID: 27195510 DOI: 10.1080/14728222.2016.1192131] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/20/2022]
Abstract
INTRODUCTION Androgen receptor (AR) is a ligand-dependent transcription factor and a member of the nuclear receptor superfamily. It plays a vital role in male sexual development and regulates gene expression in various tissues, including prostate. Androgens are compounds that exert their biological effects via interaction with AR. Binding of androgens to AR initiates conformational changes in AR that affect binding of co-regulator proteins and DNA. AR agonists and antagonists are widely used in a variety of clinical applications (e.g. hypogonadism and prostate cancer therapy). AREAS COVERED This review provides a close look at structures of AR-ligand complexes and mutations in the receptor that have been revealed, discusses current challenges in the field, and sheds light on future directions. EXPERT OPINION AR is one of the primary targets for the treatment of prostate cancer, as AR antagonists inhibit prostate cancer growth. However, these drugs are not effective for long-term treatment and lead to castration-resistant prostate cancer. The structures of AR-ligand complexes are an invaluable scientific asset that enhances our understanding of biological functions and mechanisms of androgenic and anti-androgenic chemicals as well as promotes the discovery of superior drug candidates.
|
154
|
Shin E, Hong H, Park J, Oh Y, Jung J, Lee Y. Characterization of Staphylococcus aureus faecal isolates associated with food-borne disease in Korea. J Appl Microbiol 2016; 121:277-86. [PMID: 26991816 DOI: 10.1111/jam.13133] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/03/2015] [Revised: 02/17/2016] [Accepted: 02/26/2016] [Indexed: 11/30/2022]
Abstract
AIMS To characterize Staphylococcus aureus faecal isolates from people suspected of having food poisoning by using antimicrobial susceptibility testing and molecular techniques. METHODS AND RESULTS A total of 340 Staph. aureus isolates from 6226 people suspected of having food poisoning were identified and characterized by biochemical methods, antimicrobial susceptibility testing and PCR. Samples were obtained from January 2006 to December 2008 from the National Notifiable Diseases Surveillance System at the Research Institute of Public Health and Environment in Seoul Metropolitan, Korea. All strains carried at least one of the eight staphylococcal enterotoxin (se) genes tested and a total of 27 se profiles were produced; the most frequent se profile was seg-sei and the next was sea. Among the total isolates, 36 methicillin-resistant Staphylococcus aureus (MRSA) isolates were further analysed by multilocus sequence typing (MLST), Staphylococcal cassette chromosome mec (SCCmec) typing, pulsed-field gel electrophoresis (PFGE) and PCR detection for pvl. ST72-SCCmec type IV was the most predominant clone (27 isolates, 75%) followed by ST1-SCCmec type IV (five isolates, 13·8%), ST20-SCCmec type IV (one isolate, 2·8%), ST493-SCCmec type IV (one isolate, 2·8%), ST903-SCCmec type IV (one isolate, 2·8%) and ST5-SCCmec type II (one isolate, 2·8%). By PFGE typing, MRSAs isolated during the same period were grouped together although they were isolated from different regions. None of the MRSA isolates carried the pvl gene and nine MRSAs were multidrug resistant. CONCLUSIONS Analysis of MRSAs by MLST, SCCmec typing, PFGE and pvl detection showed that the majority of strains associated with food-borne diseases belonged to a Korean community-acquired (CA) MRSA clone with ST72-SCCmec type IV-PVL negative-SEG/SEI and its variations while one strain was hospital-acquired (HA) MRSA.
SIGNIFICANCE AND IMPACT OF THE STUDY The CA-MRSA clone with ST72-SCCmec type IV-PVL negative-SEG/SEI was the most common among MRSAs associated with food-borne diseases. This is the first report of an ST903 strain in Korea.
|
155
|
Zhao W, Chen JJ, Perkins R, Wang Y, Liu Z, Hong H, Tong W, Zou W. A novel procedure on next generation sequencing data analysis using text mining algorithm. BMC Bioinformatics 2016; 17:213. [PMID: 27177941 PMCID: PMC4866036 DOI: 10.1186/s12859-016-1075-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/20/2015] [Accepted: 05/07/2016] [Indexed: 08/30/2023]
Abstract
Background Next-generation sequencing (NGS) technologies have provided researchers with vast possibilities in various biological and biomedical research areas. Efficient data mining strategies are in high demand for large scale comparative and evolutionary studies to be performed on the large amounts of data derived from NGS projects. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Methods We report a novel procedure to analyse NGS data using topic modeling. It consists of four major steps: NGS data retrieval, preprocessing, topic modeling, and data mining using Latent Dirichlet Allocation (LDA) topic outputs. An NGS data set of Salmonella enterica strains was used as a case study to show the workflow of this procedure. The perplexity measurement of the topic numbers and the convergence efficiencies of Gibbs sampling were calculated and discussed for achieving the best result from the proposed procedure. Results The topics output by the LDA algorithm could be treated as features of Salmonella strains to accurately describe the genetic diversity of the fliC gene in various serotypes. The results of a two-way hierarchical clustering and data matrix analysis on LDA-derived matrices successfully classified Salmonella serotypes based on the NGS data. Conclusion The implementation of topic modeling in NGS data analysis provides a new way to elucidate genetic information from NGS data, and identify the gene-phenotype relationships and biomarkers, especially in the era of biological and medical big data.
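Two ideas in the abstract above lend themselves to a small sketch: treating sequence data as "documents" of "words" for a topic model, and using perplexity to compare models. The abstract does not say how reads are turned into words, so splitting sequences into overlapping k-mers is an assumption here, as are the function names. Perplexity is computed with its usual definition, the exponential of the negative mean per-token log-likelihood. No topic model is fitted; the word probabilities are supplied by hand.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def kmer_counts(sequence: str, k: int = 4) -> Counter:
    """Count overlapping k-mers, treating each k-mer as a 'word' (an assumption)."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def perplexity(doc_counts: Sequence[Counter],
               word_probs: Sequence[Dict[str, float]]) -> float:
    """exp(-(total log-likelihood) / (total tokens)) over all documents.

    word_probs[d][w] is the probability a model assigns to word w in document d.
    """
    log_likelihood = 0.0
    n_tokens = 0
    for counts, probs in zip(doc_counts, word_probs):
        for word, n in counts.items():
            log_likelihood += n * math.log(probs[word])
            n_tokens += n
    return math.exp(-log_likelihood / n_tokens)


# Toy example: one short sequence scored by a uniform model over its distinct k-mers.
doc = kmer_counts("ACGTACGT", k=4)
uniform = {word: 1.0 / len(doc) for word in doc}
toy_perplexity = perplexity([doc], [uniform])
```

Lower perplexity on held-out data indicates a better fit, which is why it is commonly used to choose the number of topics.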
|
156
|
Xiao W, Wu L, Yavas G, Simonyan V, Ning B, Hong H. Challenges, Solutions, and Quality Metrics of Personal Genome Assembly in Advancing Precision Medicine. Pharmaceutics 2016; 8:E15. [PMID: 27110816 PMCID: PMC4932478 DOI: 10.3390/pharmaceutics8020015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/17/2015] [Revised: 03/11/2016] [Accepted: 04/06/2016] [Indexed: 01/15/2023]
Abstract
Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence codes or structure in small regions that differ between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived with the DNA from multiple individuals. As a result of this, the reference genome is incomplete and may misrepresent the sequence variants of the general population. The more reliable solution is to compare sequences of diseased tissue with its own genome sequence derived from tissue in a normal state. As the price to sequence the human genome has dropped dramatically to around $1000, it shows a promising future of documenting the personal genome for every individual. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, till now, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging "third generation sequencing" technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend that a combined hybrid assembly with long and short reads would be a promising way to generate good quality human genome assemblies and specify parameters for the quality assessment of assembly outcomes. We provide a perspective view of the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. 
Finally, we discuss the use of the personal genome in aiding vaccine design and development, monitoring host immune response, tailoring drug therapy and detecting tumors. We believe that precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.
|
157
|
Ye H, Luo H, Ng HW, Meehan J, Ge W, Tong W, Hong H. Applying network analysis and Nebula (neighbor-edges based and unbiased leverage algorithm) to ToxCast data. Environ Int 2016; 89-90:81-92. [PMID: 26826365 DOI: 10.1016/j.envint.2016.01.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/11/2015] [Revised: 01/08/2016] [Accepted: 01/13/2016] [Indexed: 06/05/2023]
Abstract
BACKGROUND ToxCast data have been used to develop models for predicting in vivo toxicity. To predict the in vivo toxicity of a new chemical using a ToxCast data based model, its ToxCast bioactivity data are needed but not normally available. The capability of predicting ToxCast bioactivity data is necessary to fully utilize ToxCast data in the risk assessment of chemicals. OBJECTIVES We aimed to understand and elucidate the relationships between the chemicals and bioactivity data of the assays in ToxCast and to develop a network analysis based method for predicting ToxCast bioactivity data. METHODS We conducted modularity analysis on a quantitative network constructed from ToxCast data to explore the relationships between the assays and chemicals. We further developed Nebula (neighbor-edges based and unbiased leverage algorithm) for predicting ToxCast bioactivity data. RESULTS Modularity analysis on the network constructed from ToxCast data yielded seven modules. Assays and chemicals in the seven modules were distinct. Leave-one-out cross-validation yielded a Q² of 0.5416, indicating ToxCast bioactivity data can be predicted by Nebula. Prediction domain analysis showed some types of ToxCast assay data could be more reliably predicted by Nebula than others. CONCLUSIONS Network analysis is a promising approach to understand ToxCast data. Nebula is an effective algorithm for predicting ToxCast bioactivity data, helping fully utilize ToxCast data in the risk assessment of chemicals.
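The leave-one-out statistic reported in the abstract above is conventionally defined as Q² = 1 − PRESS / SS_tot, where PRESS sums the squared errors of predictions made for each point with that point held out. The sketch below shows that calculation only. Nebula itself is not described in enough detail here to reproduce, so a hypothetical one-nearest-neighbour predictor and made-up data stand in for it.

```python
from statistics import mean
from typing import Callable, List, Tuple

Pair = Tuple[float, float]  # (feature x, observed value y)


def loo_q2(data: List[Pair], predict: Callable[[List[Pair], float], float]) -> float:
    """Leave-one-out Q^2: each y is predicted from a model that never saw it."""
    press = 0.0
    for i, (x, y) in enumerate(data):
        training = data[:i] + data[i + 1:]  # hold out point i
        press += (y - predict(training, x)) ** 2
    overall_mean = mean(y for _, y in data)
    ss_tot = sum((y - overall_mean) ** 2 for _, y in data)
    return 1.0 - press / ss_tot


def nearest_neighbour(training: List[Pair], x: float) -> float:
    """Stand-in predictor (NOT Nebula): return y of the closest training x."""
    return min(training, key=lambda pair: abs(pair[0] - x))[1]


# Made-up data for illustration only.
toy_data = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2)]
q2 = loo_q2(toy_data, nearest_neighbour)
```

A Q² of 1 means perfect held-out prediction, and values at or below 0 mean the model does no better than predicting the overall mean.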
|
158
|
Hong H, Shen J, Ng HW, Sakkiah S, Ye H, Ge W, Gong P, Xiao W, Tong W. A Rat α-Fetoprotein Binding Activity Prediction Model to Facilitate Assessment of the Endocrine Disruption Potential of Environmental Chemicals. Int J Environ Res Public Health 2016; 13:372. [PMID: 27023588 PMCID: PMC4847034 DOI: 10.3390/ijerph13040372] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/27/2016] [Revised: 03/10/2016] [Accepted: 03/22/2016] [Indexed: 11/21/2022]
Abstract
Endocrine disruptors such as polychlorinated biphenyls (PCBs), diethylstilbestrol (DES) and dichlorodiphenyltrichloroethane (DDT) are agents that interfere with the endocrine system and cause adverse health effects. Huge public health concern about endocrine disruptors has arisen. One of the mechanisms of endocrine disruption is through binding of endocrine disruptors with the hormone receptors in the target cells. Entrance of endocrine disruptors into target cells is the precondition of endocrine disruption. The binding capability of a chemical with proteins in the blood affects its entrance into the target cells and, thus, is very informative for the assessment of potential endocrine disruption of chemicals. α-fetoprotein is one of the major serum proteins that binds to a variety of chemicals such as estrogens. To better facilitate assessment of endocrine disruption of environmental chemicals, we developed a model for α-fetoprotein binding activity prediction using the novel pattern recognition method (Decision Forest) and the molecular descriptors calculated from two-dimensional structures by Mold² software. The predictive capability of the model has been evaluated through internal validation using 125 training chemicals (average balanced accuracy of 69%) and external validations using 22 chemicals (balanced accuracy of 71%). Prediction confidence analysis revealed the model performed much better at high prediction confidence. Our results indicate that the model is useful (when predictions are in high confidence) in endocrine disruption risk assessment of environmental chemicals though improvement by increasing number of training chemicals is needed.
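The abstract above reports model performance as balanced accuracy. Under its usual definition this is the mean of sensitivity and specificity, which keeps the score informative when binders and non-binders are unevenly represented. The sketch below computes it for a small set of made-up labels; the function name and the data are illustrative and are not taken from the study.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity (recall on 1s) and specificity (recall on 0s)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Made-up labels: 3 binders (1) and 5 non-binders (0).
observed = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
score = balanced_accuracy(observed, predicted)
```

Here sensitivity is 2/3 and specificity is 4/5, so the balanced accuracy is their mean, whereas plain accuracy would be 6/8.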
|
159
|
Xu J, Gong B, Wu L, Thakkar S, Hong H, Tong W. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine. Pharmaceutics 2016; 8:E8. [PMID: 26999190 PMCID: PMC4810084 DOI: 10.3390/pharmaceutics8010008] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 01/12/2016] [Revised: 03/08/2016] [Accepted: 03/10/2016] [Indexed: 01/22/2023]
Abstract
Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. 
Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
|
160
|
Naranbhai V, de Assis Rosa D, Werner L, Moodley R, Hong H, Kharsany A, Mlisana K, Sibeko S, Garrett N, Chopera D, Carr WH, Abdool Karim Q, Hill AVS, Abdool Karim SS, Altfeld M, Gray CM, Ndung'u T. Killer-cell Immunoglobulin-like Receptor (KIR) gene profiles modify HIV disease course, not HIV acquisition in South African women. BMC Infect Dis 2016; 16:27. [PMID: 26809736 PMCID: PMC4727384 DOI: 10.1186/s12879-016-1361-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/13/2015] [Accepted: 01/18/2016] [Indexed: 12/27/2022]
Abstract
BACKGROUND Killer-cell Immunoglobulin-like Receptors (KIR) interact with Human Leukocyte Antigen (HLA) to modify natural killer- and T-cell function. KIR are implicated in HIV acquisition by small studies that have not been widely replicated. A role for KIR in HIV disease progression is more widely replicated and supported by functional studies. METHODS To assess the role of KIR and KIR ligands in HIV acquisition and disease course, we studied at-risk women in South Africa between 2004 and 2010. Logistic regression was used for nested case-control analysis of 154 women who acquired vs. 155 who did not acquire HIV, despite high exposure. Linear mixed-effects models were used for cohort analysis of 139 women followed prospectively for a median of 54 months (IQR 31-69) until 2014. RESULTS Neither KIR repertoires nor HLA alleles were associated with HIV acquisition. However, KIR haplotype BB was associated with lower viral loads (-0.44 log10 copies/ml; SE = 0.18; p = 0.03) and higher CD4+ T-cell counts (+80 cells/μl; SE = 42; p = 0.04). This was largely explained by the protective effect of KIR2DL2/KIR2DS2 on the B haplotype and reciprocal detrimental effect of KIR2DL3 on the A haplotype. CONCLUSIONS Although neither KIR nor HLA appears to have a role in HIV acquisition, our data are consistent with involvement of KIR2DL2 in HIV control. Additional studies to replicate these findings are indicated.
|
161
|
Abstract
Quantitative structure-activity relationship (QSAR) modeling has been used in the scientific research community for many decades and applied to drug discovery and development in industry. QSAR technologies are advancing fast and attracting possible applications in regulatory science. To facilitate the development of reliable QSAR models, the FDA has invested considerable effort in constructing chemical databases with a variety of efficacy and safety endpoint data, as well as in the development of computational algorithms. In this chapter, we briefly describe some of the frequently used databases developed at the FDA, such as EDKB (Endocrine Disruptor Knowledge Base), EADB (Estrogenic Activity Database), LTKB (Liver Toxicity Knowledge Base), and CERES (Chemical Evaluation and Risk Estimation System), and the technologies adopted by the agency, such as the Mold² program for calculation of a large and diverse set of molecular descriptors and the decision forest algorithm for QSAR model development. We also summarize some QSAR models that have been developed for safety evaluation of FDA-regulated products.
|
162
|
Ye H, Meehan J, Tong W, Hong H. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine. Pharmaceutics 2015; 7:523-41. [PMID: 26610555 PMCID: PMC4695832 DOI: 10.3390/pharmaceutics7040523] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/21/2015] [Revised: 11/14/2015] [Accepted: 11/17/2015] [Indexed: 02/06/2023]
Abstract
Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to decide which alignment algorithm to use in their studies. Selecting the right aligner involves choosing not only the algorithm but also a suitable set of parameters for it. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review currently available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long reads alignment and alignment facilitated with known genetic variants.
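The seed-and-extend strategy named in the abstract above can be sketched in a few lines: index every k-mer of the reference, look up a short seed from the read to find candidate positions, then extend each candidate by comparing the whole read against the reference. This is a deliberately simplified illustration. It uses only the first k bases as the seed and counts mismatches without gaps, whereas production aligners use multiple seeds and gapped extension. The function names, sequences, and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer in the reference to the list of positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_and_extend(read: str, reference: str, index: Dict[str, List[int]],
                    k: int, max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Return (position, mismatches) for every seed hit that extends acceptably."""
    hits = []
    seed = read[:k]  # simplification: a single seed taken from the read start
    for pos in index.get(seed, []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


# Toy reference in which the seed "ACGT" occurs twice.
reference = "ACGTTGCAACGTAGCT"
index = build_index(reference, k=4)
hits = seed_and_extend("ACGTAG", reference, index, k=4, max_mismatches=1)
```

The read matches exactly at position 8 and with one mismatch at position 0, which illustrates how the mismatch threshold is one of the parameters that changes what an aligner reports.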
|
163
|
Ng HW, Doughty SW, Luo H, Ye H, Ge W, Tong W, Hong H. Development and Validation of Decision Forest Model for Estrogen Receptor Binding Prediction of Chemicals Using Large Data Sets. Chem Res Toxicol 2015; 28:2343-51. [PMID: 26524122 DOI: 10.1021/acs.chemrestox.5b00358] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 02/04/2023]
Abstract
Some chemicals in the environment possess the potential to interact with the endocrine system in the human body. Multiple receptors are involved in the endocrine system; estrogen receptor α (ERα) plays very important roles in endocrine activity and is the most studied receptor. Understanding and predicting estrogenic activity of chemicals facilitates the evaluation of their endocrine activity. Hence, we have developed a decision forest classification model to predict chemical binding to ERα using a large training data set of 3308 chemicals obtained from the U.S. Food and Drug Administration's Estrogenic Activity Database. We tested the model using cross validations and external data sets of 1641 chemicals obtained from the U.S. Environmental Protection Agency's ToxCast project. The model showed good performance in both internal (92% accuracy) and external validations (∼ 70-89% relative balanced accuracies), where the latter involved the validations of the model across different ER pathway-related assays in ToxCast. The important features that contribute to the prediction ability of the model were identified through informative descriptor analysis and were related to current knowledge of ER binding. Prediction confidence analysis revealed that the model had both high prediction confidence and accuracy for most predicted chemicals. The results demonstrated that the model constructed based on the large training data set is more accurate and robust for predicting ER binding of chemicals than the published models that have been developed using much smaller data sets. The model could be useful for the evaluation of ERα-mediated endocrine activity potential of environmental chemicals.
|
164
|
Wang Y, Liu Z, Zou W, Hong H, Fang H, Tong W. Molecular regulation of miRNAs and potential biomarkers in the progression of hepatic steatosis to NASH. Biomark Med 2015; 9:1189-200. [DOI: 10.2217/bmm.15.70] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Increasing evidence suggests that microRNAs regulate diverse biological functions in the liver and play a very important role in metabolic-related disorders such as nonalcoholic fatty liver disease by regulating the expression of their target genes. In this review, we summarize the most recent progress in identifying miRNAs involved in the progression of liver steatosis and discuss the possible mechanisms by which miRNAs contribute to diverse pathogenic liver injuries. We provide insights into the functional network of miRNAs by connecting miRNAs, their targets and biological pathways associated with hepatic steatosis and fibrosis, with important implications for our understanding of phenotypic-based disease pathogenesis. We also discuss the possible roles and challenges of miRNAs as biomarkers for drug-induced liver injury.
|
165
|
Gong P, Hong H, Perkins EJ. Ionotropic GABA receptor antagonism-induced adverse outcome pathways for potential neurotoxicity biomarkers. Biomark Med 2015; 9:1225-39. [PMID: 26508561 DOI: 10.2217/bmm.15.58] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
Abstract
Antagonism of ionotropic GABA receptors (iGABARs) can occur at three distinct types of receptor binding sites causing chemically induced epileptic seizures. Here we review three adverse outcome pathways, each characterized by a specific molecular initiating event where an antagonist competitively binds to active sites, negatively modulates allosteric sites or noncompetitively blocks ion channel on the iGABAR. This leads to decreased chloride conductance, followed by depolarization of affected neurons, epilepsy-related death and ultimately decreased population. Supporting evidence for causal linkages from the molecular to population levels is presented and differential sensitivity to iGABAR antagonists in different GABA receptors and organisms discussed. Adverse outcome pathways are poised to become important tools for linking mechanism-based biomarkers to regulated outcomes in next-generation risk assessment.
|
166
|
Zhang C, Hong H, Mendrick DL, Tang Y, Cheng F. Biomarker-based drug safety assessment in the age of systems pharmacology: from foundational to regulatory science. Biomark Med 2015; 9:1241-52. [PMID: 26506997 DOI: 10.2217/bmm.15.81] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 02/05/2023]
Abstract
Improved biomarker-based assessment of drug safety is needed in drug discovery and development as well as regulatory evaluation. However, identifying drug safety-related biomarkers such as genes, proteins, miRNA and single-nucleotide polymorphisms remains a big challenge. The advances of 'omics' and computational technologies such as genomics, transcriptomics, metabolomics, proteomics, systems biology, network biology and systems pharmacology enable us to explore drug actions at the organ and organismal levels. Computational and experimental systems pharmacology approaches could be utilized to facilitate biomarker-based drug safety assessment for drug discovery and development and to inform better regulatory decisions. In this article, we review the current status and advances of systems pharmacology approaches for the development of predictive models to identify biomarkers for drug safety assessment.
|
167
|
Koturbash I, Tolleson WH, Guo L, Yu D, Chen S, Hong H, Mattes W, Ning B. microRNAs as pharmacogenomic biomarkers for drug efficacy and drug safety assessment. Biomark Med 2015; 9:1153-76. [PMID: 26501795 PMCID: PMC5712454 DOI: 10.2217/bmm.15.89] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Much evidence has documented that microRNAs (miRNAs) play an important role in the modulation of interindividual variability in the production of drug metabolizing enzymes and transporters (DMETs) and nuclear receptors (NRs) through multidirectional interactions involving environmental stimuli/stressors, the expression of miRNA molecules and genetic polymorphisms. MiRNA expression has been reported to be affected by drugs and miRNAs themselves may affect drug metabolism and toxicity. In cancer research, miRNA biomarkers have been identified to mediate intrinsic and acquired resistance to cancer therapies. In drug safety assessment, miRNAs have been found associated with cardiotoxicity, hepatotoxicity and nephrotoxicity. This review article summarizes published studies to show that miRNAs can serve as early biomarkers for the evaluation of drug efficacy and drug safety.
|
168
|
Yu Y, Fuscoe JC, Zhao C, Guo C, Jia M, Qing T, Bannon DI, Lancashire L, Bao W, Du T, Luo H, Su Z, Jones WD, Moland CL, Branham WS, Qian F, Ning B, Li Y, Hong H, Guo L, Mei N, Shi T, Wang KY, Wolfinger RD, Nikolsky Y, Walker SJ, Duerksen-Hughes P, Mason CE, Tong W, Thierry-Mieg J, Thierry-Mieg D, Shi L, Wang C. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat Commun 2015; 5:3230. [PMID: 24510058 PMCID: PMC3926002 DOI: 10.1038/ncomms4230] [Citation(s) in RCA: 265] [Impact Index Per Article: 29.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2013] [Accepted: 01/10/2014] [Indexed: 02/07/2023] Open
Abstract
The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. Gene expression is highly variable between tissues, and changes during development and with age. Here, the authors provide a comprehensive RNA-Seq analysis of the rat transcriptome, spanning eleven organs, four developmental stages and both sexes.
|
169
|
Luo H, Ye H, Ng HW, Shi L, Tong W, Mendrick DL, Hong H. Machine Learning Methods for Predicting HLA-Peptide Binding Activity. Bioinform Biol Insights 2015; 9:21-9. [PMID: 26512199 PMCID: PMC4603527 DOI: 10.4137/bbi.s29466] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2015] [Revised: 07/30/2015] [Accepted: 08/02/2015] [Indexed: 11/23/2022] Open
Abstract
As the major histocompatibility complexes in humans, the human leukocyte antigens (HLAs) play an important role in presenting antigen peptides to T-cell receptors for immunological recognition and responses. Interpreting and predicting HLA–peptide binding are important for studying T-cell epitopes, immune reactions, and the mechanisms of adverse drug reactions. We review different types of machine learning methods and tools that have been used for HLA–peptide binding prediction. We also summarize the descriptors on which the HLA–peptide binding prediction models have been constructed and discuss the limitations and challenges of the current methods. Lastly, we give a future perspective on the HLA–peptide binding prediction method based on network analysis.
|
170
|
Luo H, Ye H, Ng H, Shi L, Tong W, Mattes W, Mendrick D, Hong H. Understanding and predicting binding between human leukocyte antigens (HLAs) and peptides by network analysis. BMC Bioinformatics 2015; 16 Suppl 13:S9. [PMID: 26424483 PMCID: PMC4597169 DOI: 10.1186/1471-2105-16-s13-s9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND As the major histocompatibility complex (MHC), human leukocyte antigens (HLAs) are one of the most polymorphic genes in humans. Patients carrying certain HLA alleles may develop adverse drug reactions (ADRs) after taking specific drugs. Peptides play an important role in HLA related ADRs as they are the necessary co-binders of HLAs with drugs. Many experimental data have been generated for understanding HLA-peptide binding. However, efficiently utilizing the data for understanding and accurately predicting HLA-peptide binding is challenging. Therefore, we developed a network analysis based method to understand and predict HLA-peptide binding. METHODS Qualitative Class I HLA-peptide binding data were harvested and prepared from four major databases. An HLA-peptide binding network was constructed from this dataset and modules were identified by the fast greedy modularity optimization algorithm. To examine the significance of signals in the yielded models, the modularity was compared with the modularity values generated from 1,000 random networks. The peptides and HLAs in the modules were characterized by similarity analysis. The neighbor-edges based and unbiased leverage algorithm (Nebula) was developed for predicting HLA-peptide binding. Leave-one-out (LOO) validations and two-fold cross-validations were conducted to evaluate the performance of Nebula using the constructed HLA-peptide binding network. RESULTS Nine modules were identified from analyzing the HLA-peptide binding network, with a modularity higher than that of all the random networks. Peptide length and functional side chains of amino acids at certain positions of the peptides were different among the modules. HLA sequences were module dependent to some extent. Nebula achieved an overall prediction accuracy of 0.816 in the LOO validations and an average accuracy of 0.795 in the two-fold cross-validations and outperformed the method reported in the literature.
CONCLUSIONS Network analysis is a useful approach for analyzing large and sparse datasets such as the HLA-peptide binding dataset. The modules identified from the network analysis clustered peptides and HLAs with similar sequences and properties of amino acids. Nebula performed well in the predictions of HLA-peptide binding. We demonstrated that network analysis coupled with Nebula is an efficient approach to understand and predict HLA-peptide binding interactions and thus, could further our understanding of ADRs.
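The module-detection step described in this abstract (greedy modularity optimization, followed by a comparison against random networks) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the toy graph is two five-node cliques joined by one edge, the greedy merge is a naive agglomerative variant rather than an optimized fast greedy implementation, the random networks are simple random graphs with the same number of nodes and edges (the abstract does not say how its random networks were generated), and 200 random networks are used instead of 1,000 to keep the run short. The names `modularity`, `greedy_modules`, `random_network` and `compare_with_random` are hypothetical.

```python
import random
from itertools import combinations


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    intra = [0] * len(communities)   # edges inside each community
    degree = [0] * len(communities)  # summed node degrees per community
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    return sum(i / m - (d / (2 * m)) ** 2 for i, d in zip(intra, degree))


def greedy_modules(nodes, edges):
    """Naive agglomerative merging: repeatedly join the pair of communities
    whose merge gives the largest positive gain in modularity."""
    m = len(edges)
    comms = [{n} for n in nodes]
    while len(comms) > 1:
        label = {n: i for i, c in enumerate(comms) for n in c}
        degree = [0] * len(comms)
        between = {}  # number of edges between each connected community pair
        for u, v in edges:
            a, b = label[u], label[v]
            degree[a] += 1
            degree[b] += 1
            if a != b:
                key = (min(a, b), max(a, b))
                between[key] = between.get(key, 0) + 1
        best_gain, best_pair = 1e-12, None
        for (a, b), e_ab in between.items():
            # Change in Q if communities a and b are merged.
            gain = e_ab / m - degree[a] * degree[b] / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves modularity
        a, b = best_pair
        comms[a] |= comms[b]
        del comms[b]
    return comms


def random_network(nodes, n_edges, rng):
    """Random graph with the same node set and edge count (an assumed null model)."""
    return rng.sample(list(combinations(nodes, 2)), n_edges)


def compare_with_random(nodes, edges, n_random=200, seed=0):
    """Return the observed modularity and the fraction of random networks
    whose greedy modularity is at least as high."""
    rng = random.Random(seed)
    observed = modularity(edges, greedy_modules(nodes, edges))
    at_least = 0
    for _ in range(n_random):
        rand_edges = random_network(nodes, len(edges), rng)
        if modularity(rand_edges, greedy_modules(nodes, rand_edges)) >= observed:
            at_least += 1
    return observed, at_least / n_random


# Toy network: two 5-node cliques linked by a single bridging edge.
nodes = list(range(10))
edges = (list(combinations(range(5), 2))
         + list(combinations(range(5, 10), 2))
         + [(4, 5)])

modules = greedy_modules(nodes, edges)
observed_q, p_value = compare_with_random(nodes, edges)
print("modules:", [sorted(c) for c in modules])
print("modularity:", round(observed_q, 4), "empirical p-value:", p_value)
```

A small empirical p-value here means that few or none of the random networks reach the observed modularity, which is the sense in which the abstract reports its modules as significant. For real bipartite HLA-peptide data, a null model that preserves the two node types and their degrees would likely be a more appropriate comparison than the plain random graphs assumed above.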
|
171
|
Ng HW, Shu M, Luo H, Ye H, Ge W, Perkins R, Tong W, Hong H. Estrogenic activity data extraction and in silico prediction show the endocrine disruption potential of bisphenol A replacement compounds. Chem Res Toxicol 2015; 28:1784-95. [PMID: 26308263 DOI: 10.1021/acs.chemrestox.5b00243] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
Bisphenol A (BPA) replacement compounds are released to the environment and cause widespread human exposure. However, a lack of thorough safety evaluations on the BPA replacement compounds has raised public concerns. We assessed the endocrine disruption potential of BPA replacement compounds in the market to assist their safety evaluations. A literature search was conducted to ascertain the BPA replacement compounds in use. Available experimental estrogenic activity data of these compounds were extracted from the Estrogenic Activity Database (EADB) to assess their estrogenic potential. An in silico model was developed to predict the estrogenic activity of compounds lacking experimental data. Molecular dynamics (MD) simulations were performed to understand the mechanisms by which the estrogenic compounds bind to and activate the estrogen receptor (ER). Forty-five BPA replacement compounds were identified in the literature. Seven were more estrogenic and five less estrogenic than BPA, while six were nonestrogenic in EADB. A two-tier in silico model was developed based on molecular docking to predict the estrogenic activity of the 27 compounds lacking data. Eleven were predicted as ER binders and 16 as nonbinders. MD simulations revealed hydrophobic contacts and hydrogen bonds as the main interactions between ER and the estrogenic compounds.
|
172
|
Kang H, Cho W, Hong H, Kim J, Cho Y, Kwon O, Bang J, Hwang G, Son Y, Oh C, Han M. P-019 stability of the cerebral aneurysms after stent-assisted coil embolization: a propensity score-matched analysis: Abstract P-019 Table 1. J Neurointerv Surg 2015. [DOI: 10.1136/neurintsurg-2015-011917.58] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
|
173
|
Zhang W, Yu Y, Hertwig F, Thierry-Mieg J, Zhang W, Thierry-Mieg D, Wang J, Furlanello C, Devanarayan V, Cheng J, Deng Y, Hero B, Hong H, Jia M, Li L, Lin SM, Nikolsky Y, Oberthuer A, Qing T, Su Z, Volland R, Wang C, Wang MD, Ai J, Albanese D, Asgharzadeh S, Avigad S, Bao W, Bessarabova M, Brilliant MH, Brors B, Chierici M, Chu TM, Zhang J, Grundy RG, He MM, Hebbring S, Kaufman HL, Lababidi S, Lancashire LJ, Li Y, Lu XX, Luo H, Ma X, Ning B, Noguera R, Peifer M, Phan JH, Roels F, Rosswog C, Shao S, Shen J, Theissen J, Tonini GP, Vandesompele J, Wu PY, Xiao W, Xu J, Xu W, Xuan J, Yang Y, Ye Z, Dong Z, Zhang KK, Yin Y, Zhao C, Zheng Y, Wolfinger RD, Shi T, Malkas LH, Berthold F, Wang J, Tong W, Shi L, Peng Z, Fischer M. Comparison of RNA-seq and microarray-based models for clinical endpoint prediction. Genome Biol 2015; 16:133. [PMID: 26109056 PMCID: PMC4506430 DOI: 10.1186/s13059-015-0694-1] [Citation(s) in RCA: 247] [Impact Index Per Article: 27.4] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2015] [Accepted: 06/12/2015] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Gene expression profiling is being widely applied in cancer research to identify biomarkers for clinical endpoint prediction. Since RNA-seq provides a powerful tool for transcriptome-based applications beyond the limitations of microarrays, we sought to systematically evaluate the performance of RNA-seq-based and microarray-based classifiers in this MAQC-III/SEQC study for clinical endpoint prediction using neuroblastoma as a model. RESULTS We generate gene expression profiles from 498 primary neuroblastomas using both RNA-seq and 44 k microarrays. Characterization of the neuroblastoma transcriptome by RNA-seq reveals that more than 48,000 genes and 200,000 transcripts are being expressed in this malignancy. We also find that RNA-seq provides much more detailed information on specific transcript expression patterns in clinico-genetic neuroblastoma subgroups than microarrays. To systematically compare the power of RNA-seq and microarray-based models in predicting clinical endpoints, we divide the cohort randomly into training and validation sets and develop 360 predictive models on six clinical endpoints of varying predictability. Evaluation of factors potentially affecting model performances reveals that prediction accuracies are most strongly influenced by the nature of the clinical endpoint, whereas technological platforms (RNA-seq vs. microarrays), RNA-seq data analysis pipelines, and feature levels (gene vs. transcript vs. exon-junction level) do not significantly affect performances of the models. CONCLUSIONS We demonstrate that RNA-seq outperforms microarrays in determining the transcriptomic characteristics of cancer, while RNA-seq and microarray-based models perform similarly in clinical endpoint prediction. Our findings may be valuable to guide future studies on the development of gene expression-based predictive models and their implementation in clinical practice.
|
174
|
Hong H, Xiao H, Yuan H, Zhai J, Huang X. Cloning and characterisation of JAZ gene family in Hevea brasiliensis. PLANT BIOLOGY (STUTTGART, GERMANY) 2015; 17:618-24. [PMID: 25399518 DOI: 10.1111/plb.12288] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/08/2014] [Accepted: 11/10/2014] [Indexed: 05/11/2023]
Abstract
Mechanical wounding or treatment with exogenous jasmonates (JA) induces differentiation of the laticifer in Hevea brasiliensis. JA is a key signal for latex biosynthesis and wounding response in the rubber tree. Identification of the JAZ (jasmonate ZIM-domain) family of proteins that repress JA responses has facilitated rapid progress in understanding how this lipid-derived hormone controls gene expression and related physiological processes in plants. In this work, the full-length cDNAs of six JAZ genes were cloned from H. brasiliensis (termed HbJAZ). These HbJAZ have different lengths and sequence diversity, but all of them contain Jas and ZIM domains, and two of them contain an ERF-associated amphiphilic repression (EAR) motif in the N-terminal. Real-time RT-PCR analyses revealed that HbJAZ have different expression patterns and tissue specificity. Four HbJAZ were up-regulated, one was down-regulated, while two were less affected by rubber tapping treatment, suggesting that they might play distinct roles in the wounding response. A yeast two-hybrid assay revealed that HbJAZ proteins interact with each other to form homologous or heterogeneous dimer complexes, indicating that the HbJAZ proteins may expand their function through diverse JAZ-JAZ interactions. This work lays a foundation for identification of the JA signalling pathway and molecular mechanisms of latex biosynthesis in rubber trees.
|
175
|
Luo H, Du T, Zhou P, Yang L, Mei H, Ng H, Zhang W, Shu M, Tong W, Shi L, Mendrick D, Hong H. Molecular Docking to Identify Associations Between Drugs and Class I Human Leukocyte Antigens for Predicting Idiosyncratic Drug Reactions. Comb Chem High Throughput Screen 2015; 18:296-304. [DOI: 10.2174/1386207318666150305144015] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2014] [Revised: 09/29/2014] [Accepted: 11/10/2014] [Indexed: 11/22/2022]
|