Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ma S, Sung J, Magis AT, Wang Y, Geman D, Price ND. Measuring the effect of inter-study variability on estimating prediction error. PLoS One 2014;9:e110840. [PMID: 25330348 PMCID: PMC4201588 DOI: 10.1371/journal.pone.0110840] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 06/03/2014] [Accepted: 09/18/2014] [Indexed: 11/19/2022]

Cited by Other Article(s)

Chang D, Gupta VK, Hur B, Cobo-López S, Cunningham KY, Han NS, Lee I, Kronzer VL, Teigen LM, Karnatovskaia LV, Longbrake EE, Davis JM, Nelson H, Sung J. Gut Microbiome Wellness Index 2 enhances health status prediction from gut microbiome taxonomic profiles. Nat Commun 2024;15:7447. [PMID: 39198444 PMCID: PMC11358288 DOI: 10.1038/s41467-024-51651-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 08/09/2024] [Indexed: 09/01/2024]

Chanda D, De D. Meta-analysis reveals obesity associated gut microbial alteration patterns and reproducible contributors of functional shift. Gut Microbes 2024;16:2304900. [PMID: 38265338 PMCID: PMC10810176 DOI: 10.1080/19490976.2024.2304900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Accepted: 01/09/2024] [Indexed: 01/25/2024]

Abstract

The majority of cohort-specific studies associating gut microbiota with obesity are often contradictory; thus, the replicability of the signature remains questionable. Moreover, the species that drive obesity-associated functional shifts and their replicability remain unexplored. Thus, we aimed to address these questions by analyzing gut microbial metagenome sequencing data to develop an in-depth understanding of obese host-gut microbiota interactions using 3329 samples (Obese, n = 1494; Control, n = 1835) from 17 different countries, including both 16S rRNA gene and metagenomic sequence data. Fecal metagenomic data from diverse geographical locations were curated, profiled, and pooled using a machine learning-based approach to identify robust global signatures of obesity. Furthermore, gut microbial species and pathways were systematically integrated through the genomic content of the species to identify contributors to obesity-associated functional shifts. The community structure of the obese gut microbiome was evaluated, and a reproducible depletion of diversity was observed in the obese compared to the lean gut. From this, we infer that the loss of diversity in the obese gut is responsible for perturbations in the healthy microbial functional repertoire. We identified 25 highly predictive species and 37 pathway associations as signatures of obesity, which were validated with remarkably high accuracy (AUC, Species: 0.85, and pathway: 0.80) with an independent validation dataset. We observed a reduction in short-chain fatty acid (SCFA) producers (several Alistipes species, Odoribacter splanchnicus, etc.) and depletion of promoters of gut barrier integrity (Akkermansia muciniphila and Bifidobacterium longum) in obese guts. Our analysis underlines SCFAs and purine/pyrimidine biosynthesis, carbohydrate metabolism pathways in control individuals, and amino acid, enzyme cofactor, and peptidoglycan biosynthesis pathway enrichment in obese individuals. 
We also mapped the contributors to important obesity-associated functional shifts and observed that these are both dataset-specific and shared across the datasets. In summary, a comprehensive analysis of diverse datasets unveils species specifically contributing to functional shifts and consistent gut microbial patterns associated with obesity.

Chang D, Gupta VK, Hur B, Cobo-López S, Cunningham KY, Han NS, Lee I, Kronzer VL, Teigen LM, Karnatovskaia LV, Longbrake EE, Davis JM, Nelson H, Sung J. Gut Microbiome Wellness Index 2 for Enhanced Health Status Prediction from Gut Microbiome Taxonomic Profiles. bioRxiv 2023:2023.09.30.560294. [PMID: 37873265 PMCID: PMC10592848 DOI: 10.1101/2023.09.30.560294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]

Zhang Y, Patil P, Johnson WE, Parmigiani G. Robustifying genomic classifiers to batch effects via ensemble learning. Bioinformatics 2021;37:1521-1527. [PMID: 33245114 DOI: 10.1093/bioinformatics/btaa986] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/15/2020] [Revised: 10/20/2020] [Accepted: 11/13/2020] [Indexed: 01/08/2023]

Abstract

MOTIVATION

Genomic data are often produced in batches due to practical restrictions, which may lead to unwanted variation in data caused by discrepancies across batches. Such 'batch effects' often have negative impact on downstream biological analysis and need careful consideration. In practice, batch effects are usually addressed by specifically designed software, which merge the data from different batches, then estimate batch effects and remove them from the data. Here, we focus on classification and prediction problems, and propose a different strategy based on ensemble learning. We first develop prediction models within each batch, then integrate them through ensemble weighting methods.

RESULTS

We provide a systematic comparison between these two strategies using studies targeting diverse populations infected with tuberculosis. In one study, we simulated increasing levels of heterogeneity across random subsets of the study, which we treat as simulated batches. We then use the two methods to develop a genomic classifier for the binary indicator of disease status. We evaluate the accuracy of prediction in another independent study targeting a different population cohort. We observed that in independent validation, while merging followed by batch adjustment provides better discrimination at low level of heterogeneity, our ensemble learning strategy achieves more robust performance, especially at high severity of batch effects. These observations provide practical guidelines for handling batch effects in the development and evaluation of genomic classifiers.

AVAILABILITY AND IMPLEMENTATION

The data underlying this article are available in the article and in its online supplementary material. Processed data is available in the Github repository with implementation code, at https://github.com/zhangyuqing/bea_ensemble.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
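
The strategy summarized in this abstract (train one prediction model per batch, then combine the models with ensemble weights) can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the nearest-centroid learner, the cross-batch accuracy weights, and every function name are hypothetical choices made for the example.

```python
def fit_centroids(x, y):
    """Train a nearest-centroid learner on one batch; returns {label: centroid}."""
    centroids = {}
    for label in set(y):
        rows = [r for r, l in zip(x, y) if l == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_one(centroids, row):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def accuracy(centroids, x, y):
    """Fraction of samples in (x, y) that the model labels correctly."""
    return sum(predict_one(centroids, r) == l for r, l in zip(x, y)) / len(y)


def fit_ensemble(batches):
    """batches: list of (x, y) pairs, at least two. Returns [(model, weight), ...].

    Assumption: each model is weighted by its mean accuracy on the other
    batches, and the weights are normalized to sum to 1.
    """
    models = [fit_centroids(x, y) for x, y in batches]
    weights = []
    for k, model in enumerate(models):
        others = [b for i, b in enumerate(batches) if i != k]
        weights.append(sum(accuracy(model, x, y) for x, y in others) / len(others))
    total = sum(weights) or 1.0  # avoid division by zero if every model fails
    return [(m, w / total) for m, w in zip(models, weights)]


def predict_ensemble(ensemble, row):
    """Weighted vote over the per-batch models."""
    votes = {}
    for model, weight in ensemble:
        label = predict_one(model, row)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)
```

The contrasting strategy in the abstract, merging batches and then removing batch effects, would instead pool all rows before fitting a single model; it is not shown here.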

Zhang Y, Bernau C, Parmigiani G, Waldron L. The impact of different sources of heterogeneity on loss of accuracy from genomic prediction models. Biostatistics 2020;21:253-268. [PMID: 30202918 DOI: 10.1093/biostatistics/kxy044] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/08/2018] [Revised: 07/22/2018] [Accepted: 08/04/2018] [Indexed: 11/13/2022]

Gupta VK, Kim M, Bakshi U, Cunningham KY, Davis JM, Lazaridis KN, Nelson H, Chia N, Sung J. A predictive index for health status using species-level gut microbiome profiling. Nat Commun 2020;11:4635. [PMID: 32934239 PMCID: PMC7492273 DOI: 10.1038/s41467-020-18476-8] [Citation(s) in RCA: 119] [Impact Index Per Article: 29.8] [Received: 02/22/2020] [Accepted: 08/19/2020] [Indexed: 12/26/2022]

Chang L. Partial order relations for classification comparisons. Can J Stat 2019. [DOI: 10.1002/cjs.11524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]

Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 2019;46:D8-D13. [PMID: 29140470 PMCID: PMC5753372 DOI: 10.1093/nar/gkx1095] [Citation(s) in RCA: 908] [Impact Index Per Article: 181.6] [Received: 09/26/2017] [Accepted: 11/09/2017] [Indexed: 12/26/2022]

Pan M, Zhang J. Quantile normalization for combining gene-expression datasets. Biotechnol Biotec Eq 2018. [DOI: 10.1080/13102818.2017.1419376] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]

Patil P, Parmigiani G. Training replicable predictors in multiple studies. Proc Natl Acad Sci U S A 2018;115:2578-2583. [PMID: 29531060 PMCID: PMC5856504 DOI: 10.1073/pnas.1708283115] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 01/24/2023]

Ghosh D, Funk CC, Caballero J, Shah N, Rouleau K, Earls JC, Soroceanu L, Foltz G, Cobbs CS, Price ND, Hood L. A Cell-Surface Membrane Protein Signature for Glioblastoma. Cell Syst 2017;4:516-529.e7. [PMID: 28365151 DOI: 10.1016/j.cels.2017.03.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 09/16/2015] [Revised: 09/08/2016] [Accepted: 03/03/2017] [Indexed: 02/08/2023]

Kim S, Jhong JH, Lee J, Koo JY. Meta-analytic support vector machine for integrating multiple omics data. BioData Min 2017;10:2. [PMID: 28149325 PMCID: PMC5270233 DOI: 10.1186/s13040-017-0126-8] [Citation(s) in RCA: 82] [Impact Index Per Article: 11.7] [Received: 08/01/2016] [Accepted: 01/11/2017] [Indexed: 11/10/2022]

Biales AD, Kostich MS, Batt AL, See MJ, Flick RW, Gordon DA, Lazorchak JM, Bencic DC. Initial development of a multigene 'omics-based exposure biomarker for pyrethroid pesticides. Aquat Toxicol 2016;179:27-35. [PMID: 27564377 DOI: 10.1016/j.aquatox.2016.08.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2016] [Revised: 08/02/2016] [Accepted: 08/05/2016] [Indexed: 06/06/2023]

Abstract

Omics technologies have long since promised to address a number of long standing issues related to environmental regulation. Despite considerable resource investment, there are few examples where these tools have been adopted by the regulatory community, which is in part due to a focus of most studies on discovery rather than assay development. The current work describes the initial development of an omics based assay using 48h Pimephales promelas (FHM) larvae for identifying aquatic exposures to pyrethroid pesticides. Larval FHM were exposed to seven concentrations of each of four pyrethroids (permethrin, cypermethrin, esfenvalerate and bifenthrin) in order to establish dose response curves. Then, in three separate identical experiments, FHM were exposed to a single equitoxic concentration of each pyrethroid, corresponding to 33% of the calculated LC50. All exposures were separated by weeks and all materials were either cleaned or replaced between runs in an attempt to maintain independence among exposure experiments. Gene expression classifiers were developed using the random forest algorithm for each exposure and evaluated first by cross-validation using hold out organisms from the same exposure experiment and then against test sets of each pyrethroid from separate exposure experiments. Bifenthrin exposed organisms generated the highest quality classifier, demonstrating an empirical Area Under the Curve (eAUC) of 0.97 when tested against bifenthrin exposed organisms from other exposure experiments and 0.91 against organisms exposed to any of the pyrethroids. An eAUC of 1.0 represents perfect classification with no false positives or negatives. Additionally, the bifenthrin classifier was able to successfully classify organisms from all other pyrethroid exposures at multiple concentrations, suggesting a potential utility for detecting cumulative exposures. 
Considerable run-to-run variability was observed both in exposure concentrations and molecular responses of exposed fish across exposure experiments. The application of a calibration step in analysis successfully corrected this, resulting in a significantly improved classifier. Classifier evaluation suggested the importance of considering a number of aspects of experimental design when developing an expression based tool for general use in ecological monitoring and risk assessment, such as the inclusion of multiple experimental runs and high replicate numbers.
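
For readers unfamiliar with the metric, the empirical AUC mentioned in this abstract can be computed as the fraction of (positive, negative) pairs that the classifier ranks correctly, with ties counted as half. The sketch below is a generic illustration of that definition, not the study's code; the function name and the 0/1 label encoding are assumptions.

```python
def empirical_auc(scores, labels):
    """Empirical AUC: share of positive/negative pairs ranked correctly.

    scores: classifier outputs, higher means "more likely exposed".
    labels: 1 for exposed (positive), 0 for unexposed (negative).
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5, a reversed pair 0.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means every exposed organism scored above every unexposed one, which matches the abstract's description of perfect classification.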

Triple-layer dissection of the lung adenocarcinoma transcriptome: regulation at the gene, transcript, and exon levels. Oncotarget 2016;6:28755-73. [PMID: 26356813 PMCID: PMC4745690 DOI: 10.18632/oncotarget.4810] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 02/27/2015] [Accepted: 08/21/2015] [Indexed: 12/30/2022]

Abstract

Lung adenocarcinoma is one of the most deadly human diseases. However, the molecular mechanisms underlying this disease, particularly RNA splicing, have remained underexplored. Here, we report a triple-level (gene-, transcript-, and exon-level) analysis of lung adenocarcinoma transcriptomes from 77 paired tumor and normal tissues, as well as an analysis pipeline to overcome genetic variability for accurate differentiation between tumor and normal tissues. We report three major results. First, more than 5,000 differentially expressed transcripts/exonic regions occur repeatedly in lung adenocarcinoma patients. These transcripts/exonic regions are enriched in nicotine metabolism and ribosomal functions in addition to the pathways enriched for differentially expressed genes (cell cycle, extracellular matrix receptor interaction, and axon guidance). Second, classification models based on rationally selected transcripts or exonic regions can reach accuracies of 0.93 to 1.00 in differentiating tumor from normal tissues. Of the 28 selected exonic regions, 26 regions correspond to alternative exons located in such regulators as tumor suppressor (GDF10), signal receptor (LYVE1), vascular-specific regulator (RASIP1), ubiquitination mediator (RNF5), and transcriptional repressor (TRIM27). Third, classification systems based on 13 to 14 differentially expressed genes yield accuracies near 100%. Genes selected by both detection methods include C16orf59, DAP3, ETV4, GABARAPL1, PPAR, RADIL, RSPO1, SERTM1, SRPK1, ST6GALNAC6, and TNXB. Our findings imply a multilayered lung adenocarcinoma regulome in which transcript-/exon-level regulation may be dissociated from gene-level regulation. Our described method may be used to identify potentially important genes/transcripts/exonic regions for the tumorigenesis of lung adenocarcinoma and to construct accurate tumor vs. normal classification systems for this disease.

Kim S, Lin CW, Tseng GC. MetaKTSP: a meta-analytic top scoring pair method for robust cross-study validation of omics prediction analysis. Bioinformatics 2016;32:1966-73. [PMID: 27153719 DOI: 10.1093/bioinformatics/btw115] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 02/18/2015] [Accepted: 02/19/2016] [Indexed: 01/08/2023]

Abstract

MOTIVATION

Supervised machine learning is widely applied to transcriptomic data to predict disease diagnosis, prognosis or survival. Robust and interpretable classifiers with high accuracy are usually favored for their clinical and translational potential. The top scoring pair (TSP) algorithm is an example that applies a simple rank-based algorithm to identify rank-altered gene pairs for classifier construction. Although many classification methods perform well in cross-validation of single expression profile, the performance usually greatly reduces in cross-study validation (i.e. the prediction model is established in the training study and applied to an independent test study) for all machine learning methods, including TSP. The failure of cross-study validation has largely diminished the potential translational and clinical values of the models. The purpose of this article is to develop a meta-analytic top scoring pair (MetaKTSP) framework that combines multiple transcriptomic studies and generates a robust prediction model applicable to independent test studies.

RESULTS

We proposed two frameworks, by averaging TSP scores or by combining P-values from individual studies, to select the top gene pairs for model construction. We applied the proposed methods in simulated data sets and three large-scale real applications in breast cancer, idiopathic pulmonary fibrosis and pan-cancer methylation. The result showed superior performance of cross-study validation accuracy and biomarker selection for the new meta-analytic framework. In conclusion, combining multiple omics data sets in the public domain increases robustness and accuracy of the classification model that will ultimately improve disease understanding and clinical treatment decisions to benefit patients.

AVAILABILITY AND IMPLEMENTATION

An R package, MetaKTSP, is available online (http://tsenglab.biostat.pitt.edu/software.htm).

CONTACT

ctseng@pitt.edu

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
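
The rank-based idea behind the top scoring pair approach in this abstract can be illustrated with a small sketch. For a gene pair (i, j), a commonly used TSP score is the absolute difference between classes in the probability that gene i is expressed below gene j. The averaging step below mimics, in simplified form, the "averaging TSP scores" framework mentioned in the abstract. Everything here is an assumption made for illustration: the function names are hypothetical, only the single best pair is returned, and neither the P-value combination framework nor the actual MetaKTSP R package interface is reproduced.

```python
from itertools import combinations


def tsp_scores(samples, labels):
    """Return {(i, j): score} for every feature pair with i < j.

    score = |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|
    samples: list of expression rows; labels: 0/1 class per row.
    """
    groups = {0: [], 1: []}
    for row, label in zip(samples, labels):
        groups[label].append(row)
    n_features = len(samples[0])
    scores = {}
    for i, j in combinations(range(n_features), 2):
        p = [sum(row[i] < row[j] for row in groups[c]) / len(groups[c])
             for c in (0, 1)]
        scores[(i, j)] = abs(p[0] - p[1])
    return scores


def mean_score_top_pair(studies):
    """Average per-study TSP scores and return (best_pair, averaged_score).

    studies: list of (samples, labels), all measuring the same features.
    """
    per_study = [tsp_scores(x, y) for x, y in studies]
    averaged = {pair: sum(s[pair] for s in per_study) / len(per_study)
                for pair in per_study[0]}
    best = max(averaged, key=averaged.get)
    return best, averaged[best]
```

Because the score depends only on the within-sample ordering of two genes, it is unaffected by monotone rescaling of a study's measurements, which is one reason rank-based rules are attractive when combining studies.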
