26
|
Lam HC, Cloonan SM, Bhashyam AR, Haspel JA, Singh A, Sathirapongsasuti JF, Cervo M, Yao H, Chung AL, Mizumura K, An CH, Shan B, Franks JM, Haley KJ, Owen CA, Tesfaigzi Y, Washko GR, Quackenbush J, Silverman EK, Rahman I, Kim HP, Mahmood A, Biswal SS, Ryter SW, Choi AM. Histone deacetylase 6-mediated selective autophagy regulates COPD-associated cilia dysfunction. J Clin Invest 2020; 130:6189. [PMID: 33136096 DOI: 10.1172/jci143863] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/17/2022]
|
27
|
Haibe-Kains B, Adam GA, Hosny A, Khodakarami F, Waldron L, Wang B, McIntosh C, Goldenberg A, Kundaje A, Greene CS, Broderick T, Hoffman MM, Leek JT, Korthauer K, Huber W, Brazma A, Pineau J, Tibshirani R, Hastie T, Ioannidis JPA, Quackenbush J, Aerts HJWL. Transparency and reproducibility in artificial intelligence. Nature 2020; 586:E14-E16. [PMID: 33057217 PMCID: PMC8144864 DOI: 10.1038/s41586-020-2766-y] [Citation(s) in RCA: 140] [Impact Index Per Article: 35.0] [Received: 02/01/2020] [Accepted: 08/10/2020] [Indexed: 01/15/2023]
Abstract
Breakthroughs in artificial intelligence (AI) hold enormous potential, as AI can automate complex tasks and even exceed human performance. In their study, McKinney et al. showed the high potential of AI for breast cancer screening. However, the lack of detailed methods and algorithm code undermines its scientific value. Here, we identify obstacles hindering transparent and reproducible AI research as faced by McKinney et al., and provide solutions to these obstacles with implications for the broader field.
|
28
|
Lopes-Ramos CM, Kuijjer M, Glass K, DeMeo D, Quackenbush J. Abstract 6569: Regulatory networks of liver carcinoma reveal sex specific patterns of gene regulation. Cancer Res 2020. [DOI: 10.1158/1538-7445.am2020-6569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Despite pronounced sex differences in cancer incidence, severity, and response to treatment, most current approaches to clinical management, as well as therapeutics development and selection, are sex-independent. For most cancer types, including liver cancer, males have a higher risk of developing the disease and a lower survival rate than women. However, the molecular features that drive these sex differences are poorly understood. We inferred patient-specific regulatory networks of liver hepatocellular carcinoma using data from TCGA. By comparing the female and male networks, we found marked sex differences in transcriptional regulatory processes relevant to disease development, progression, and response to therapy. We found that oncogenes have significantly higher regulatory targeting in males, while tumor suppressor genes have significantly higher targeting in females. Many “hallmark” cancer pathways, including the HEDGEHOG, WNT, and TGF-β signaling pathways, were significantly more highly targeted in males, while drug metabolism and immune related pathways were enriched for genes highly targeted in females. We also evaluated sex-biased somatic mutation patterns using mutation and copy number alteration data in TCGA. By summarizing mutations found in genes into pathway mutation scores, we found sex-biased mutation profiles for many pathways, providing additional support for biological sex differences associated with WNT, NOTCH, TGF-β, and HEDGEHOG signaling pathways. Our analysis uncovered patterns of gene regulation that differentiate male and female liver cancer and may be associated with sex differences in prognosis and treatment response. These findings provide insight into the mechanisms that drive clinically observed sex differences and underscore the importance of considering sex as a factor influencing disease etiology and in developing and prescribing therapies. 
Our network approach can provide insights into why some therapies have differential effects in males and females, and suggest new ways to optimize drug response in each sex.
Citation Format: Camila M. Lopes-Ramos, Marieke Kuijjer, Kimberly Glass, Dawn DeMeo, John Quackenbush. Regulatory networks of liver carcinoma reveal sex specific patterns of gene regulation [abstract]. In: Proceedings of the Annual Meeting of the American Association for Cancer Research 2020; 2020 Apr 27-28 and Jun 22-24. Philadelphia (PA): AACR; Cancer Res 2020;80(16 Suppl):Abstract nr 6569.
|
29
|
Morrow JD, Make B, Regan E, Han M, Hersh CP, Tal-Singer R, Quackenbush J, Choi AMK, Silverman EK, DeMeo DL. DNA Methylation Is Predictive of Mortality in Current and Former Smokers. Am J Respir Crit Care Med 2020; 201:1099-1109. [PMID: 31995399 DOI: 10.1164/rccm.201902-0439oc] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/11/2022]
Abstract
Rationale: Smoking results in at least a decade lower life expectancy. Mortality among current smokers is two to three times as high as never smokers. DNA methylation is an epigenetic modification of the human genome that has been associated with both cigarette smoking and mortality. Objectives: We sought to identify DNA methylation marks in blood that are predictive of mortality in a subset of the COPDGene (Genetic Epidemiology of COPD) study, representing 101 deaths among 667 current and former smokers. Methods: We assayed genome-wide DNA methylation in non-Hispanic white smokers with and without chronic obstructive pulmonary disease (COPD) using blood samples from the COPDGene enrollment visit. We tested whether DNA methylation was associated with mortality in models adjusted for COPD status, age, sex, current smoking status, and pack-years of cigarette smoking. Replication was performed in a subset of 231 individuals from the ECLIPSE (Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints) study. Measurements and Main Results: We identified seven CpG sites associated with mortality (false discovery rate < 20%) that replicated in the ECLIPSE cohort (P < 0.05). None of these marks were associated with longitudinal lung function decline in survivors, smoking history, or current smoking status. However, differential methylation of two replicated PIK3CD (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit delta) sites was associated with lung function at enrollment (P < 0.05). We also observed associations between DNA methylation and gene expression for the PIK3CD sites. Conclusions: This study is the first to identify variable DNA methylation associated with all-cause mortality in smokers with and without COPD. Evaluating predictive epigenomic marks of smokers in peripheral blood may allow for targeted risk stratification and aid in delivery of future tailored therapeutic interventions.
|
30
|
Gaynor SM, Sun R, Lin X, Quackenbush J. Identification of differentially expressed gene sets using the Generalized Berk-Jones statistic. Bioinformatics 2020; 35:4568-4576. [PMID: 31062858 DOI: 10.1093/bioinformatics/btz277] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/24/2018] [Revised: 02/14/2019] [Accepted: 04/23/2019] [Indexed: 01/09/2023]
Abstract
MOTIVATION Cancer genomics studies frequently aim to identify genes that are differentially expressed between clinically distinct patient subgroups, generally by testing single genes one at a time. However, the results of any individual transcriptomic study are often not fully reproducible. A particular challenge impeding statistical analysis is the difficulty of distinguishing between differential expression comprising part of the genomic disease etiology and that induced by downstream effects. More robust analytical approaches that are well-powered to detect potentially causative genes, are less prone to discovering spurious associations, and can deliver reproducible findings across different studies are needed. RESULTS We propose a set-based procedure for testing of differential expression and show that this set-based approach can produce more robust results by aggregating information across multiple, correlated genomic markers. Specifically, we adapt the Generalized Berk-Jones statistic to test for the transcription factors that may contribute to the progression of estrogen receptor positive breast cancer. We demonstrate the ability of our method to produce reproducible findings by applying the same analysis to 21 publicly available datasets, producing a similar list of significant transcription factors across most studies. Our Generalized Berk-Jones approach produces results that show improved consistency over three set-based testing algorithms: Generalized Higher Criticism, Gene Set Analysis and Gene Set Enrichment Analysis. AVAILABILITY AND IMPLEMENTATION Data are in the MetaGxBreast R package. Code is available at github.com/ryanrsun/gaynor_sun_GBJ_breast_cancer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
31
|
Chauhan K, Nadkarni GN, Fleming F, McCullough J, He CJ, Quackenbush J, Murphy B, Donovan MJ, Coca SG, Bonventre JV. Initial Validation of a Machine Learning-Derived Prognostic Test (KidneyIntelX) Integrating Biomarkers and Electronic Health Record Data To Predict Longitudinal Kidney Outcomes. Kidney360 2020; 1:731-739. [DOI: 10.34067/kid.0002252020] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/20/2020] [Accepted: 06/25/2020] [Indexed: 11/27/2022]
Abstract
Background: Individuals with type 2 diabetes (T2D) or the apolipoprotein L1 high-risk (APOL1-HR) genotypes are at increased risk of rapid kidney function decline (RKFD) and kidney failure. We hypothesized that a prognostic test using machine learning integrating blood biomarkers and longitudinal electronic health record (EHR) data would improve risk stratification. Methods: We selected two cohorts from the Mount Sinai BioMe Biobank: T2D (n=871) and African ancestry with APOL1-HR (n=498). We measured plasma tumor necrosis factor receptors (TNFR) 1 and 2 and kidney injury molecule-1 (KIM-1) and used random forest algorithms to integrate biomarker and EHR data to generate a risk score for a composite outcome: RKFD (eGFR decline of ≥5 ml/min per year), or 40% sustained eGFR decline, or kidney failure. We compared performance to a validated clinical model and applied thresholds to assess the utility of the prognostic test (KidneyIntelX) to accurately stratify patients into risk categories. Results: Overall, 23% of those with T2D and 18% of those with APOL1-HR experienced the composite kidney end point over a median follow-up of 4.6 and 5.9 years, respectively. The area under the receiver operator characteristic curve (AUC) of KidneyIntelX was 0.77 (95% CI, 0.75 to 0.79) in T2D, and 0.80 (95% CI, 0.77 to 0.83) in APOL1-HR, outperforming the clinical models (AUC, 0.66 [95% CI, 0.65 to 0.67] and 0.72 [95% CI, 0.71 to 0.73], respectively; P<0.001). The positive predictive values for KidneyIntelX were 62% and 62% versus 46% and 39% for the clinical models (P<0.01) in the high-risk (top 15%) stratum for T2D and APOL1-HR, respectively.
The negative predictive values for KidneyIntelX were 92% in T2D and 96% for APOL1-HR versus 85% and 93% for the clinical model, respectively (P=0.76 and 0.93, respectively), in the low-risk stratum (bottom 50%). Conclusions: In patients with T2D or APOL1-HR, a prognostic test (KidneyIntelX) integrating biomarker levels with longitudinal EHR data significantly improved prediction of a composite kidney end point of RKFD, 40% decline in eGFR, or kidney failure over validated clinical models.
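The stratum-level positive and negative predictive values reported above can be illustrated with a minimal sketch. It assumes the strata are defined by rank (top 15% and bottom 50% of risk scores), which is a simplification of the score thresholds the study applied; the function name `stratum_predictive_values` and the toy inputs are hypothetical and are not taken from KidneyIntelX.

```python
def stratum_predictive_values(scores, events, high_frac=0.15, low_frac=0.50):
    """Return (PPV in the high-risk stratum, NPV in the low-risk stratum).

    scores: risk scores, higher means higher predicted risk.
    events: 1 if the patient reached the end point, else 0.
    Strata are cut by rank here (an assumption for illustration).
    """
    n = len(scores)
    # Patient indices ordered from highest to lowest risk score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    n_high = max(1, round(n * high_frac))
    n_low = max(1, round(n * low_frac))
    high = order[:n_high]
    low = order[-n_low:]
    # PPV: fraction of high-risk patients who had the event.
    ppv = sum(events[i] for i in high) / n_high
    # NPV: fraction of low-risk patients who did not have the event.
    npv = sum(1 - events[i] for i in low) / n_low
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical toy data: 20 patients with distinct scores.
    toy_scores = [i / 20 for i in range(20, 0, -1)]
    toy_events = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0,
                  0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    print(stratum_predictive_values(toy_scores, toy_events))
```

With real data, ties in the score and the choice of cutoff would need explicit handling; the rank-based cut is only meant to show how the two quantities are defined.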
|
32
|
Altenbuchinger M, Weihs A, Quackenbush J, Grabe HJ, Zacharias HU. Gaussian and Mixed Graphical Models as (multi-)omics data analysis tools. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2020; 1863:194418. [PMID: 31639475 PMCID: PMC7166149 DOI: 10.1016/j.bbagrm.2019.194418] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/31/2019] [Revised: 08/21/2019] [Accepted: 08/21/2019] [Indexed: 11/30/2022]
Abstract
Gaussian Graphical Models (GGMs) are tools to infer dependencies between biological variables. Popular applications are the reconstruction of gene, protein, and metabolite association networks. GGMs are an exploratory research tool that can be useful to discover interesting relations between genes (functional clusters) or to identify therapeutically interesting genes, but do not necessarily infer a network in the mechanistic sense. Although GGMs are well investigated from a theoretical and applied perspective, important extensions are not well known within the biological community. GGMs assume, for instance, multivariate normally distributed data. If this assumption is violated, Mixed Graphical Models (MGMs) can be the better choice. In this review, we provide the theoretical foundations of GGMs, present extensions such as MGMs or multi-class GGMs, and illustrate how those methods can provide insight into biological mechanisms. We summarize several applications and present user-friendly estimation software. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
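As a minimal sketch of the dependency structure a GGM encodes: for multivariate normal data, the partial correlation between variables i and j given all others is -Ω_ij / sqrt(Ω_ii Ω_jj), where Ω is the precision (inverse covariance) matrix, and a zero entry means no edge. The example below assumes the precision matrix is already estimated; the function names and the three-variable matrix are hypothetical illustrations, not taken from the review or its software.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix into a matrix of partial correlations."""
    p = len(precision)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def ggm_edges(precision, threshold=1e-8):
    """List (i, j, partial correlation) for pairs with a nonzero dependency."""
    pc = partial_correlations(precision)
    p = len(pc)
    return [
        (i, j, pc[i][j])
        for i in range(p)
        for j in range(i + 1, p)
        if abs(pc[i][j]) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical chain X0 - X1 - X2: X0 and X2 are conditionally
    # independent given X1, so the (0, 2) precision entry is zero.
    omega = [
        [2.0, -1.0, 0.0],
        [-1.0, 2.0, -1.0],
        [0.0, -1.0, 2.0],
    ]
    for edge in ggm_edges(omega):
        print(edge)
```

In practice the precision matrix is usually estimated with regularization when variables outnumber samples; that estimation step is outside this sketch.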
|
33
|
Silverman EK, Schmidt HHHW, Anastasiadou E, Altucci L, Angelini M, Badimon L, Balligand JL, Benincasa G, Capasso G, Conte F, Di Costanzo A, Farina L, Fiscon G, Gatto L, Gentili M, Loscalzo J, Marchese C, Napoli C, Paci P, Petti M, Quackenbush J, Tieri P, Viggiano D, Vilahur G, Glass K, Baumbach J. Molecular networks in Network Medicine: Development and applications. Wiley Interdisciplinary Reviews: Systems Biology and Medicine 2020; 12:e1489. [PMID: 32307915 DOI: 10.1002/wsbm.1489] [Citation(s) in RCA: 84] [Impact Index Per Article: 21.0] [Received: 11/23/2019] [Revised: 02/29/2020] [Accepted: 03/20/2020] [Indexed: 12/14/2022]
Abstract
Network Medicine applies network science approaches to investigate disease pathogenesis. Many different analytical methods have been used to infer relevant molecular networks, including protein-protein interaction networks, correlation-based networks, gene regulatory networks, and Bayesian networks. Network Medicine applies these integrated approaches to Omics Big Data (including genetics, epigenetics, transcriptomics, metabolomics, and proteomics) using computational biology tools and, thereby, has the potential to provide improvements in the diagnosis, prognosis, and treatment of complex diseases. We discuss briefly the types of molecular data that are used in molecular network analyses, survey the analytical methods for inferring molecular networks, and review efforts to validate and visualize molecular networks. Successful applications of molecular network analysis have been reported in pulmonary arterial hypertension, coronary heart disease, diabetes mellitus, chronic lung diseases, and drug development. Important knowledge gaps in Network Medicine include incompleteness of the molecular interactome, challenges in identifying key genes within genetic association regions, and limited applications to human diseases. This article is categorized under: Models of Systems Properties and Processes > Mechanistic Models; Translational, Genomic, and Systems Medicine > Translational Medicine; Analytical and Computational Methods > Analytical Methods; Analytical and Computational Methods > Computational Methods.
|
34
|
Qiang J, Ding W, Kuijjer M, Quackenbush J, Chen P. Clustering Sparse Data With Feature Correlation With Application to Discover Subtypes in Cancer. IEEE Access 2020; 8:67775-67789. [PMID: 36329870 PMCID: PMC9629797 DOI: 10.1109/access.2020.2982569] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/16/2023]
Abstract
In this paper, given data with high-dimensional features, we study how to calculate the similarity between two samples using a feature interaction network, which represents the relationships between features. This differs from traditional methods that learn similarities from a sample network representing the relationships between samples. To overcome the data sparseness problem, we propose a novel network-based similarity metric that incorporates knowledge of the feature interaction network. Our similarity metric uses a new Feature Alignment Similarity measure, which does not directly compute the similarities among samples, but projects each sample into a feature interaction network and measures the similarity between two samples using the similarities between the vertices of the samples in the network. As such, even when two samples do not share any common features, they are likely to have higher similarity values when their features share similar network regions. To ensure that the metric is useful in a real-world application, we apply it to discover subtypes in tumor mutational data by incorporating information from the gene interaction network. Our experimental results on synthetic data and real-world tumor mutational data show that our approach outperforms the top competitors in cancer subtype discovery. Furthermore, our approach can identify cancer subtypes that cannot be detected by other clustering algorithms in real cancer data.
|
35
|
Lietz CE, Garbutt C, Barry WT, Deshpande V, Chen YL, Lozano-Calderon SA, Wang Y, Lawney B, Ebb D, Cote GM, Duan Z, Hornicek FJ, Choy E, Petur Nielsen G, Haibe-Kains B, Quackenbush J, Spentzos D. MicroRNA-mRNA networks define translatable molecular outcome phenotypes in osteosarcoma. Sci Rep 2020; 10:4409. [PMID: 32157112 PMCID: PMC7064533 DOI: 10.1038/s41598-020-61236-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/17/2019] [Accepted: 02/03/2020] [Indexed: 12/30/2022]
Abstract
There is a lack of well validated prognostic biomarkers in osteosarcoma, a rare, recalcitrant disease for which treatment standards have not changed in over 20 years. We performed microRNA sequencing in 74 frozen osteosarcoma biopsy samples, constituting the largest single center translationally analyzed osteosarcoma cohort to date, and we separately analyzed a multi-omic dataset from a large NCI supported national cooperative group cohort. We validated the prognostic value of candidate microRNA signatures and contextualized them in relevant transcriptomic and epigenomic networks. Our results reveal the existence of molecularly defined phenotypes associated with outcome independent of clinicopathologic features. Through machine learning based integrative pharmacogenomic analysis, the microRNA biomarkers identify novel therapeutics for stratified application in osteosarcoma. The previously unrecognized osteosarcoma subtypes with distinct clinical courses and response to therapy could be translatable for discerning patients appropriate for more intensified, less intensified, or alternate therapeutic regimens.
|
36
|
Schwede M, Waldron L, Mok SC, Wei W, Basunia A, Merritt MA, Mitsiades CS, Parmigiani G, Harrington DP, Quackenbush J, Birrer MJ, Culhane AC. The Impact of Stroma Admixture on Molecular Subtypes and Prognostic Gene Signatures in Serous Ovarian Cancer. Cancer Epidemiol Biomarkers Prev 2019; 29:509-519. [PMID: 31871106 DOI: 10.1158/1055-9965.epi-18-1359] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 12/19/2018] [Revised: 04/26/2019] [Accepted: 12/06/2019] [Indexed: 12/18/2022]
Abstract
BACKGROUND Recent efforts to improve outcomes for high-grade serous ovarian cancer, a leading cause of cancer death in women, have focused on identifying molecular subtypes and prognostic gene signatures, but existing subtypes have poor cross-study robustness. We tested the contribution of cell admixture in published ovarian cancer molecular subtypes and prognostic gene signatures. METHODS Gene signatures of tumor and stroma were developed using paired microdissected tissue from two independent studies. Stromal genes were investigated in two molecular subtype classifications and 61 published gene signatures. Prognostic performance of gene signatures of stromal admixture was evaluated in 2,527 ovarian tumors (16 studies). Computational simulations of increasing stromal cell proportion were performed by mixing gene-expression profiles of paired microdissected ovarian tumor and stroma. RESULTS Recently described ovarian cancer molecular subtypes are strongly associated with the cell admixture. Tumors were classified as different molecular subtypes in simulations where the percentage of stromal cells increased. Stromal gene expression in bulk tumors was associated with overall survival (hazard ratio, 1.17; 95% confidence interval, 1.11-1.23), and in one data set, increased stroma was associated with anatomic sampling location. Five published prognostic gene signatures were no longer prognostic in a multivariate model that adjusted for stromal content. CONCLUSIONS Cell admixture affects the interpretation and reproduction of ovarian cancer molecular subtypes and gene signatures derived from bulk tissue. Elucidating the role of stroma in the tumor microenvironment and in prognosis is important. IMPACT Single-cell analyses may be required to refine the molecular subtypes of high-grade serous ovarian cancer.
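The admixture simulation described in the Methods can be sketched as follows, assuming a simple linear mix of paired tumor and stroma expression profiles on the same scale. The abstract does not state the exact mixing scheme, so the linear form, the function names, and the toy profiles are all assumptions for illustration.

```python
def mix_profiles(tumor, stroma, stromal_fraction):
    """Linearly mix paired tumor and stroma expression profiles.

    stromal_fraction is the assumed proportion of stromal cells (0 to 1).
    """
    if not 0.0 <= stromal_fraction <= 1.0:
        raise ValueError("stromal_fraction must be between 0 and 1")
    if len(tumor) != len(stroma):
        raise ValueError("profiles must cover the same genes")
    return [
        (1.0 - stromal_fraction) * t + stromal_fraction * s
        for t, s in zip(tumor, stroma)
    ]


def simulate_admixture(tumor, stroma, fractions):
    """Return (fraction, mixed profile) pairs for increasing stromal content."""
    return [(f, mix_profiles(tumor, stroma, f)) for f in sorted(fractions)]


if __name__ == "__main__":
    # Hypothetical two-gene profiles: gene 0 is tumor-high, gene 1 is stroma-high.
    tumor_profile = [10.0, 0.0]
    stroma_profile = [0.0, 10.0]
    for fraction, profile in simulate_admixture(
        tumor_profile, stroma_profile, [0.0, 0.25, 0.5, 0.75, 1.0]
    ):
        print(fraction, profile)
```

Each mixed profile could then be passed to a subtype classifier to see at which stromal fraction the assigned subtype changes, which is the kind of question the simulations in the abstract address.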
|
37
|
Fagny M, Platig J, Kuijjer ML, Lin X, Quackenbush J. Nongenic cancer-risk SNPs affect oncogenes, tumour-suppressor genes, and immune function. Br J Cancer 2019; 122:569-577. [PMID: 31806877 PMCID: PMC7028992 DOI: 10.1038/s41416-019-0614-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/04/2019] [Revised: 09/23/2019] [Accepted: 10/07/2019] [Indexed: 12/31/2022]
Abstract
Background Genome-wide association studies (GWASes) have identified many noncoding germline single-nucleotide polymorphisms (SNPs) that are associated with an increased risk of developing cancer. However, how these SNPs affect cancer risk is still largely unknown. Methods We used a systems biology approach to analyse the regulatory role of cancer-risk SNPs in thirteen tissues. By using data from the Genotype-Tissue Expression (GTEx) project, we performed an expression quantitative trait locus (eQTL) analysis. We represented both significant cis- and trans-eQTLs as edges in tissue-specific eQTL bipartite networks. Results Each tissue-specific eQTL network is organised into communities that group sets of SNPs and functionally related genes. When mapping cancer-risk SNPs to these networks, we find that in each tissue, these SNPs are significantly overrepresented in communities enriched for immune response processes, as well as tissue-specific functions. Moreover, cancer-risk SNPs are more likely to be ‘cores’ of their communities, influencing the expression of many genes within the same biological processes. Finally, cancer-risk SNPs preferentially target oncogenes and tumour-suppressor genes, suggesting that they may alter the expression of these key cancer genes. Conclusions This approach provides a new way of understanding genetic effects on cancer risk and provides a biological context for interpreting the results of GWAS cancer studies.
|
38
|
St Hilaire MA, Kristal BS, Rahman SA, Sullivan JP, Quackenbush J, Duffy JF, Barger LK, Gooley JJ, Czeisler CA, Lockley SW. Using a Single Daytime Performance Test to Identify Most Individuals at High-Risk for Performance Impairment during Extended Wake. Sci Rep 2019; 9:16681. [PMID: 31723161 PMCID: PMC6853981 DOI: 10.1038/s41598-019-52930-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/03/2018] [Accepted: 10/25/2019] [Indexed: 12/20/2022]
Abstract
We explored the predictive value of a neurobehavioral performance assessment under rested baseline conditions (evaluated at 8 hours awake following 8 hours of sleep) on neurobehavioral response to moderate sleep loss (evaluated at 20 hours awake two days later) in 151 healthy young participants (18-30 years). We defined each participant's response-to-sleep-loss phenotype based on the number of attentional failures on a 10-min visual psychomotor vigilance task taken at 20 hours awake (resilient: less than 6 attentional failures, n = 26 participants; non-resilient: 6 or more attentional failures, n = 125 participants). We observed that 97% of rested participants with 2 or more attentional failures (n = 73 of 151) and 100% of rested participants with 3 or more attentional failures (n = 57 of 151) were non-resilient after moderate sleep loss. Our approach can accurately identify a significant proportion of individuals who are at high risk for neurobehavioral performance impairment from staying up late with a single neurobehavioral performance assessment conducted during rested conditions. Additional methods are needed to predict the future performance of individuals who are not identified as high risk during baseline.
|
39
|
Kuijjer ML, Hsieh PH, Quackenbush J, Glass K. lionessR: single sample network inference in R. BMC Cancer 2019; 19:1003. [PMID: 31653243 PMCID: PMC6815019 DOI: 10.1186/s12885-019-6235-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/23/2019] [Accepted: 10/01/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND In biomedical research, network inference algorithms are typically used to infer complex association patterns between biological entities, such as between genes or proteins, using data from a population. This resulting aggregate network, in essence, averages over the networks of those individuals in the population. LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples) is a method that can be used together with a network inference algorithm to extract networks for individual samples in a population. The method's key characteristic is that, by modeling networks for individual samples in a data set, it can capture network heterogeneity in a population. LIONESS was originally made available as a function within the PANDA (Passing Attributes between Networks for Data Assimilation) regulatory network reconstruction framework. However, the LIONESS algorithm is generalizable and can be used to model single sample networks based on a wide range of network inference algorithms. RESULTS In this software article, we describe lionessR, an R implementation of LIONESS that can be applied to any network inference method in R that outputs a complete, weighted adjacency matrix. As an example, we provide a vignette of an application of lionessR to model single sample networks based on correlated gene expression in a bone cancer dataset. We show how the tool can be used to identify differential patterns of correlation between two groups of patients. CONCLUSIONS We developed lionessR, an open source R package to model single sample networks. We show how lionessR can be used to inform us on potential precision medicine applications in cancer. The lionessR package is a user-friendly tool to perform such analyses. The package, which includes a vignette describing the application, is freely available at: https://github.com/kuijjerlab/lionessR and at: http://bioconductor.org/packages/lionessR .
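The idea of extracting single-sample networks from an aggregate network can be sketched in a few lines. The sketch below is in Python rather than R and is not the lionessR implementation; it assumes the LIONESS estimate for sample q takes the form N × (network on all samples − network without q) + (network without q), and uses Pearson correlation as the network inference method. The names `lioness` and `correlation_network` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_network(samples):
    """Complete, weighted gene-by-gene adjacency matrix (samples are rows)."""
    genes = list(zip(*samples))
    g = len(genes)
    return [[pearson(genes[i], genes[j]) for j in range(g)] for i in range(g)]


def lioness(samples, infer=correlation_network):
    """Return one network per sample via leave-one-out linear interpolation."""
    n = len(samples)
    full = infer(samples)  # aggregate network on all N samples
    networks = []
    for q in range(n):
        without_q = infer(samples[:q] + samples[q + 1:])
        # Assumed LIONESS equation, applied edge by edge.
        networks.append([
            [n * (a - b) + b for a, b in zip(row_full, row_without)]
            for row_full, row_without in zip(full, without_q)
        ])
    return networks


if __name__ == "__main__":
    # Hypothetical expression data: 4 samples by 3 genes.
    data = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 4.0, 0.0], [4.0, 3.0, 2.5]]
    for q, net in enumerate(lioness(data)):
        print(q, [round(w, 3) for w in net[0]])
```

Because `infer` is a parameter, any function that returns a complete weighted adjacency matrix can be substituted, which mirrors the generality the abstract describes.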
|
40
|
Campbell PT, Ambrosone CB, Nishihara R, Aerts HJWL, Bondy M, Chatterjee N, Garcia-Closas M, Giannakis M, Golden JA, Heng YJ, Kip NS, Koshiol J, Liu XS, Lopes-Ramos CM, Mucci LA, Nowak JA, Phipps AI, Quackenbush J, Schoen RE, Sholl LM, Tamimi RM, Wang M, Weijenberg MP, Wu CJ, Wu K, Yao S, Yu KH, Zhang X, Rebbeck TR, Ogino S. Proceedings of the fourth international molecular pathological epidemiology (MPE) meeting. Cancer Causes Control 2019; 30:799-811. [PMID: 31069578 PMCID: PMC6614001 DOI: 10.1007/s10552-019-01177-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/28/2018] [Accepted: 04/27/2019] [Indexed: 02/06/2023]
Abstract
An important premise of epidemiology is that individuals with the same disease share similar underlying etiologies and clinical outcomes. In the past few decades, our knowledge of disease pathogenesis has improved, and disease classification systems have evolved to the point where no complex disease processes are considered homogenous. As a result, pathology and epidemiology have been integrated into the single, unified field of molecular pathological epidemiology (MPE). Advancing integrative molecular and population-level health sciences and addressing the unique research challenges specific to the field of MPE necessitates assembling experts in diverse fields, including epidemiology, pathology, biostatistics, computational biology, bioinformatics, genomics, immunology, and nutritional and environmental sciences. Integrating these seemingly divergent fields can lead to a greater understanding of pathogenic processes. The International MPE Meeting Series fosters discussion that addresses the specific research questions and challenges in this emerging field. The purpose of the meeting series is to: discuss novel methods to integrate pathology and epidemiology; discuss studies that provide pathogenic insights into population impact; and educate next-generation scientists. Herein, we share the proceedings of the Fourth International MPE Meeting, held in Boston, MA, USA, on 30 May-1 June, 2018. Major themes of this meeting included 'integrated genetic and molecular pathologic epidemiology', 'immunology-MPE', and 'novel disease phenotyping'. The key priority areas for future research identified by meeting attendees included integration of tumor immunology and cancer disparities into epidemiologic studies, further collaboration between computational and population-level scientists to gain new insight on exposure-disease associations, and future pooling projects of studies with comparable data.
|
41
|
Hicks SC, Okrah K, Paulson JN, Quackenbush J, Irizarry RA, Bravo HC. Smooth quantile normalization. Biostatistics 2019; 19:185-198. [PMID: 29036413 DOI: 10.1093/biostatistics/kxx028] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 11/01/2016] [Accepted: 05/07/2017] [Indexed: 11/14/2022]
Abstract
Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and is unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, while allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
|
42
|
Kuijjer ML, Tung MG, Yuan G, Quackenbush J, Glass K. Estimating Sample-Specific Regulatory Networks. iScience 2019; 14:226-240. [PMID: 30981959 PMCID: PMC6463816 DOI: 10.1016/j.isci.2019.03.021] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 09/10/2018] [Revised: 01/30/2019] [Accepted: 03/21/2019] [Indexed: 10/28/2022] Open
Abstract
Biological systems are driven by intricate interactions among molecules. Many methods have been developed that draw on large numbers of expression samples to infer connections between genes (or their products). The result is an aggregate network representing a single estimate for the likelihood of each interaction, or "edge," in the network. Although informative, aggregate models fail to capture population heterogeneity. Here we propose a method to reverse engineer sample-specific networks from aggregate networks. We demonstrate our approach in several contexts, including simulated, yeast microarray, and human lymphoblastoid cell line RNA sequencing data. We use these sample-specific networks to study changes in network topology across time and to characterize shifts in gene regulation that were not apparent in the expression data. We believe that generating sample-specific networks will greatly facilitate the application of network methods to large, complex, and heterogeneous multi-omic datasets, supporting the emerging field of precision network medicine.
|
43
|
Lopes-Ramos CM, Kuijjer ML, Ogino S, Fuchs CS, DeMeo DL, Glass K, Quackenbush J. Gene Regulatory Network Analysis Identifies Sex-Linked Differences in Colon Cancer Drug Metabolism. Cancer Res 2018; 78:5538-5547. [PMID: 30275053 PMCID: PMC6169995 DOI: 10.1158/0008-5472.can-18-0454] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 02/22/2018] [Revised: 06/04/2018] [Accepted: 07/20/2018] [Indexed: 12/12/2022]
Abstract
Understanding sex differences in colon cancer is essential to advance disease prevention, diagnosis, and treatment. Males have a higher risk of developing colon cancer and a lower survival rate than women. However, the molecular features that drive these sex differences are poorly understood. In this study, we use both transcript-based and gene regulatory network methods to analyze RNA-seq data from The Cancer Genome Atlas for 445 patients with colon cancer. We compared gene expression between tumors in men and women and observed significant sex differences in sex chromosome genes only. We then inferred patient-specific gene regulatory networks and found significant regulatory differences between males and females, with drug and xenobiotics metabolism via cytochrome P450 pathways more strongly targeted in females. This finding was validated in a dataset of 1,193 patients from five independent studies. While targeting of the drug metabolism pathway did not change overall survival for males treated with adjuvant chemotherapy, females with greater targeting showed an increase in 10-year overall survival probability: 89% [95% confidence interval (CI), 78-100] compared with 61% (95% CI, 45-82) for women with lower targeting (P = 0.034). Our network analysis uncovers patterns of transcriptional regulation that differentiate male and female colon cancer and identifies differences in regulatory processes involving the drug metabolism pathway associated with survival in women who receive adjuvant chemotherapy. This approach can be used to investigate the molecular features that drive sex differences in other cancers and complex diseases. Significance: A network-based approach reveals that sex-specific patterns of gene targeting by transcriptional regulators are associated with survival outcome in colon cancer. This approach can be used to understand how sex influences progression and response to therapies in other cancers. Cancer Res; 78(19); 5538-47. ©2018 AACR.
|
44
|
Parmar C, Barry JD, Hosny A, Quackenbush J, Aerts HJWL. Data Analysis Strategies in Medical Imaging. Clin Cancer Res 2018; 24:3492-3499. [PMID: 29581134 PMCID: PMC6082690 DOI: 10.1158/1078-0432.ccr-18-0385] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 01/31/2018] [Revised: 02/26/2018] [Accepted: 03/22/2018] [Indexed: 12/27/2022]
Abstract
Radiographic imaging continues to be one of the most effective and clinically useful tools within oncology. Sophistication of artificial intelligence has allowed for detailed quantification of radiographic characteristics of tissues using predefined engineered algorithms or deep learning methods. Precedents in radiology as well as a wealth of research studies hint at the clinical relevance of these characteristics. However, critical challenges are associated with the analysis of medical imaging data. Although some of these challenges are specific to the imaging field, many others like reproducibility and batch effects are generic and have already been addressed in other quantitative fields such as genomics. Here, we identify these pitfalls and provide recommendations for analysis strategies of medical imaging data, including data normalization, development of robust models, and rigorous statistical analyses. Adhering to these recommendations will not only improve analysis quality but also enhance precision medicine by allowing better integration of imaging data with other biomedical data sources. Clin Cancer Res; 24(15); 3492-9. ©2018 AACR.
|
45
|
Abstract
Artificial intelligence (AI) algorithms, particularly deep learning, have demonstrated remarkable progress in image-recognition tasks. Methods ranging from convolutional neural networks to variational autoencoders have found myriad applications in the medical image analysis field, propelling it forward at a rapid pace. Historically, in radiology practice, trained physicians visually assessed medical images for the detection, characterization and monitoring of diseases. AI methods excel at automatically recognizing complex patterns in imaging data and providing quantitative, rather than qualitative, assessments of radiographic characteristics. In this Opinion article, we establish a general understanding of AI methods, particularly those pertaining to image-based tasks. We explore how these methods could impact multiple facets of radiology, with a general focus on applications in oncology, and demonstrate ways in which these methods are advancing the field. Finally, we discuss the challenges facing clinical implementation and provide our perspective on how the domain could be advanced.
|
47
|
Barry JD, Fagny M, Paulson JN, Aerts HJWL, Platig J, Quackenbush J. Histopathological Image QTL Discovery of Immune Infiltration Variants. iScience 2018; 5:80-89. [PMID: 30240647 PMCID: PMC6123851 DOI: 10.1016/j.isci.2018.07.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 01/28/2018] [Revised: 05/30/2018] [Accepted: 07/03/2018] [Indexed: 12/20/2022] Open
Abstract
Genotype-to-phenotype association studies typically use macroscopic physiological measurements or molecular readouts as quantitative traits. There are comparatively few suitable quantitative traits available between cell and tissue length scales, a limitation that hinders our ability to identify variants affecting phenotype at many clinically informative levels. Here we show that quantitative image features, automatically extracted from histopathological imaging data, can be used for image quantitative trait loci (iQTLs) mapping and variant discovery. Using thyroid pathology images, clinical metadata, and genomics data from the Genotype-Tissue Expression (GTEx) project, we establish and validate a quantitative imaging biomarker for immune cell infiltration. A total of 100,215 variants were selected for iQTL profiling and tested for genotype-phenotype associations with our quantitative imaging biomarker. Significant associations were found in HDAC9 and TXNDC5. We validated the TXNDC5 association using GTEx cis-expression QTL data and an independent hypothyroidism dataset from the Electronic Medical Records and Genomics network.
|
48
|
Morrow JD, Glass K, Cho MH, Hersh CP, Pinto-Plata V, Celli B, Marchetti N, Criner G, Bueno R, Washko G, Choi AMK, Quackenbush J, Silverman EK, DeMeo DL. Human Lung DNA Methylation Quantitative Trait Loci Colocalize with Chronic Obstructive Pulmonary Disease Genome-Wide Association Loci. Am J Respir Crit Care Med 2018; 197:1275-1284. [PMID: 29313708 PMCID: PMC5955059 DOI: 10.1164/rccm.201707-1434oc] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 07/14/2017] [Accepted: 01/03/2018] [Indexed: 12/23/2022] Open
Abstract
RATIONALE: As the third leading cause of death in the United States, the impact of chronic obstructive pulmonary disease (COPD) makes identification of its molecular mechanisms of great importance. Genome-wide association studies (GWASs) have identified multiple genomic regions associated with COPD. However, genetic variation only explains a small fraction of the susceptibility to COPD, and sub-genome-wide significant loci may play a role in pathogenesis. OBJECTIVES: Regulatory annotation with epigenetic evidence may give priority for further investigation, particularly for GWAS associations in noncoding regions. We performed integrative genomics analyses using DNA methylation profiling and genome-wide SNP genotyping from lung tissue samples from 90 subjects with COPD and 36 control subjects. METHODS: We performed methylation quantitative trait loci (mQTL) analyses, testing for SNPs associated with percent DNA methylation and assessed the colocalization of these results with previous COPD GWAS findings using Bayesian methods in the R package coloc to highlight potential regulatory features of the loci. MEASUREMENTS AND MAIN RESULTS: We identified 942,068 unique SNPs and 33,996 unique CpG sites among the significant (5% false discovery rate) cis-mQTL results. The genome-wide significant and subthreshold (P < 10^-4) GWAS SNPs were enriched in the significant mQTL SNPs (hypergeometric test P < 0.00001). We observed enrichment for sites located in CpG shores and shelves, but not CpG islands. Using Bayesian colocalization, we identified loci in regions near KCNK3, EEFSEC, PIK3CD, DCDC2C, TCERG1L, FRMD4B, and IL27. CONCLUSIONS: Colocalization of mQTL and GWAS loci provides regulatory characterization of significant and subthreshold GWAS findings, supporting a role for genetic control of methylation in COPD pathogenesis.
|
49
|
Quackenbush J. Section 7: Bioinformatics: Computational Approaches to Analysis of DNA Microarray Data. Yearb Med Inform 2018. [DOI: 10.1055/s-0038-1638484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/17/2022] Open
Abstract
Summary: To review the current state of the art in computational methods for the analysis of DNA microarray data. The review considers methods of microarray data collection, transformation and representation, comparisons and predictions of gene expression from the data, their mechanistic analysis, related systems biology, and the application of clustering techniques. Functional genomics approaches have greatly increased the rate at which data on biological systems is generated, leading to corresponding challenges in analyzing the data through advanced computational techniques. The paper compares and contrasts the application of computational clustering for discovery, comparison, and prediction of gene expression classes, together with their evaluation and relation to mechanistic analyses of biological systems. Methods for assaying gene expression levels by DNA microarray experiments produce considerably more data than other techniques, and require a wide variety of computational techniques for identifying patterns of expression that may be biologically significant. These will have to be verified and validated by comparison to results from other methods, integrated with other systems data, and provide the feedback for further experimentation for testing mechanistic or other biological hypotheses.
|
50
|
Domenyuk V, Gatalica Z, Santhanam R, Wei X, Stark A, Kennedy P, Toussaint B, Levenberg S, Wang R, Xiao N, Greil R, Rinnerthaler G, Gampenrieder S, Heimberger AB, Berry DJ, Barker A, Demetri GD, Quackenbush J, Marshall JL, Poste G, Vacirca JL, Vidal GA, Schwartzberg LS, Halbert DD, Voss A, Miglarese MR, Famulok M, Mayer G, Spetzler D. Abstract P2-09-09: Polyligand profiling differentiates cancer patients according to their benefit of treatment. Cancer Res 2018. [DOI: 10.1158/1538-7445.sabcs17-p2-09-09] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Introduction: Deconvolution of multi-nodal perturbations in cancer network architecture demands highly multiplexed profiling assays. We demonstrate the value of polyligand profiling of tumor systems states using libraries of single stranded oligodeoxynucleotides (ssODN) to distinguish between tumor tissue from breast cancer patients who did or did not derive benefit from treatment regimens containing trastuzumab.
Methods: This study included cases from women with invasive breast cancer who received chemotherapy + trastuzumab (C+T) or trastuzumab monotherapy with available retrospective data on the time to next treatment (TTNT). A library of 2x10^12 unique ssODN was exposed to FFPE tissues from patients who benefited (B) or not (NB) from trastuzumab-based regimens in several rounds of positive and negative selection. Two enriched libraries were screened on an independent set of 42 B and 19 NB cases using a modified IHC protocol for detection of bound ssODNs. Poly-Ligand Profiles (PLP) were scored by a blinded pathologist. Two libraries, EL-NB and EL-B, showed significant p-values between groups of responders and non-responders. A Cox-PH model was fitted using either tumors' HER2 status or PLP test results as the independent variable. Median survival time was calculated from the Kaplan-Meier estimate. A separate group of 63 cases with TTNT data from chemotherapy without trastuzumab was used as a control to distinguish prognostic from predictive performance.
Results: The PLP scores of EL-NB and EL-B were assessed by receiver operating characteristic (ROC) curves and resulted in a combined AUC value of 0.81. EL-NB and EL-B were able to effectively classify B and NB patients with either HER2-negative/equivocal (AUC = 0.73) or HER2-positive cancers (AUC = 0.84). In contrast, HER2 status alone yielded an AUC value of 0.47. The combined PLP scores for the independent set of 63 patients treated with C excluding trastuzumab resulted in an AUC value of 0.53, indicating that the assay was predictive and not simply prognostic. Kaplan-Meier curve analysis shows that PLP+ cases have 429 days median TTNT, while PLP- cases have 129 days (HR = 0.38, log-rank p = 0.001). Analysis based on HER2 status showed no significant difference in TTNT between patients that were HER2+ (280 days) or HER2-negative/equivocal (336 days, HR = 1.27, log-rank p = 0.45).
Summary: Performance of the PLP assay in differentiating patients who did or did not benefit from trastuzumab therapy outperforms the standard IHC assay for HER2 status. These results represent a promising step towards the development of a CDx to identify the 50-70% of HER2+ patients who will not benefit from trastuzumab. In addition, PLP also has the potential to identify the HER2-negative/equivocal patients who may benefit from trastuzumab-containing regimens.
Citation Format: Domenyuk V, Gatalica Z, Santhanam R, Wei X, Stark A, Kennedy P, Toussaint B, Levenberg S, Wang R, Xiao N, Greil R, Rinnerthaler G, Gampenrieder S, Heimberger AB, Berry DJ, Barker A, Demetri GD, Quackenbush J, Marshall JL, Poste G, Vacirca JL, Vidal GA, Schwartzberg LS, Halbert DD, Voss A, Miglarese MR, Famulok M, Mayer G, Spetzler D. Polyligand profiling differentiates cancer patients according to their benefit of treatment [abstract]. In: Proceedings of the 2017 San Antonio Breast Cancer Symposium; 2017 Dec 5-9; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2018;78(4 Suppl):Abstract nr P2-09-09.
|